Fastest Variable Set in Java - java

I need a structure (ArrayList, LinkedList, etc.) that is very fast for this case:
While the structure is not empty, I search it for elements that satisfy a condition, let's say k, remove the elements that satisfy k, and start over with another condition, let's say k+1.
e.g.:
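for (int i = 1; i <= 1000000; i++) {
    structure.add(i);
}
int d = 2;
while (!structure.isEmpty()) {
    // iterate over a copy so that removing from structure is safe
    for (int boom : new ArrayList<>(structure)) {
        if (boom % d == 2) {
            // Integer.valueOf avoids calling List.remove(int index) by accident
            structure.remove(Integer.valueOf(boom));
        }
    }
    d++; // next condition, after a full pass
}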

If the elements are primitives, then the fastest structure will most probably be a specialized primitive collection (e.g., trove). Following references for boxed primitives is a nearly sure cache miss and this probably dominates the costs.
I wouldn't suggest a LinkedList for the same reason: It's dead slow due to cache misses.
If the order is unimportant, then an ArrayList is perfect. Instead of removing an element, replace it by the last one and remove the last array element. This is an O(1) operation and doesn't suffer from bad spatial locality.
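A minimal sketch of the swap-with-last removal described above, assuming a List<Integer> and a caller-supplied condition. The class name SwapRemove and the method name removeMatching are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class SwapRemove {
    /** Removes every element matching the condition; element order is NOT preserved. */
    static void removeMatching(List<Integer> list, IntPredicate condition) {
        int i = 0;
        while (i < list.size()) {
            if (condition.test(list.get(i))) {
                int last = list.size() - 1;
                list.set(i, list.get(last)); // overwrite the match with the last element
                list.remove(last);           // removing the tail shifts nothing
                // do not advance i: the element moved into slot i is still untested
            } else {
                i++;
            }
        }
    }
}
```

Calling SwapRemove.removeMatching(list, x -> x % 2 == 0) on an ArrayList drops the even values. With boxed Integers the cache-miss concern from the first paragraph still applies, so a primitive int[] version of the same idea would likely be faster.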
If the order is important, you can build your own ArrayList-like structure. Instead of removing an element, you mark it for removal e.g. in a BitSet or in a boolean[]. Finally you perform the removal in one sweep by moving all elements to their right position and adjusting the length. The optimized loop will most probably look similar to CharMatcher.removeFrom loop.
A simpler solution would be to use an ArrayList and copy all surviving elements to another one. I'd bet it'd beat the LinkedList hands down. As a minor GC-friendly optimization you can work with two lists and reuse them alternately.
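For the order-preserving case, here is a rough sketch on a primitive int[]. It folds the "mark" and "sweep" steps into a single pass by sliding survivors to the left, which is a simplification of the BitSet approach rather than a copy of it. The names Compact and removeMatching are hypothetical.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

final class Compact {
    /**
     * Returns the elements that do not match the condition, in their original order.
     * The input array is overwritten as a side effect.
     */
    static int[] removeMatching(int[] values, IntPredicate condition) {
        int kept = 0;
        for (int value : values) {
            if (!condition.test(value)) {
                values[kept++] = value; // kept never overtakes the read position
            }
        }
        return Arrays.copyOf(values, kept); // trim to the surviving length
    }
}
```

In a hand-rolled ArrayList-like class you would skip the final copy and just store kept as the new length.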

LinkedList should be fastest for this case. Use the iterator explicitly (structure.iterator()) and call the remove method of the iterator instead of calling structure.remove(element)!
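A small sketch of what removing through the iterator looks like. The condition "divisible by d" is only an assumption standing in for the asker's real condition, and IteratorRemoval is a made-up name.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

final class IteratorRemoval {
    /** Removes every element divisible by d using Iterator.remove(). */
    static void removeMultiplesOf(List<Integer> structure, int d) {
        Iterator<Integer> it = structure.iterator();
        while (it.hasNext()) {
            if (it.next() % d == 0) {
                it.remove(); // unlinks the current node; no search, no index shifting
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> structure = new LinkedList<>();
        for (int i = 1; i <= 10; i++) {
            structure.add(i);
        }
        removeMultiplesOf(structure, 2);
        System.out.println(structure);
    }
}
```

Whether this actually beats an array-based approach is something you would have to measure, given the cache-miss argument made in the other answer.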

I don't know your exact use case, but here's one note.
If you have your predicates P1 .. PN pre-compiled, available, and if you are not modifying the contents of the collection and if your predicates are not dependent on each other, you might want to create a composite predicate, like bundle up N predicates in some logical order, and then in only one iteration over your collection perform the filtering method.
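As a sketch of the composite idea, assuming Java 8's java.util.function.Predicate and Collection.removeIf: the helper below ORs the predicates together so that one pass removes everything any of them matches. CompositeFilter and anyOf are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class CompositeFilter {
    /** Combines the predicates into one that matches when any of them matches. */
    static <T> Predicate<T> anyOf(List<Predicate<T>> predicates) {
        Predicate<T> combined = x -> false; // neutral element for OR
        for (Predicate<T> p : predicates) {
            combined = combined.or(p);
        }
        return combined;
    }

    public static void main(String[] args) {
        Predicate<Integer> even = x -> x % 2 == 0;
        Predicate<Integer> big = x -> x > 7;
        List<Integer> items = new ArrayList<>(List.of(1, 2, 3, 8, 9));
        items.removeIf(anyOf(List.of(even, big))); // single pass over the collection
        System.out.println(items);
    }
}
```

This only gives the same result as N separate passes when the predicates are independent of each other, as noted above.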
As for the data structure, I'd think of it like this:
If my filtering predicates will be totally arbitrary, then a list should be OK to use.
In some more specific cases with very limited and strict value sets, you might consider a tree-like or a graph-like structure, where you could have some master nodes which would denote that property "property1" has value "value1". In case you wanted to drop all items where "property1" value is "value1" you could tell that master node to remove all his children (and that they should detach themselves from any other parent master nodes they might have).

Sorted List data structure
If you construct the lists yourself, you can consider using a sorted data structure. It gives you the best search performance (O(log n) per lookup, so it is very fast).
Linked List data structure
LinkedList gives you constant time element removal but random access doesn't have constant complexity (is slow).
You will have to benchmark if a LinkedList or a sorted list would be faster for your scenario.

If your elements are ints, I suppose bit set would be the fastest data structure for this task. Iteration would be slightly slower than through array list (even not standard java.util.ArrayList, only primitive specialization), but remove ops cost nearly nothing, while removes from any array list are quite expensive.
Note that you can gain a lot by working directly with a long[] as the bit set and performing the bitwise operations by hand, because java.util.BitSet is not very performance-focused. But, of course, start with BitSet.
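A minimal BitSet sketch, assuming the elements are non-negative ints and that the condition is "divisible by d" (a stand-in for the real condition). BitSetSieve is a hypothetical name.

```java
import java.util.BitSet;

final class BitSetSieve {
    /** Clears every member of the set that is divisible by d. */
    static void removeMultiplesOf(BitSet members, int d) {
        // nextSetBit skips cleared bits, so removed elements cost nothing on later passes
        for (int i = members.nextSetBit(0); i >= 0; i = members.nextSetBit(i + 1)) {
            if (i % d == 0) {
                members.clear(i); // "removal" is a single bit flip
            }
        }
    }

    public static void main(String[] args) {
        BitSet members = new BitSet();
        members.set(1, 11); // members 1..10 (the upper bound is exclusive)
        removeMultiplesOf(members, 2);
        System.out.println(members);
    }
}
```

members.isEmpty() can serve as the outer loop condition from the question.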

Related

Why use ArrayList if requirement is not clear?

I found the following question on www.javatpoint.com:
If you were to use a List implementation but were not sure which one, because the requirement is not yet clear, which List implementation would you use?
options:
1. ArrayList
2. LinkedList
The correct answer for this is ArrayList.
But there is no explanation why. Please help me to understand.
In simple terms:
ArrayList: Iterating over an ArrayList is faster than iterating over a LinkedList, because all elements are stored in contiguous memory. Deleting an element reduces performance, because every later element has to shift (if you delete the element at the 3rd position, each following element moves to currentLocation - 1).
LinkedList: Iteration is slower than with an ArrayList. Deleting or inserting at a position you have already reached is faster, because only the links of the previous and next elements change, not the entire list.
So when you don't have clear requirements, iteration is the basic need, and ArrayList gives the best performance for it.
One possible reason might be that elements in an ArrayList consume less memory than in a LinkedList, because each element in a LinkedList contains a value plus pointers to its neighbouring elements, while an element in an ArrayList has only the value.
ArrayList stores elements in array in insert order. You can get value by its index in array.
LinkedList stores your elements in Node objects that references each other. Each Node references previous and next Node. You can get elements sequentially starting from both sides. You can also get elements by index but it's not as fast as in ArrayList. Use LinkedList when you need sequential access, like queue or stack data structure
From what I've read in the past: An ArrayList essentially acts as a better array, allowing for dynamic resizing. A LinkedList is a double linked list which causes it to have better performance on adding and removing items from the list, but getting and setting at arbitrary positions is slower than an ArrayList. So, the question is most likely assuming that, generally, a program will be getting and setting values more than it is adding and removing values, which is a fairly reasonable assumption, but the question in question is incorrect.

Structure like Java's EnumSet that can hold repeated elements

I need a structure in which to store N enums, some of them repeated, and from which I can easily extract them. So far I've tried to use an EnumSet like this:
cards = EnumSet.of(
BEST_OF_THREE,
BEST_OF_THREE,
SIMPLE_QUESTION,
SIMPLE_QUESTION,
STAR);
But now I see it can only hold one of each. Conceptually, which structure would be best for this problem?
Regards
jose
You can use a Map of type Enumeration -> Integer, where the integer indicates how many of each there are. Google Guava's Multiset does this for you, and handles the edge cases of adding an enum to the set when there is not already an entry, and removing an enum when that leaves none.
Another strategy is to use the Enumeration ordinal index. Because this index is unique, you can use this to index into an int array that is sized to the Enumeration size, where the count in each array slot would indicate how many of each enumeration you have. Like this:
// initialize array for counting each enumeration type
// Java zero-initializes int arrays, so every count starts at 0
int[] cardCount = new int[CardEnum.values().length];
...
// incrementing the count for an enumeration (when we add)
cardCount[BEST_OF_THREE.ordinal()]++;
...
// decrementing the count for an enumeration (when we remove)
cardCount[BEST_OF_THREE.ordinal()]--;
// DEBUG: assert cardCount[BEST_OF_THREE.ordinal()] >= 0
...
// getting the count for an enumeration
int count = cardCount[BEST_OF_THREE.ordinal()];
... Some time later
Having read the clarifying comments underneath the original post that explained what the OP was asking, it is clear that you're best off with a linear structure with an entry per element. I didn't realize that you didn't need detailed information on how many of each you had. Storing them in a Multiset or an equivalent counting structure makes it hard to pick at random, as you need to map an index picked at random from [0, size) to a particular container, which takes log time.
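Here is a sketch of that linear structure, with one list entry per card and a uniform random draw. It assumes that "extract" means remove-and-return, which the question does not spell out. The Deck class and its method names are invented for illustration, and CardEnum reuses the enum name from the counting example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class Deck {
    enum CardEnum { BEST_OF_THREE, SIMPLE_QUESTION, STAR }

    private final List<CardEnum> cards = new ArrayList<>();
    private final Random random = new Random();

    void add(CardEnum card) {
        cards.add(card); // duplicates are simply stored twice
    }

    int size() {
        return cards.size();
    }

    /** Removes and returns a uniformly random card; throws if the deck is empty. */
    CardEnum drawRandom() {
        int index = random.nextInt(cards.size()); // nextInt(0) throws IllegalArgumentException
        CardEnum picked = cards.get(index);
        int last = cards.size() - 1;
        cards.set(index, cards.get(last)); // fill the hole with the last card
        cards.remove(last);                // O(1) removal from the tail
        return picked;
    }
}
```

Because the order of a shuffled hand does not matter, the swap-with-last trick keeps each draw constant time.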
Sets don't allow duplicates, so if you want repeats you'll need either a List or a Map.
If you just need the number of duplicates, an EnumMap with Integer values is probably your best bet.
If the order is important, and you need quick access to the number of each type, you'll probably need to roll your own data structure.
If the order is important (but the count of each is not), then a List is the way to go, which implementation depends on how you will use it.
LinkedList - Best when there will be many inserts/removals from the beginning of the List. Indexing into a LinkedList is very expensive, and should be avoided whenever possible. If a List is built by shifting data onto the front of the list, but any later additions are at the end, conversion to an ArrayList once the initial List is built is a good idea - especially if indexing into the List is anticipated at any point.
ArrayList - When in doubt, this is a good place to start. Inserting or removing items requires shifting, so if this is a common operation look elsewhere.
TreeList - This is a good all-around option, and insertions and removals anywhere in the List are inexpensive. This does require the Apache commons library, and uses a bit more memory than the others.
Benchmarks, and the code used to generate them, can be found in this gist.
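For the "just need the number of duplicates" case mentioned above, a counting EnumMap could look roughly like this. CardCounts and its methods are hypothetical names, and removing the entry when the count reaches zero is a design choice rather than a requirement.

```java
import java.util.EnumMap;
import java.util.Map;

final class CardCounts {
    enum CardEnum { BEST_OF_THREE, SIMPLE_QUESTION, STAR }

    private final Map<CardEnum, Integer> counts = new EnumMap<>(CardEnum.class);

    void add(CardEnum card) {
        counts.merge(card, 1, Integer::sum); // inserts 1 or increments the existing count
    }

    /** Returns false if there was no such card to remove. */
    boolean remove(CardEnum card) {
        Integer current = counts.get(card);
        if (current == null) {
            return false;
        }
        if (current == 1) {
            counts.remove(card); // drop the entry instead of storing a zero
        } else {
            counts.put(card, current - 1);
        }
        return true;
    }

    int count(CardEnum card) {
        return counts.getOrDefault(card, 0);
    }
}
```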

Indexed addition without IndexOutOfBounds Exception

I have n objects, each with an identifying number. I receive them unsorted, but the identifiers cover the range (0, n-1). I want to access them as fast as possible. I suppose that an ArrayList would be the best option: I'd add the object with identifier n at the position of the ArrayList with index n by:
list.add(identifier, object);
The problem is that when I am adding the objects I get an IndexOutOfBounds Exception because I'm adding them unsorted and the size() is smaller although I know that previous positions will also be filled.
Another option is to use a HashMap but I suppose that this will decrease performance.
Do you know a collection that has the behavior described above?
Do you know a collection that has the behavior described above?
It sounds like you need a plain old Java array. And if you need it as a collection, then use "Arrays.asList(...)" to create a List wrapper for it.
Now this won't work if you needed to add or remove elements from the array / collection, but it sounds like you don't need to from your problem description.
If you do need to add / remove elements (as distinct from using set to update the element at a given position), then Peter Lawrey's approach is best.
By contrast, a HashMap<Integer, Object> would be an expensive alternative. At a rough estimate, I'd say that "indexing" operations would be at least 10 times slower, and the data structure would take 10 times the space compared to an equivalent array or ArrayList type. A hash table based solution is only really a viable alternative (from a performance perspective) if the array is large and sparse.
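A sketch of the plain-array approach with the Arrays.asList wrapper, assuming the identifiers are exactly 0..n-1 and arrive as parallel arrays (that input shape is made up for the example, as are the names IndexedById and build).

```java
import java.util.Arrays;
import java.util.List;

final class IndexedById {
    /** Places each value at the slot given by its id; ids are assumed to cover 0..n-1. */
    static List<String> build(int n, int[] ids, String[] values) {
        String[] slots = new String[n];
        for (int k = 0; k < ids.length; k++) {
            slots[ids[k]] = values[k]; // arrival order does not matter
        }
        // fixed-size List view backed by the array: get/set work, add/remove throw
        return Arrays.asList(slots);
    }

    public static void main(String[] args) {
        List<String> byId = build(3, new int[] {2, 0, 1}, new String[] {"c", "a", "b"});
        System.out.println(byId);
    }
}
```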
Sometimes you get the indexes out of order. This requires you to add dummy entries which may be filled later.
int indexToAdd = ...
E elementToAdd = ...
while(list.size() <= indexToAdd) list.add(null);
list.set(indexToAdd, elementToAdd);
This will allow you to add entries beyond the current end of the list.
The Javadoc for List.add(int, E) states
IndexOutOfBoundsException - if the index is out of range (index < 0 || index > size())
and List.set(int, E) is stricter still, rejecting index >= size().
If you attempt to add entries beyond the end
List list = new ArrayList();
list.add(1, 1);
you get
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.rangeCheckForAdd(ArrayList.java:612)
at java.util.ArrayList.add(ArrayList.java:426)
at Main.main(Main.java:28)
I'm not sure how much more expensive a HashMap<Integer, T> would be. Integer.hashCode() is quite efficient, though the only really expensive operation might be copying the data into a new larger array as the number of items increases. However, if you know your n, you could use a normal array.
As an alternative, you could implement your own Map<Integer, T> that does not use the hash code but the Integer itself. Before you do this, make sure that neither an array is sufficient nor HashMap is efficient enough!
I think you have at least two good options.
The first is to use a straight Java array initialized to the appropriate length, then add objects with the syntax:
theArray[i] = object;
The second would be to use a HashMap, and add objects with the syntax:
theMap.put(i, object);
I'm not sure what performance issues you're worried about, but adding elements within the range, clearing (out of an array) or removing (out of a HashMap), and finding elements from a given index (or key, for a HashMap) are all O(1) for both structures. I would also suggest taking a look at Wikipedia's list of data structures if neither of these seem good.

Which list implementation is optimal for removing and inserting from the front and back?

I am working on an algorithm that will store a small list of objects as a sublist of a larger set of objects. the objects are inherently ordered, so an ordered list is required.
The most common operations performed will be, in order of frequency:
retrieving the nth element from the list (for some arbitrary n)
inserting a single element at the beginning or end of the list
removing the first or last n elements from the list (for some arbitrary n)
removing and inserting from the middle will never be done so there is no need to consider the efficiency of that.
My question is: which implementation of List is most efficient for this use case in Java (i.e. LinkedList, ArrayList, Vector, etc.)? Please defend your answer by explaining the implementations of the different data structures so that I can make an informed decision.
Thanks.
NOTE
No, this is not a homework question. No, I do not have an army of research assistants who can do the work for me.
Based on your first criterion (arbitrary access) you should use an ArrayList. ArrayLists (and arrays in general) provide lookup/retrieval in constant time. In contrast, it takes linear time to look up items in a LinkedList.
For ArrayLists, insertion or deletion at the end takes (amortized) constant time. It may also with LinkedLists, but that would be an implementation-specific optimization (it's linear otherwise).
For ArrayLists, insertion or deletion at front requires linear time (with consistent reuse of space, these may become constant depending on implementation). LinkedList operations at front of list are constant.
The last two usage cases somewhat balance each other out, however your most common case definitely suggests array-based storage.
As far as basic implementation details:
ArrayLists are basically just sequential sections of memory. If you know where the beginning is, you can just do a single addition to find the location of any element. Operations at the front are expensive because elements may have to be shifted to make room.
LinkedLists are disjoint in memory and consist of nodes linked to each other (with a reference to the first node). To find the nth node, you have to start at the first node and follow links until you reach the desired node. Operations at the front just require creating a node and updating your start pointer.
I vote for double linked list. http://docs.oracle.com/javase/6/docs/api/java/util/Deque.html
Probably the best data structure for this purpose would be a deque implemented with a dynamic array, which is basically an ArrayList that starts adding elements to the middle of the internal array instead of the beginning. Unfortunately Java's ArrayDeque does not support looking up an nth element.
It is, however, pretty easy to implement one yourself (or lookup an existing implementation), and then all three of the described operations can be done in O(1).
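A rough sketch of such a deque on a circular array, supporting the three operations from the question. This is one possible layout, not a reference implementation, and IndexedDeque and its method names are invented. To keep bulk removal O(1), removed slots are not nulled out, so stale references can linger until they are overwritten.

```java
final class IndexedDeque<E> {
    private Object[] items = new Object[8];
    private int head = 0; // array index of the first element
    private int size = 0;

    int size() {
        return size;
    }

    /** O(1): the nth element sits n slots after head, wrapping around the array. */
    @SuppressWarnings("unchecked")
    E get(int n) {
        if (n < 0 || n >= size) {
            throw new IndexOutOfBoundsException("Index: " + n + ", Size: " + size);
        }
        return (E) items[(head + n) % items.length];
    }

    void addFirst(E e) {
        growIfFull();
        head = (head - 1 + items.length) % items.length; // step back, wrapping
        items[head] = e;
        size++;
    }

    void addLast(E e) {
        growIfFull();
        items[(head + size) % items.length] = e;
        size++;
    }

    /** O(1): just moves the head forward. */
    void removeFirst(int n) {
        checkCount(n);
        head = (head + n) % items.length;
        size -= n;
    }

    /** O(1): just shrinks the logical size. */
    void removeLast(int n) {
        checkCount(n);
        size -= n;
    }

    private void checkCount(int n) {
        if (n < 0 || n > size) {
            throw new IllegalArgumentException("Cannot remove " + n + " of " + size);
        }
    }

    /** Doubles the array when full; this makes the add operations amortized O(1). */
    private void growIfFull() {
        if (size < items.length) {
            return;
        }
        Object[] bigger = new Object[items.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = items[(head + i) % items.length];
        }
        items = bigger;
        head = 0;
    }
}
```

If holding on to removed objects is a problem, the remove methods can null out the vacated slots at the cost of O(n) work per call.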
You can do all of them with an ArrayList with minimal confusion if you're not worried about efficiency.
I would use some sort of queue or stack if I am only inserting at the front or end. They have the least overhead. Or you could also use a linked list.
To remove N elements from the front or the end I would use a linked list: you can just delete one node, and the ones before or after it are gone. I.e. if I delete the first 5 elements, I just delete the 5th element and the ones before it disappear. Likewise, if I delete the last 6 elements, I just delete the 6th-to-last one and the rest disappear. Java will do the garbage collecting for you. This would be O(1) for this operation.
is this a homework question?
Definitely go for LinkedList. For both inserting a value at the beginning/end of the list and removing the first/last element in the list, it runs in O(1). This is because all that needs to be changed to carry out these operations is a couple of pointers, a minimally costly operation.
Although ArrayLists retrieve the nth element in O(1) while LinkedLists retrieve the nth element in O(n), ArrayLists run the danger of having to adjust their size when elements are inserted. What do you suppose happens when the memory allotted for the ArrayList is used up and you try to insert another element? The ArrayList allocates a larger backing array (about one and a half times the old capacity in the OpenJDK implementation, not twice) and copies every element into it, which is a costly operation when it happens. LinkedLists don't have this problem since, again, all that is done is the addition of a pointer.
I don't know a whole lot about Java Vectors, but if they're anything like C++ vectors, then they're very similar to ArrayLists.
I hope this helps.
java.util.TreeMap of Long to Object, and use index of i+tm.firstKey()

Question regarding Java's LinkedList class

I have a question regarding the LinkedList class in Java.
I have a scenario wherein I need to either add or set a value at an index, depending on whether the index already exists in the LinkedList. Pseudo-code of what I want to achieve:
if index a exists within the linkedlist ll
ll.set(a,"arbit")
else
ll.add(a,"arbit")
I did go through the Javadocs for the LinkedList class but did not come across anything relevant.
Any ideas ?
Thanks
p1ng
What about using a Map for this:
Map<Integer, String> map = new HashMap<Integer, String>();
// ...
int a = 5;
map.put(a, "arbit");
Even if a already exists, put will just replace the old String.
Searching in a linked list is not very efficient (O(n)). Have you considered using a different data structure, e.g. a HashMap, which would give you O(1) access time?
If you need sequential access as well as keyed access, you might want to try a LinkedHashMap, available since Java 1.4:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/LinkedHashMap.html
Map<Integer, String> is definitely a good (the best?) way to go here.
Here's an option for keeping with LinkedList if that's for some bizarre reason a requirement. It has horrible runtime performance and disallows null, since null now becomes an indicator that an index isn't occupied.
String toInsert = "arbit";
int a = 5;
//grow the list to allow index a
while ( a >= ll.size() ) {
ll.add(null);
}
//set index a to the new value
ll.set(a, toInsert);
If you're going to take this gross road, you might be better off with an ArrayList.
Why is it so bad? Say you had only one element at index 100,000. This implementation would require 100,000 entries in the list pointing to null. This results in horrible runtime performance and memory usage.
LinkedList cannot have holes inside, so you can't have list [1,2,3,4] and then ll.add(10,10), so I think there's something wrong with your example. Use either Map or search for some other sparse array
It looks like you're trying to use a as a key, and don't state whether you have items at index i < a. If you run your code when ll.size() <= a then you'll end up with an IndexOutOfBoundsException.
And if you add an item at index a the previous item at a will now be at a+1.
In this case it would be best to remove item at a first (if it exists) then add item "arbit" into a. Of course, the condition above re: ll.size() <=a still applies here.
If the order of the results is important, a different approach could use a HashMap<Integer,String> to create your dataset, then extract the keys using HashMap<?,?>.keySet(), sort them in their natural order (they're numeric after all), and then extract the values from the map while iterating over the sorted keys. Nasty, but does what you want... Or create your own OrderedMap class that does the same...
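The keySet-and-sort approach could be sketched like this. SortedByIndex and valuesInIndexOrder are hypothetical names, and the map contents are just sample data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SortedByIndex {
    /** Returns the map's values ordered by ascending key. */
    static List<String> valuesInIndexOrder(Map<Integer, String> map) {
        List<Integer> keys = new ArrayList<>(map.keySet()); // copy: keySet() is only a view
        Collections.sort(keys);                             // natural (numeric) order
        List<String> values = new ArrayList<>(keys.size());
        for (Integer key : keys) {
            values.add(map.get(key));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(5, "arbit");
        map.put(1, "first");
        map.put(5, "replaced"); // put overwrites the old value at an existing key
        System.out.println(valuesInIndexOrder(map));
    }
}
```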
Could you expand on why you need to use a LinkedList? Is ordering of the results important?
