I am working on a problem where I'm required to store elements with two requirements: no duplication, and maintaining insertion order. I chose to go with LinkedHashSet, since it fulfills both requirements.
Let's say I have this code:
LinkedHashSet<String> hs = new LinkedHashSet<String>();
hs.add("B");
hs.add("A");
hs.add("D");
hs.add("E");
hs.add("C");
hs.add("F");
if(hs.contains("D")){
//do something to remove elements added after "D", i.e. remove "E", "C" and "F"
//maybe hs.removeAll(Collection<?> c)?
}
Can anyone please guide me with the logic to remove these elements?
Am I using the wrong data structure? If so, then what would be a better alternative?
I think you will need to use an iterator to do the removal if you are using a LinkedHashSet. That is to say: find the element, then keep removing until you get to the tail. This will be O(n). Even if you wrote your own LinkedHashSet (with a doubly linked list and a HashSet), you would have access to the raw linking structure and could cut the linked list in O(1), but you would still need to remove every cut element from the HashSet, which is where the O(n) cost would arise again.
So, in summary: find the element, then keep the iterator positioned there and continue walking toward the end, removing elements as you go. The standard Iterator.remove() exposes everything that is required.
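A minimal sketch of that approach with the standard Iterator API (assuming String elements, as in the question):
import java.util.Iterator;
import java.util.LinkedHashSet;

public class TrimAfter {
    public static void main(String[] args) {
        LinkedHashSet<String> hs = new LinkedHashSet<String>();
        hs.add("B"); hs.add("A"); hs.add("D");
        hs.add("E"); hs.add("C"); hs.add("F");

        boolean found = false;
        Iterator<String> it = hs.iterator();
        while (it.hasNext()) {
            String s = it.next();
            if (found) {
                it.remove();          // removes everything after the match
            } else if (s.equals("D")) {
                found = true;         // keep "D" itself; start removing from here on
            }
        }
        System.out.println(hs);       // prints [B, A, D]
    }
}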
You could write your own version of an ArrayList that doesn't allow duplicates, by overriding add() and addAll(). To my knowledge there is no "common" third-party version of such a class, which has always surprised me. Anybody know of one?
Then the removal code is pretty simple (no need to use a ListIterator):
int idx = this.indexOf("D");
if (idx >= 0) {
    for (int goInReverse = this.size() - 1; goInReverse > idx; goInReverse--) {
        this.remove(goInReverse);   // remove by index, from the end, so no elements shift
    }
}
However, this is still O(N): indexOf() scans the list, and the loop visits every element being removed.
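A minimal sketch of such a no-duplicates list (a hypothetical class, not a library one; a complete version would also override add(int, E), addAll(int, Collection) and set()):
import java.util.ArrayList;
import java.util.Collection;

class NoDupArrayList<E> extends ArrayList<E> {
    @Override
    public boolean add(E e) {
        return !contains(e) && super.add(e);   // contains() makes each add O(n)
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) {
            changed |= add(e);                 // route bulk adds through the checked add()
        }
        return changed;
    }
}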
The basic problem here is that you have to maintain two data structures, a "map" one representing the key / value mapping, and a "list" other representing the insertion order.
There are "map" and "list" organizations that offer fast removal of a elements after a given point; e.g. ordered trees of various kinds and both array and chain-based lists (modulo the cost of locating the point.)
However, it seems impossible to remove N elements from the two data structures in better than O(N). You have to visit all of the elements being removed to remove them from the 2nd data structure. (In fact, I suspect one could prove this mathematically ...)
In short, there is no data structure that has better complexity than what you are currently using.
The area where it is possible to improve performance (with a custom collection class!) is in avoiding an explicit use of an iterator. Using an iterator and the standard iterator API, the cost is O(N) on the total number of elements in the data structure. You could make this O(N) on the number of elements removed ... if the hash entry nodes also had next/prev links for the sequence.
So, after trying a couple of the things mentioned above, I chose to implement a different data structure, since O(n) was not an issue for this problem (my data is very small).
I used graphs; this library came in really handy: http://jgrapht.org/
What I am doing is adding all elements as vertices to a DirectedGraph and also creating edges between them (the edges helped me solve another, unrelated problem as well). When it's time to remove the elements, I use a recursive function with the following pseudocode:
removeElements(element) {
    tempEdge = graph.getOutgoingEdgeFrom(element)
    if (tempEdge == null)    // no outgoing edge means nothing was added after this element
        return
    tempVertex = graph.getTargetVertex(tempEdge)
    removeElements(tempVertex)
    graph.remove(tempVertex)
}
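A rough translation of that pseudocode into JGraphT's API might look like the following (a sketch only; method names are per recent JGraphT releases and may differ in older ones):
import java.util.Set;
import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;

public class ChainRemoval {
    // recursively removes every vertex downstream of 'element'
    static void removeElements(Graph<String, DefaultEdge> graph, String element) {
        Set<DefaultEdge> out = graph.outgoingEdgesOf(element);
        if (out.isEmpty()) {
            return;                 // nothing was added after this element
        }
        String next = graph.getEdgeTarget(out.iterator().next());
        removeElements(graph, next);
        graph.removeVertex(next);   // also removes the vertex's incident edges
    }

    public static void main(String[] args) {
        Graph<String, DefaultEdge> g = new DefaultDirectedGraph<String, DefaultEdge>(DefaultEdge.class);
        for (String v : new String[] {"B", "A", "D", "E", "C", "F"}) g.addVertex(v);
        g.addEdge("B", "A"); g.addEdge("A", "D"); g.addEdge("D", "E");
        g.addEdge("E", "C"); g.addEdge("C", "F");
        removeElements(g, "D");
        System.out.println(g.vertexSet());   // only B, A and D remain
    }
}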
I agree that a graph is not the ideal data structure for this kind of problem, but under my conditions it works perfectly... Cheers!
What I need:
Fastest possible put/remove; this is used a lot.
Iteration, also used frequently.
Holds an object, e.g. Player. remove should be O(1), so maybe a hash map?
No duplicate keys.
direct get() is never used; mainly iterating to retrieve data.
I don't worry about memory; I just want the fastest speed possible, even if it comes at the cost of memory.
For iteration, nothing is faster than a plain old array. Entries (for object types, the references) are stored sequentially in memory, so the JVM can get to the next entry simply by adding the length of one entry to its address.
Arrays are typically a bit of a hassle to deal with compared to maps or lists (e.g. no dictionary-style lookups, fixed length). However, in your case I think it makes sense to go with a one- or two-dimensional array, since the length of the array will not change and dictionary-style lookups are not needed.
So if I understand you correctly, you want a two-dimensional grid that holds information about which player, if any, is in each tile? To me it doesn't sound like you should be removing or adding things to the grid at all. I would simply use a two-dimensional array of type Player or something similar. Then if no player is in a tile you can set that position to null, or to some static value like Player.none() or Tile.empty(), however you'd want to implement it. Either way, a simple two-dimensional array should work fine. :)
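A minimal sketch of that grid idea (the Player class body and the 20x20 board size are placeholders, not from the question):
class Player { /* game-specific fields */ }

class Board {
    static final int SIZE = 20;
    private final Player[][] grid = new Player[SIZE][SIZE];   // null marks an empty tile

    void place(Player p, int row, int col) { grid[row][col] = p; }
    void clear(int row, int col)           { grid[row][col] = null; }
    Player at(int row, int col)            { return grid[row][col]; }
    boolean isEmpty(int row, int col)      { return grid[row][col] == null; }
}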
The best Collection for your case is a LinkedList. Linked lists allow for fast iteration, and fast removal and addition at any place in the list. For example, if you use an ArrayList and you want to insert something at index i, you have to move all the elements from i to the end one position to the right. The same happens when you remove. In a linked list you can add and remove in constant time, once you are positioned at the right node (e.g. via an iterator).
Since you need two dimensions, you can use linked lists inside of linked lists:
List<List<Tile>> players = new LinkedList<List<Tile>>();   // LinkedList has no capacity constructor
for (int i = 0; i < 20; ++i) {
    List<Tile> tiles = new LinkedList<Tile>();
    for (int j = 0; j < 20; ++j) {
        tiles.add(new Tile());
    }
    players.add(tiles);
}
Use a map of sets to guarantee O(1) vertex lookup and amortized O(1) edge insertion and deletion:
HashMap<VertexT, HashSet<EdgeT>> incidenceMap;
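A small usage sketch of that declaration (String vertices stand in for VertexT and EdgeT here):
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AdjacencyDemo {
    public static void main(String[] args) {
        Map<String, Set<String>> incidenceMap = new HashMap<String, Set<String>>();

        // insert an edge u -> v in amortized O(1)
        incidenceMap.computeIfAbsent("u", k -> new HashSet<String>()).add("v");

        // delete it again in amortized O(1)
        Set<String> edges = incidenceMap.get("u");
        if (edges != null) {
            edges.remove("v");
        }
        System.out.println(incidenceMap);   // {u=[]}
    }
}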
There is no simple one-size-fits-all solution to this.
For example, if you only want to append, iterate and use Iterator.remove(), there are two obvious options: ArrayList and LinkedList
ArrayList uses less memory, but Iterator.remove() is O(N)
LinkedList uses more memory, but Iterator.remove() is O(1)
If you also want to do fast lookup (e.g. Collection.contains tests) or removal using Collection.remove, then a HashSet is going to be better ... if the collection is likely to be large. A HashSet won't allow you to put an object into the collection multiple times, but that could be an advantage. It also uses more memory than either an ArrayList or a LinkedList.
If you were more specific on the properties required, and what you are optimizing for (speed, memory use, both?) then we could give you better advice.
The requirement of not allowing duplicates is effectively adding a requirement for efficient get().
Your options are either hash-based or O(log N). Most likely, hashing will be faster, unless for whatever reason calling hashCode() + equals() once is much slower than calling compareTo() log(N) times. This could be the case, for instance, if you're dealing with very long strings. log(N) is not very much, by the way: log2(1,000,000,000) ≈ 30.
If you want to use a hash-based data structure, then HashSet is your friend. Make sure that Player has a good, fast implementation of hashCode(). If you know the number of entries ahead of time, specify the HashSet's initial capacity: ceil(N / load_factor) + 1 (the default load factor is 0.75).
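For instance, a quick sketch of pre-sizing a HashSet with that formula (the element type and count are made up for illustration):
import java.util.HashSet;
import java.util.Set;

public class PresizedSet {
    public static void main(String[] args) {
        int n = 1_000_000;                                   // known number of entries
        int capacity = (int) Math.ceil(n / 0.75) + 1;        // ceil(N / load_factor) + 1
        Set<Integer> set = new HashSet<Integer>(capacity);   // no rehashing while filling
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        System.out.println(set.size());                      // 1000000
    }
}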
If you want to use a sort-based structure, implement an efficient Player.compareTo(). Your choices are a TreeSet or a skip list; they're pretty comparable in terms of characteristics. TreeSet is nice in that it's available out of the box in the JDK, whereas only a concurrent skip list (ConcurrentSkipListSet) is available. Both need to be restructured as you add data, which may take time, and I don't know how to predict which will be better.
I have two classes Foo and Bar.
class Foo
{
    Set<Integer> bars;       // Foo objects have a collection of Bars
    Set<Integer> adjacents;  // Adjacency list of Foos
}
class Bar
{
    int foo;                        // ID of the Foo this object belongs to
    Ipsum ipsum;                    // This is an arbitrary class, but it must be present
    Map<Integer, Float> adjacents;  // Adjacency list of Bars
}
The number of Bars is predefined (up to 1000), hence I may use an array. But the number of Foos is undefined (at most #ofBars/4).
Considering addition, deletion and get(), I need the option that is faster and takes less space (because I'm going to use serialization).
Here are my options (as far as I have thought):
Option 1: Don't define a class for Foo. Instead, use List<Set<Integer>> foo; and another map Map<Integer, Set<Integer>> fooAdjacencies;
Option 2: Use Map<Integer, Set<Integer>> foo; if I want to get the Bars of i, I simply write foo.get(i).
Option 3: Don't define classes. Instead, use Option 2 and, for the Bar class:
Map<Integer, Ipsum> bar;
Map<Integer, Map<Integer, Float>> barAdjacencies;
Which option should I choose in terms of space and time efficiency?
This sounds like it'd be very helpful for you (specifically the Data Structures section): http://bigocheatsheet.com/
You say
I need my structure to be efficient while adding, removing and finding elements. No other behavior.
The problem is that Lists and Maps are usually used in totally different cases. Their names describe their use cases fairly well -- you use a List if you need to list something (probably in some sequential order), while a Map would be used if you need to map an input to an output. You can use a Map as a List by mapping Integers to your elements, but that's overcomplicating things a bit. However, even within List and Map you can have different implementations that differ wildly in asymptotic performance.
With few exceptions, data structures will take O(n) space, which makes sense. If memory serves, anything other than an ArrayList (or other collections backed only by a primitive array) will have a decent amount of space overhead as they use other objects (e.g. Nodes for LinkedLists and Entry objects for Maps) to organize the underlying structure. I wouldn't worry too much about this overhead though unless space really is at a premium.
For best-performance addition, deletion, and search, you want to look at how the data structure is implemented.
A LinkedList-style implementation will net you O(1) addition and deletion (with a good constant factor, too!), but get() is expensive at O(n), because the list has to be traversed every time you want to fetch something. Note, though, that removal in Java's LinkedList is O(n) overall: the actual act of unlinking a node is O(1), but only if you already have a reference to that node. Since you don't, a removal costs O(n) to search for the node plus O(1) to unlink it.
Data structures backed by a plain array have O(1) get() because it's an array, but take O(n) to add and delete, because any addition/deletion other than at the last element requires all subsequent elements to be shifted (in Java's implementation, at least). Searching by object instead of by index is O(n), because you have to iterate over the array to find the object.
The following two structures are usually Maps, and so require extra support from your keys: hash-based maps need equals() and hashCode(), while tree-based maps need the keys to be comparable.
Data structures backed by a tree (e.g. TreeMap) have O(lg n) add/remove, as a good implementation is self-balancing, so worst-case additions/deletions only have to walk the height of the tree. get() operations are also O(lg n). Using a tree requires that your elements be sortable/comparable in some way, which could be a bonus or a hindrance, depending on your usage.
Hash-based data structures have amortized (average) O(1) everything, albeit with a slightly higher constant factor due to the overhead of hashing (and following any chains if the hash spread is poor). HashMaps could start sucking if you write a bad hashCode() function, though, so you want to be careful with that, although the implementers of Java's HashMap did do some magic behind the scenes to try to at least partially negate the effect of bad hashCode() implementations.
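As an illustration, a hash-friendly element type might look like this (the class and its fields are made up for the example, not from the question):
import java.util.Objects;

final class Element {
    private final int id;
    private final String name;

    Element(int id, String name) { this.id = id; this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Element)) return false;
        Element e = (Element) o;
        return id == e.id && Objects.equals(name, e.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);   // cheap to compute and reasonably well spread
    }
}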
Hope that rundown helped. If you clear up how your program is structured, I might be able to give a recommendation. Until then, the best I can do is show you the options and let you pick.
I find this problem description a little hard to follow, but I think you're just looking for general collections/data structures advice.
A list (say, an ArrayList) easily allows you to add and iterate over elements. When it is expanded beyond the size of the underlying array, a one-off costly resize operation is executed to add more space; that is fine because it happens rarely and the amortized cost is low. Searching for a specific element in a list is slow because you need to traverse it in order; there is no implied ordering in most lists. Deleting elements depends on the underlying list implementation: an ArrayList is slow in this regard, because removing anything but the last element means shifting all subsequent elements down to fill the gap. Linked lists are slower to iterate but can easily add and remove elements at any position, given an iterator there. Array lists cannot cheaply add an element anywhere but the end. When using lists you also have to consider where you are adding elements.
Per your requirements, if you need to execute a "get" or find on an element, then you need some kind of search structure to speed that up. That makes a map (or set) the better choice: you can locate elements in O(log n) time with a tree-based implementation, or O(1) expected time with a hash-based one, instead of the linear scan a search over an unordered list requires. Adding and removing elements in those structures is also relatively fast, so that's probably your best option.
Most importantly, implement it more than one way and profile it yourself to learn more :) Lists are rarely a good choice when searching is required though.
I've read quite a few questions here that discuss the cost of using ArrayLists vs LinkedLists in Java. One of the most useful I've seen thus far is here: When to use LinkedList over ArrayList?.
I want to be sure that I'm correctly understanding.
In my current use case, I have multiple situations where I have objects stored in a List structure. The number of objects in the list changes for each run, and random access to objects in the list is never required. Based on this information, I have elected to use LinkedLists with ListIterators to traverse the entire content of the list.
For example, my code may look something like this:
for (Object thisObject : theLinkedList) {
// do something
}
If this is a bad choice, please help me understand why.
My current understanding is that traversing the entire list of objects in a LinkedList would incur O(n) cost using the iterative solution. Since there is no random access to the list (i.e. The need to get item #3, for example), my current understanding is that this would be basically the same as looping over the content of an ArrayList and requesting each element with an index.
Assuming I knew the number of objects to be stored in the list beforehand, my current line of thinking is that it would be better to initialize an ArrayList to the appropriate size and switch to that structure entirely without using a ListIterator. Is this logic sound?
As always, I greatly appreciate everyone's input!
Iteration over a LinkedList and ArrayList should take roughly the same amount of time to complete, since in each case the cost of stepping from one element to the next is a constant. The ArrayList might be a bit better due to locality of reference, though, so it might be worth profiling to see what happens.
If you are guaranteed that there will always be a fixed number of elements, and there won't be insertions and deletions in random locations, then a raw array might be a good choice, since it's extremely fast and well-optimized for this case.
That said, your analysis of why to use LinkedList seems sound. Again, it doesn't hurt to profile the program and see if ArrayList would actually be faster for your use case.
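If you do profile it, a crude timing sketch like the following can give a first impression (a proper benchmark should use a harness such as JMH, with warm-up and repeated runs; the numbers here are only indicative):
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class IterationTiming {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> array = new ArrayList<Integer>(n);
        List<Integer> linked = new LinkedList<Integer>();
        for (int i = 0; i < n; i++) { array.add(i); linked.add(i); }

        System.out.println("ArrayList:  " + time(array) + " ms");
        System.out.println("LinkedList: " + time(linked) + " ms");
    }

    static long time(List<Integer> list) {
        long start = System.nanoTime();
        long sum = 0;
        for (int v : list) sum += v;               // the enhanced for loop uses an Iterator
        if (sum == 42) System.out.println(sum);    // keeps the loop from being optimized away
        return (System.nanoTime() - start) / 1_000_000;
    }
}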
Hope this helps!
I am using a Vector of objects. My issue is that removal from the Vector is an expensive operation (O(n^2)). What would be the replacement for Vector in Java? In my use case, addition and removal happen extensively.
I am a C++ person and don't know much Java.
Well, the Vector class shouldn't be used; there are many containers available in Java. A few of them:
ArrayList is good for random access, but bad for inserting or removing in the middle of the list.
LinkedList is bad for random access, but fairly good for iterating and adding/removing elements in the middle of the container, as sketched below.
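For example, here is a small sketch of constant-time edits in the middle of a LinkedList via a ListIterator (finding the position is still O(n)):
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class MiddleEdits {
    public static void main(String[] args) {
        List<String> list = new LinkedList<String>();
        list.add("a"); list.add("b"); list.add("c");

        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals("b")) {
                it.remove();     // O(1) unlink at the cursor
                it.add("B2");    // O(1) insert at the cursor
            }
        }
        System.out.println(list);   // [a, B2, c]
    }
}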
You can use ArrayList instead of vector in Java.
Check out this article:
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
LinkedList can add/remove items in O(1), given an iterator positioned at the right place.
First of all, Vector removal time complexity is O(n), not O(n^2). If you want a more performant class, you could choose LinkedList, whose add/remove at a known position is constant time.
Maybe a list is not the ideal data structure for your use case: would you be better off using a HashSet, if the ordering of elements is not important?
Actually, the difference between Vector and ArrayList is that Vector is synchronized whereas ArrayList is not. Generally, you don't need synchronization and thus you'd use ArrayList (much like StringBuffer <-> StringBuilder).
The replacement mostly depends on how you intend to use the collection.
Adding objects to an ArrayList is quite fast: when more space is required, the backing array is grown by a constant factor (Vector doubles, ArrayList grows by about 50%), and if you know the size requirements in advance, even better.
Removing from an ArrayList is O(n) but iteration and random access are fast.
If you have frequent add or remove operations and otherwise iterate over the list, a LinkedList would be fine.
You could even consider using a LinkedHashMap, which allows fast access as well as preserving the order of insertion.
I think Vector uses System.arraycopy, whose complexity is O(n^2).
It is correct that Vector uses System.arraycopy to move the elements. However, the System.arraycopy() call copies at most Vector.size() elements, and hence is O(N), where N is the vector's size.
Hence O(N^2) is incorrect for a single insertion/removal.
In fact, if you want better than O(N) insertion and deletion, you will need to use some kind of linked list type with a cursor abstraction that allows insertion and deletion at "the current position". Even then you only get better than O(N) if you can do the insertions / deletions in the right order; i.e. not random order.
FWIW, the Java List APIs don't provide such a cursor mechanism ... not least because it would be awkward to use, and only efficient in certain circumstances / implementations.
Thanks to everyone for their contributions, which helped me to solve this problem. I used a circular queue implemented on top of a Vector.
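For reference, the circular-queue idea looks roughly like this (a sketch backed by a plain array rather than a Vector; the class and method names are illustrative, not the poster's actual code):
public class CircularQueue<E> {
    private final Object[] buffer;
    private int head = 0;   // index of the oldest element
    private int size = 0;

    public CircularQueue(int capacity) {
        buffer = new Object[capacity];
    }

    public boolean add(E e) {                       // O(1)
        if (size == buffer.length) return false;    // full
        buffer[(head + size) % buffer.length] = e;
        size++;
        return true;
    }

    @SuppressWarnings("unchecked")
    public E remove() {                             // O(1), removes the oldest element
        if (size == 0) return null;                 // empty
        E e = (E) buffer[head];
        buffer[head] = null;                        // let the GC reclaim the slot
        head = (head + 1) % buffer.length;
        size--;
        return e;
    }
}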
I have 100,000 objects in a list. I want to remove a few elements from the list based on a condition. Can anyone tell me the best approach to achieve this in terms of memory and performance?
Same question for adding objects based on a condition.
Thanks in advance,
Raju
Your container is not just a List. List is an interface that can be implemented by, for example ArrayList and LinkedList. The performance will depend on which of these underlying classes is actually instantiated for the object you are polymorphically referring to as List.
ArrayList can access elements in the middle of the list quickly, but if you delete one of them you need to shift a whole bunch of elements. LinkedList is the opposite in this respect, requiring iteration for the access, but deletion is just a matter of reassigning pointers.
Your performance depends on the implementation of List, and the best choice of implementation depends on how you will be using the List and which operations are most frequent.
If you're going to be iterating the list and applying tests to each element, then a LinkedList will be most efficient in terms of CPU time, because you don't have to shift any elements in the list. It will, however, consume more memory than an ArrayList, because each list element is wrapped in an entry object.
However, it might not matter. 100,000 is a small number, and if you aren't removing a lot of elements the cost to shift an ArrayList will be low. And if you are removing a lot of elements, it's probably better to restructure as a copy-with-filter (sketched below).
However, the only real way to know is to write the code and benchmark it.
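A minimal sketch of both removal styles (the predicate here, dropping negative numbers, is just a stand-in for the real condition):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FilterDemo {
    public static void main(String[] args) {
        List<Integer> items = new ArrayList<Integer>(Arrays.asList(3, -1, 4, -1, 5));

        // in-place removal via an Iterator: safe while iterating
        for (Iterator<Integer> it = items.iterator(); it.hasNext(); ) {
            if (it.next() < 0) {
                it.remove();
            }
        }
        System.out.println(items);   // [3, 4, 5]

        // copy-with-filter: build a fresh list of the survivors,
        // often cheaper when many elements are being dropped
        List<Integer> source = new ArrayList<Integer>(Arrays.asList(3, -1, 4, -1, 5));
        List<Integer> kept = new ArrayList<Integer>(source.size());
        for (Integer v : source) {
            if (v >= 0) {
                kept.add(v);
            }
        }
        System.out.println(kept);    // [3, 4, 5]
    }
}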
Collections2.filter (from Guava) produces a live filtered view of a collection, based on a predicate:
List<Number> myNumbers = Arrays.asList(Integer.valueOf(1), Double.valueOf(1e6));
Collection<Number> bigNumbers = Collections2.filter(
        myNumbers,
        new Predicate<Number>() {
            public boolean apply(Number n) {
                return n.doubleValue() >= 100d;
            }
        });
Note that some operations, like size(), are not efficient with this scheme. If you tend to follow Josh Bloch's advice and prefer isEmpty() and iterators to unnecessary size() checks, this shouldn't bite you in practice.
LinkedList could be a good choice.
LinkedList does "remove and add elements" more efficiently than ArrayList, and there is no need to call a method such as ArrayList.trimToSize() to release unused memory. But LinkedList is a doubly linked list: each element is wrapped as an Entry, which costs extra memory.