TreeSet vs ArrayList and sort [duplicate]

I have implemented a graph.
I want to sort a given subset of vertices with respect to their degrees.
Therefore, I've written a custom comparator named DegreeComparator.
private class DegreeComparator implements Comparator<Integer>
{
    @Override
    public int compare(Integer arg0, Integer arg1)
    {
        // Higher degree first; ties on degree are broken by descending vertex ID.
        if (adj[arg1].size() == adj[arg0].size()) return arg1 - arg0;
        else return adj[arg1].size() - adj[arg0].size();
    }
}
So, which one of the below is more efficient?
Using TreeSet
public Collection<Integer> sort(Collection<Integer> unsorted)
{
    Set<Integer> sorted = new TreeSet<Integer>(new DegreeComparator());
    sorted.addAll(unsorted);
    return sorted;
}
Using ArrayList
Collections.sort(unsorted, new DegreeComparator());
Note that the second approach is not a method, just a single line of code.
Intuitively, I'd rather choose the second one. But I'm not sure if it is more efficient.

The Java API contains numerous Collection and Map implementations, so it can be confusing to figure out which one to use. Here is a quick flowchart that might help with choosing from the most common implementations (flowchart image not included).

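A TreeSet is a Set, so it drops duplicates, meaning any element for which the comparator returns 0 against an element already in the set. With the comparator above, which breaks ties on degree by vertex ID, that only happens when the same vertex appears twice in the input; a comparator that looked at degree alone would silently drop every vertex whose degree is already present. So the two approaches aren't equivalent.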
Anyway, if what you naturally want is a sorted list, then sort the list. This works whether or not the collection has duplicates, and even though it has the same complexity (O(n*log(n))) as populating a TreeSet, it is probably faster, because it just has to move elements in an array instead of creating lots of tree nodes.
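To make the difference concrete, here is a minimal sketch. It assumes a made-up degree table (DEGREES) and a comparator that compares by degree only, with no tie-break on vertex ID, so it is deliberately not the question's DegreeComparator. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SortVsTreeSetDemo {
    // Hypothetical degree table: vertex i has degree DEGREES[i].
    static final int[] DEGREES = {2, 3, 2, 1};

    // Orders vertices by descending degree only (no tie-break).
    static final Comparator<Integer> BY_DEGREE =
            (a, b) -> Integer.compare(DEGREES[b], DEGREES[a]);

    static List<Integer> sortedList(List<Integer> vertices) {
        List<Integer> copy = new ArrayList<>(vertices);
        copy.sort(BY_DEGREE); // stable sort: ties keep their input order
        return copy;
    }

    static Set<Integer> sortedSet(List<Integer> vertices) {
        Set<Integer> set = new TreeSet<>(BY_DEGREE);
        set.addAll(vertices); // an element comparing as 0 to an existing one is dropped
        return set;
    }
}
```

For the input [0, 1, 2, 3], sortedList keeps all four vertices ([1, 0, 2, 3]), while sortedSet loses vertex 2 because it ties with vertex 0 ([1, 0, 3]).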

If you only sort once, then the ArrayList is an obvious winner. The TreeSet is better if you add or remove items often as sorting a list again and again would be slow.
Note also that all tree structures need more memory and memory access indirection which makes them slower.
In the case of medium-sized lists that change rather frequently by a single element, the fastest solution might be to use an ArrayList and insert each element at its proper position (obviously assuming the list gets sorted initially).
You'd need to determine the insert position via Collections.binarySearch and then insert or remove. Actually, I wouldn't do it unless the performance were really critical and a benchmark showed that it helps. It gets slow when the list gets really big, and the gain is limited because Java uses TimSort, which is optimized for such a case (re-sorting data that is already nearly sorted).
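A minimal sketch of the binary-search insertion described above, assuming the list is already sorted by the same comparator. The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class SortedInsert {
    // Inserts item into an already sorted list, keeping it sorted.
    static <T> void insertSorted(List<T> sorted, T item, Comparator<? super T> cmp) {
        int pos = Collections.binarySearch(sorted, item, cmp);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        if (pos < 0) {
            pos = -pos - 1;
        }
        sorted.add(pos, item); // O(n) shift for an ArrayList, done as one array copy
    }
}
```

The search is O(log n), but the shift inside add(pos, item) is still linear, which is why this only pays off for lists that are not too large.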
As pointed out in a comment, ensuring that the Comparator returns different values for distinct elements is sometimes non-trivial. Fortunately, there's Guava's Ordering#arbitrary, which solves the problem if you don't need to be consistent with equals. In case you do, a similar method can be written (I'm sure I could find it somewhere if requested).

Related

Is there a better DS than a HashMap for storing a list of items if I frequently use the contains method on it?

I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing. I have found myself using a hashmap to store the items instead of an arraylist.
void add(Map<Integer, Integer> mp, int item) {
    if (!mp.containsKey(item)) {
        mp.put(item, 1);
    }
}
As you can see above, I put anything as the value, since I would not be using the values.
I have tested this process to be a lot faster than using an arraylist. (Also, containsKey() for hashmap is O(1) while contains() for arraylist is O(n))
Although it works well for me, it feels awkward for the simple reason that it is not the right data structure. Is this a good practice? Is there a better DS that I can use? Is there not any list that utilizes hashing to store values?

I have a list of numbers. In my program I would frequently be checking if a certain number is part of my list. If it is not part of my list, I add it to the list, otherwise I do nothing.
You are describing a set. From the Javadoc, a java.util.Set is:
A collection that contains no duplicate elements.
Further, the operation you are describing is add():
Adds the specified element to this set if it is not already present.
In code, you would create a new Set (this example uses a HashSet):
Set<Integer> numbers = new HashSet<>();
Then any time you encounter a number you wish to keep track of, just call add(). If x is already present, the set will remain unchanged and will not throw any errors – you don't need to be careful about adding things, just add anything you see, and the set sort of "filters" out the duplicates for you.
numbers.add(x);
It's beyond your original question, but there are various things you can do with the data once you've populated a set - check if other numbers are present/absent, iterate the numbers in the set, etc. The Javadoc shows which functionality is available to use.
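As a small sketch of the behaviour described above: Set.add returns a boolean telling you whether the element was actually new, so the "check, then add" step collapses into a single call. The wrapper class and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper, named here only for illustration.
final class SeenNumbers {
    private final Set<Integer> numbers = new HashSet<>();

    // Returns true if x was newly added, false if it was already present.
    boolean record(int x) {
        return numbers.add(x);
    }

    boolean contains(int x) {
        return numbers.contains(x);
    }

    int size() {
        return numbers.size();
    }
}
```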

An alternative solution from the standard library is a java.util.BitSet. This is, in my opinion, only an option if the values of item are not too big and are relatively close together. If your values are not near each other (and do not start near zero), then it might be worthwhile looking for third-party solutions that offer sparse bit sets or other sparse data structures.
You can use a bit set like:
BitSet bits = new BitSet();
void add(int item) {
    bits.set(item);
}
And as suggested in the comments by Eritrean, you can also use a Set (e.g. HashSet). Internally, a HashSet uses a HashMap, so it will perform similarly to your current solution, but it does away with having to put sentinel values in yourself (you just add or remove the item itself).
As an added benefit, if you use Collection<Integer> as the type of parameters/fields in your code, you can easily switch between an ArrayList and a HashSet and test both without having to change code all over the place.
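For illustration, a sketch of what coding against Collection<Integer> could look like. The helper name is made up; the same method accepts an ArrayList or a HashSet, and only the cost of contains() differs (linear versus expected constant time).

```java
import java.util.Collection;

final class AddIfAbsent {
    // Works with any Collection; only the cost of contains() differs.
    static boolean addIfAbsent(Collection<Integer> items, int item) {
        if (items.contains(item)) {
            return false;
        }
        return items.add(item);
    }
}
```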

Java: What collection type should I use for this case?

What I need:
Fastest put/remove; this is used a lot.
Iteration, also used frequently.
Holds an object, e.g. Player. remove should be O(1), so maybe a HashMap?
No duplicate keys.
Direct get() is never used; mainly iterating to retrieve data.
I don't worry about memory, I just want the fastest speed possible even if it's at the cost of memory.

For iteration, nothing is faster than a plain old array. Entries are stored sequentially in memory, so the JVM can get to the next entry simply by adding the length of one entry to its address.
Arrays are typically a bit of a hassle to deal with compared to maps or lists (e.g. no dictionary-style lookups, fixed length). However, in your case I think it makes sense to go with a one- or two-dimensional array, since the length of the array will not change and dictionary-style lookups are not needed.

So if I understand you correctly you want to have a two-dimensional grid that holds information of which, if any, player is in specific tiles? To me it doesn't sound like you should be removing, or adding things to the grid. I would simply use a two-dimensional array that holds type Player or something similar. Then if no player is in a tile you can set that position to null, or some static value like Player.none() or Tile.empty() or however you'd want to implement it. Either way, a simple two-dimensional array should work fine. :)
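A minimal sketch of that two-dimensional array idea, using null for an empty tile. Player here is a hypothetical stand-in for the asker's class, and the grid size is whatever you pass in.

```java
final class PlayerGrid {
    // Hypothetical stand-in for the asker's Player class.
    static final class Player {
        final String name;

        Player(String name) {
            this.name = name;
        }
    }

    private final Player[][] tiles;

    PlayerGrid(int rows, int cols) {
        tiles = new Player[rows][cols]; // every tile starts out null (empty)
    }

    void place(int row, int col, Player p) { tiles[row][col] = p; }

    void clear(int row, int col) { tiles[row][col] = null; }

    Player at(int row, int col) { return tiles[row][col]; }

    // Plain nested loops over the array; no iterator objects involved.
    int countPlayers() {
        int count = 0;
        for (Player[] row : tiles) {
            for (Player p : row) {
                if (p != null) count++;
            }
        }
        return count;
    }
}
```

Placing and clearing are single array writes, so both are O(1); iteration cost is proportional to the grid size rather than the number of players.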

The best Collection for your case is a LinkedList. Linked lists allow for fast iteration, and fast removal and addition at any place in the list. For example, if you use an ArrayList and you want to insert something at index i, then you have to move all the elements from i to the end one entry to the right. The same happens if you want to remove. In a linked list you can add and remove in constant time, provided you already have an iterator positioned at that element.
Since you need two dimensions, you can use linked lists inside of linked lists:
List<List<Tile>> players = new LinkedList<List<Tile>>();
for (int i = 0; i < 20; ++i) {
    // LinkedList has no capacity constructor, so the size is not passed in.
    List<Tile> tiles = new LinkedList<Tile>();
    for (int j = 0; j < 20; ++j) {
        tiles.add(new Tile());
    }
    players.add(tiles);
}

Use a map of sets: it gives expected O(1) vertex lookup and amortized O(1) edge insertion and deletion.
HashMap<VertexT, HashSet<EdgeT>> incidenceMap;
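One way such an incidence map might be wrapped, as a sketch only. The class and method names are assumptions; the type parameters V and E stand in for VertexT and EdgeT.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IncidenceGraph<V, E> {
    private final Map<V, Set<E>> incidenceMap = new HashMap<>();

    // Registers edge as incident to vertex, creating the vertex's set on first use.
    void addEdge(V vertex, E edge) {
        incidenceMap.computeIfAbsent(vertex, k -> new HashSet<>()).add(edge);
    }

    // Returns true if the edge was present and has been removed.
    boolean removeEdge(V vertex, E edge) {
        Set<E> edges = incidenceMap.get(vertex);
        return edges != null && edges.remove(edge);
    }

    int degree(V vertex) {
        Set<E> edges = incidenceMap.get(vertex);
        return edges == null ? 0 : edges.size();
    }
}
```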

There is no simple one-size-fits-all solution to this.
For example, if you only want to append, iterate and use Iterator.remove(), there are two obvious options: ArrayList and LinkedList
ArrayList uses less memory, but Iterator.remove() is O(N)
LinkedList uses more memory, but Iterator.remove() is O(1)
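For reference, a sketch of the Iterator.remove() pattern being compared here. The code is identical for both list types; only the cost per removal differs (a shift of the tail for ArrayList, a pointer update for LinkedList). The class name is made up.

```java
import java.util.Iterator;
import java.util.List;

final class IteratorRemoval {
    // Removes even numbers in place while iterating.
    static void removeEvens(List<Integer> list) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove(); // removes the element last returned by next()
            }
        }
    }
}
```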
If you also want fast lookup (e.g. Collection.contains tests) or removal using Collection.remove, then HashSet is going to be better ... if the collections are likely to be large. A HashSet won't allow you to put an object into the collection multiple times, but that could be an advantage. It also uses more memory than either ArrayList or LinkedList.
If you were more specific on the properties required, and what you are optimizing for (speed, memory use, both?) then we could give you better advice.

The requirement of not allowing duplicates is effectively adding a requirement for efficient get().
Your options are either hash-based or O(log N). Most likely, hashing will be faster, unless for whatever reason calling hashCode() + equals() once is much slower than calling compareTo() log(N) times. This could be the case, for instance, if you're dealing with very long strings. log(N) is not very much, by the way: log2(1,000,000,000) ~= 30.
If you want to use a hash-based data structure, then HashSet is your friend. Make sure that Player has a good, fast implementation of hashCode(). If you know the number of entries ahead of time, specify the initial capacity of the HashSet as ceil(N/load_factor)+1 (the default load factor is 0.75).
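The sizing formula above can be sketched as follows. The helper names are hypothetical; HashSet's two-argument constructor (initial capacity, load factor) is part of the JDK.

```java
import java.util.HashSet;
import java.util.Set;

final class HashSetSizing {
    // ceil(N / loadFactor) + 1, as suggested in the answer.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / loadFactor) + 1;
    }

    // Creates a set that should not need to resize while holding expectedEntries elements.
    static <T> Set<T> presized(int expectedEntries) {
        float loadFactor = 0.75f; // the JDK default
        return new HashSet<>(capacityFor(expectedEntries, loadFactor), loadFactor);
    }
}
```

For 1000 expected entries and the default load factor this requests a capacity of 1335.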
If you want to use a sort-based structure, implement an efficient Player.compareTo(). Your choices are TreeSet or a skip list. They're pretty comparable in terms of characteristics. TreeSet is nice in that it's available out of the box in the JDK, whereas the only skip list in the JDK is the concurrent one (ConcurrentSkipListSet). Both have to do some restructuring as you add data, which may take time, and I don't know how to predict which will be better.
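Since TreeSet and ConcurrentSkipListSet both implement NavigableSet, code written against that interface lets you try either one. A small sketch, using Strings in place of Player and a made-up helper name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableSet;

final class SortedSetChoice {
    // Adds the names to whichever sorted set is passed in and returns them in sorted order.
    static List<String> addAndList(NavigableSet<String> set, String... names) {
        Collections.addAll(set, names); // duplicates are ignored by the set
        return new ArrayList<>(set);
    }
}
```

Calling it with new TreeSet<>() or new ConcurrentSkipListSet<>() gives the same result, so the choice can be left to a benchmark.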

List vs. Map: Which takes less space and more efficient?

I have two classes Foo and Bar.
class Foo
{
    Set<Integer> bars; // Foo objects have a collection of Bars.
    Set<Integer> adjacents; // Adjacency list of Foos.
}
class Bar
{
    int foo; // ID of the Foo this object belongs to
    Ipsum ipsum; // This is an arbitrary class, but it must be present
    Map<Integer, Float> adjacents; // Adjacency list of Bars
}
The number of Bars is predefined (up to 1000), so I may use an array.
But the number of Foos is not known in advance (at most #ofBars/4).
Considering addition, deletion and get(), I need whichever option is faster and takes less space (because I'm going to use serialization).
Here are my options (as far as I have thought)
Option 1: Don't define a class for Foo. Instead, use List<Set<Integer>> foo; and another map, Map<Integer, Set<Integer>> fooAdjacencies;
Option 2: Use Map<Integer, Set<Integer>> foo; if I want to get the bars of i, I simply write foo.get(i).
Option 3: Don't define classes. Instead, use option 2 and, for the Bar class:
Map<Integer, Ipsum> bar;
Map<Integer, Map<Integer, Float>> barAdjacencies;
Which option should I choose in terms of space and time efficiency?

This sounds like it'd be very helpful for you (specifically the Data Structures section): http://bigocheatsheet.com/
You say
I need my structure to be efficient while adding, removing and finding elements. No other behavior.
The problem is that Lists and Maps are usually used in totally different cases. Their names describe their use cases fairly well -- you use a List if you need to list something (probably in some sequential order), while a Map would be used if you need to map an input to an output. You can use a Map as a List by mapping Integers to your elements, but that's overcomplicating things a bit. However, even within List and Map you can have different implementations that differ wildly in asymptotic performance.
With few exceptions, data structures will take O(n) space, which makes sense. If memory serves, anything other than an ArrayList (or other collections backed only by a primitive array) will have a decent amount of space overhead as they use other objects (e.g. Nodes for LinkedLists and Entry objects for Maps) to organize the underlying structure. I wouldn't worry too much about this overhead though unless space really is at a premium.
For best-performance addition, deletion, and search, you want to look at how the data structure is implemented.
LinkedList-style implementation will net you O(1) addition and deletion (and with a good constant factor, too!), but will have a pretty expensive get() with O(n) time, because the list will have to be traversed every time you want to get something. Java's LinkedList implementation, though, removes in O(n) time; while the actual act of deletion is O(1), that's only if you have a reference to the actual node that you're removing. Because you don't, removals in Java's LinkedList are O(n) -- O(n) for searching for the node to remove, and O(1) for removal.
Data structures backed with a plain array will have O(1) get() because it's an array, but takes O(n) to add, and delete, because any addition/deletion other than at the last element requires all other elements to be shuffled (in Java's implementation at least). Searching for something using an object instead of an index is done in O(n) time because you have to iterate over the array to find the object.
The following two structures are usually Maps, and so usually require you to implement equals() (and hashCode() for HashMaps):
Data structures backed by a tree (e.g. TreeMap) have O(lg n) add/remove; TreeMap is a self-balancing red-black tree, so even worst-case additions/deletions only have to go through the height of the tree. get() operations are O(lg n) as well. Using a tree requires that your elements be sortable/comparable in some way, which could be a bonus or a hindrance, depending on your usage.
Hash-based data structures have amortized (average) O(1) everything, albeit with a slightly higher constant factor due to the overhead of hashing (and following any chains if the hash spread is poor). HashMaps could start sucking if you write a bad hashCode() function, though, so you want to be careful with that, although the implementers of Java's HashMap did do some magic behind the scenes to try to at least partially negate the effect of bad hashCode() implementations.
Hope that rundown helped. If you clear up how your program is structured, I might be able to give a recommendation. Until then, the best I can do is show you the options and let you pick.

I find this problem description a little hard to follow, but I think you're just looking for general collections/data structures advice.
A list (say, an array list) easily allows you to add and iterate over elements. When it grows beyond the size of the underlying array, a one-off costly resize operation is executed to add more space; that is fine because it happens rarely and the amortized time is not bad. Searching for a specific element in a list is slow because you need to traverse it in order; there is no implied ordering in most lists. The cost of deleting elements depends on the underlying list implementation. An array list can be slow in this regard: Java's ArrayList shifts all following elements down by one on every removal rather than marking the slot as deleted. When using lists you also have to consider where you are adding elements. Linked lists are slower to iterate but can cheaply add and remove elements at a position you have already reached. Array lists cannot cheaply add an element anywhere but the end.
Per your requirements, if you need to execute a "get" or find on an element, then you need some kind of search structure to speed it up. This makes a map the better fit, as you can locate elements in log(n) time (with a tree map; a hash map is typically constant time) instead of the linear time needed to search an unordered list. Adding and removing elements in a map is also relatively fast, so that's probably your best option.
Most importantly, implement it more than one way and profile it yourself to learn more :) Lists are rarely a good choice when searching is required though.

Best way to remove and add elements from the java List

I have 100,000 objects in a list. I want to remove a few elements from the list based on a condition. Can anyone tell me the best approach in terms of memory and performance?
Same question for adding objects, also based on a condition.
Thanks in Advance
Raju

Your container is not just a List. List is an interface that can be implemented by, for example ArrayList and LinkedList. The performance will depend on which of these underlying classes is actually instantiated for the object you are polymorphically referring to as List.
ArrayList can access elements in the middle of the list quickly, but if you delete one of them you need to shift a whole bunch of elements. LinkedList is the opposite in this respect, requiring iteration for access, but deletion is just a matter of reassigning pointers.
Your performance depends on the implementation of List, and the best choice of implementation depends on how you will be using the List and which operations are most frequent.

If you're going to be iterating a list and applying tests to each element, then a LinkedList will be most efficient in terms of CPU time, because you don't have to shift any elements in the list. It will, however consume more memory than an ArrayList, because each list element is actually held in an entry.
However, it might not matter. 100,000 is a small number, and if you aren't removing a lot of elements the cost to shift an ArrayList will be low. And if you are removing a lot of elements, it's probably better to restructure as a copy-with filter.
However, the only real way to know is to write the code and benchmark it.
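The "copy-with filter" restructuring mentioned above could look like this sketch (Java 8 or later for Predicate; the class and method names are made up). It builds a new list in a single pass, so no element is ever shifted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FilterCopy {
    // Builds a new list containing only the elements to keep; the source is left untouched.
    static <T> List<T> keepMatching(List<T> source, Predicate<? super T> keep) {
        List<T> result = new ArrayList<>(source.size());
        for (T item : source) {
            if (keep.test(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

Since Java 8 there is also Collection.removeIf(predicate), which removes matching elements in place if you don't need to keep the original list.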

Collections2.filter (from Guava) produces a filtered collection based on a predicate.
List<Number> myNumbers = Arrays.asList(Integer.valueOf(1), Double.valueOf(1e6));
Collection<Number> bigNumbers = Collections2.filter(
    myNumbers,
    new Predicate<Number>() {
        @Override
        public boolean apply(Number n) {
            return n.doubleValue() >= 100d;
        }
    });
Note that some operations, like size(), are not efficient with this scheme. If you tend to follow Josh Bloch's advice and prefer isEmpty() and iterators to unnecessary size() checks, then this shouldn't bite you in practice.

LinkedList could be a good choice.
LinkedList adds and removes elements more efficiently than ArrayList, and there is no need to call a method such as ArrayList.trimToSize() to release unused memory. But LinkedList is a doubly linked list: each element is wrapped in an Entry, which needs extra memory.
