List with fast contains


I wonder if there's a List implementation allowing fast contains. I'm working with quite a long List and I can't switch to Set since I need the access by index. I can ignore the performance, which may be acceptable now and may or may not be acceptable in the future. I can create a HashSet and do all modifying operations on both, but doing it manually is quite boring and error prone.
I know that it's impossible to have a class working like both List and Set (because of the different equals semantics), but I wonder if there's a List implementing RandomAccess and employing a HashSet for speeding up contains.

I know that it's impossible to have a class working like both List and Set
Have you tried LinkedHashSet? Technically it's a set but it preserves order which might be just enough for you. However access by index is linear and not built-in.
Another approach would be to wrap the List in a custom decorator that both delegates to the List and maintains an internal Set for faster contains.

You can wrap a List and a HashSet in one class that combines the best of both worlds:
import java.util.*;

public class FastContainsList<T> extends AbstractSequentialList<T> implements RandomAccess {
    // Extends AbstractSequentialList because that class builds every other
    // operation on top of listIterator(int) and size().
    // Caveat: the set mirrors the list only while the list holds no duplicates.
    private final List<T> list = new ArrayList<T>();
    private final Set<T> set = new HashSet<T>();

    @Override
    public int size() {
        return list.size();
    }

    @Override
    public boolean contains(Object o) { // what it's about
        return set.contains(o);
    }

    @Override
    public ListIterator<T> listIterator(int i) {
        return new ConIterator(list.listIterator(i));
    }

    /* Also backs iterator() */
    private class ConIterator implements ListIterator<T> {
        private T obj; // last element returned by next() or previous()
        private final ListIterator<T> it;

        private ConIterator(ListIterator<T> it) {
            this.it = it;
        }

        // Pure queries are forwarded to the wrapped iterator.
        public boolean hasNext() { return it.hasNext(); }
        public boolean hasPrevious() { return it.hasPrevious(); }
        public int nextIndex() { return it.nextIndex(); }
        public int previousIndex() { return it.previousIndex(); }

        public T next() { return obj = it.next(); }
        public T previous() { return obj = it.previous(); }

        public void remove() {
            it.remove(); // remove from both
            set.remove(obj);
        }

        public void set(T t) {
            it.set(t);
            set.remove(obj);
            set.add(obj = t);
        }

        public void add(T t) {
            it.add(t);
            set.add(t);
        }
    }
}
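If the list may hold duplicates, a plain HashSet loses track of membership as soon as one copy is removed. A minimal sketch of one way around that, assuming a count per element is acceptable overhead: keep a HashMap of occurrence counts next to the ArrayList. The class name CountingList is hypothetical and not from any library.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: an ArrayList plus an occurrence count per element.
class CountingList<T> extends AbstractList<T> {
    private final List<T> list = new ArrayList<>();
    private final Map<T, Integer> counts = new HashMap<>();

    @Override
    public T get(int index) {
        return list.get(index);
    }

    @Override
    public int size() {
        return list.size();
    }

    @Override
    public boolean contains(Object o) {
        return counts.containsKey(o); // hash lookup instead of a linear scan
    }

    @Override
    public void add(int index, T element) {
        list.add(index, element);
        counts.merge(element, 1, Integer::sum);
        modCount++; // keeps inherited iterators fail-fast
    }

    @Override
    public T set(int index, T element) {
        T old = list.set(index, element);
        decrement(old);
        counts.merge(element, 1, Integer::sum);
        return old;
    }

    @Override
    public T remove(int index) {
        T old = list.remove(index);
        decrement(old);
        modCount++;
        return old;
    }

    // Drops one occurrence and forgets the key once no copies remain.
    private void decrement(T element) {
        counts.computeIfPresent(element, (k, c) -> c == 1 ? null : c - 1);
    }
}
```

AbstractList routes add(E), addAll, clear, remove(Object) and its iterators through the four overridden methods, so the counts stay in sync without further code.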

What about a BiMap<Integer, MyClass>? This can be found in Guava
BiMap<Integer, MyClass> map = HashBiMap.create();
//store by index
map.put(1, myObj1);
//get by index
MyClass retrievedObj = map.get(1);
//check if in map
if ( map.containsValue(retrievedObj) ) {
//...
}
I know it doesn't implement the List interface. The major limitation here is that insertion and removal are not provided in the traditional List sense; but you didn't specifically say whether those were important to you.

Apache Commons TreeList should do the trick:
http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/list/TreeList.html

You can maintain both a List and a Set. This will give you fast indexed lookup and fast contains (with a small overhead).
BTW: If your list is small, e.g. 10 entries, it may not make any difference to use a plain List.

I think a class that wraps a HashMap and a List (as you said in your post) is probably the best bet for fast contains and access.

Related

How to build Priority Queue with customized comparator in linear time

In the constructor of PriorityQueue, we can pass in a collection like List or Set, which builds the PriorityQueue in linear time.
However, this also means the PriorityQueue will use a default Comparator.
I want to use my own comparator, so I can have something else other than a min heap.
The only way I can think of is to wrap the collection in a SortedSet and put a customized comparator in it.
Is there any other good way to do this?

Assume you have class A (or a pojo)
with an int priority; field which holds your priority for this object and its getter getPriority()
then you have it something like this:
Queue<A> queue = new PriorityQueue<>(
4 //initialCapacity
, new Comparator<A>() {
public int compare(A p1, A p2) {
return Integer.valueOf(p1.getPriority()).compareTo(p2.getPriority());
}
});

Create a proxy class that contains your data object and implements the Comparable interface. Create a list of such objects and pass it to the PriorityQueue constructor.
I don't know of effective SortedSet implementations with a guaranteed O(n) creation time for comparable objects. It is possible to sort an array in O(n) for radix-friendly keys, though (in practice, linear sorts tend to be not so fast in the general case), so you could write a customized SortedSet with fast creation that is compatible with your special comparators.
The heap constructor for comparable objects can do it in O(n) only because it does not fully sort the list.
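To illustrate the last point, here is a minimal sketch of bottom-up heap construction with a caller-supplied Comparator. It is hand-rolled and independent of java.util.PriorityQueue, which has no constructor taking both a Collection and a Comparator. The class name Heaps and its methods are hypothetical, and the list is assumed to be random-access (for example an ArrayList).

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: a binary heap stored in a random-access List.
final class Heaps {
    private Heaps() {}

    // Rearranges the list in place so that the least element under cmp sits at index 0.
    // Sifting down from the last parent to the root takes O(n) comparisons overall.
    static <T> void heapify(List<T> heap, Comparator<? super T> cmp) {
        for (int i = heap.size() / 2 - 1; i >= 0; i--) {
            siftDown(heap, i, cmp);
        }
    }

    // Removes and returns the root of a non-empty heap in O(log n).
    static <T> T poll(List<T> heap, Comparator<? super T> cmp) {
        T root = heap.get(0);
        T last = heap.remove(heap.size() - 1);
        if (!heap.isEmpty()) {
            heap.set(0, last);
            siftDown(heap, 0, cmp);
        }
        return root;
    }

    // Moves the element at index i down until both children compare >= it.
    private static <T> void siftDown(List<T> heap, int i, Comparator<? super T> cmp) {
        int n = heap.size();
        T x = heap.get(i);
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && cmp.compare(heap.get(child + 1), heap.get(child)) < 0) {
                child++; // pick the smaller of the two children
            }
            if (cmp.compare(x, heap.get(child)) <= 0) {
                break;
            }
            heap.set(i, heap.get(child));
            i = child;
        }
        heap.set(i, x);
    }
}
```

Passing Comparator.reverseOrder() to both methods turns the same code into a max-heap.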

In the constructor of PriorityQueue, we can pass in a collection like List or Set, which builds the PriorityQueue in linear time.
Wrong.
However, this also means the PriorityQueue will use a default Comparator.
Wrong.
The javadoc says
If the specified collection is an instance of a SortedSet or is another PriorityQueue, this priority queue will be ordered according to the same ordering.
So when starting from a recognized sorted collection, you get its Comparator. Moreover, you get linear time.
Otherwise, you don't. The source shows it rather clearly (look for heapify()).
If you have an unsorted list, there's no way to obtain a priority queue in linear time (unless the priority queue is ensuring the heap property lazily; but that's cheating).

I have the same problem.
The only approach I can think of is to create a wrapper class that contains an object T and implements the Comparable interface, like this:
import java.util.*;
import java.util.stream.Collectors;
class ModifiedPriorityQueue<T> extends PriorityQueue<Wrapper<T>> {
public ModifiedPriorityQueue(Collection<T> collection, Comparator<T> comparator) {
super(collection.stream().map(x -> new Wrapper<>(x, comparator)).collect(Collectors.toList()));
}
}
class Wrapper<T> implements Comparable<Wrapper<T>> {
private final T object;
private Comparator<T> comparator;
public Wrapper(T object, Comparator<T> comparator) {
this.object = object;
this.comparator = comparator;
}
@Override
public int compareTo(Wrapper<T> o) {
return comparator.compare(object, o.object);
}
@Override
public String toString() {
return object.toString();
}
}
class Main {
public static void main(String[] args) {
Collection<Integer> elements = Arrays.asList(1, 2, 3, 4);
ModifiedPriorityQueue<Integer> p = new ModifiedPriorityQueue<>(elements, Comparator.reverseOrder());
while (!p.isEmpty()) {
System.out.println(p.poll());
}
}
}

Advantages of HashSet over ArrayList and vice versa

I have a doubt regarding data structures in Java. While solving a typical hashing problem in Java, I was using the HashSet data structure, which worked fine until there were duplicate objects (object contents). Since HashSet does not support insert of duplicates, my logic was failing.
I replaced the hashset with the typical Arraylist since the methods of hashset such as .add(), .contains(), .remove() are supported in both, and my logic worked perfectly then.
But does this necessarily mean ArrayList is the logical choice over Hashset when duplicates are involved? There should be some time complexity advantages of Hashset over ArrayList right? Can someone please provide me some insight regarding this?
EDIT: What would be the ideal data structure for hashing when duplicates are involved? I mean when the duplicates should not be ignored and should be inserted.

It's not clear what you mean by a "hashing problem," but maybe you're looking for a multiset. From the Guava docs:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Elements of a multiset that are equal to one another are referred to as occurrences of the same single element. The total number of occurrences of an element in a multiset is called the count of that element (the terms "frequency" and "multiplicity" are equivalent, but not used in this API).
No such thing exists in the JDK.

When you use a HashMap it replaces the original value with the new duplicate.
When you use a HashSet, subsequent duplicates are ignored (not inserted).
When you use an ArrayList, it simply adds the duplicate to the end of the list.
It all depends on what you need given your requirements.

ArrayList is not the logical choice if you don't want duplicates. Different tools for different use cases.
You would use a Set in areas where duplicates wouldn't make sense, for example, a set of students. A List allows duplicates.

If you specifically need a HashSet that handles duplicates, a HashMap will be able to do the job. If you just need a count of the number of objects added (with quick lookup/etc), a HashMap<T,Integer> will be ideal, where T is the type of your object. If you actually need to keep references to the duplicate objects you've added, go with HashMap<T, List<T>>. That way you can look up by using HashMap's .containsKey(T t), and iterate through all of the similarly hashing objects in the resulting list. So for example, you could create this class:
public class HashSetWithDuplicates<T> {
private HashMap<T, List<T>> entries;
private int size;
public HashSetWithDuplicates(){
entries = new HashMap<>();
size = 0;
}
public HashSetWithDuplicates(Collection<? extends T> col){
this();
for(T t : col){
add(t);
}
}
public boolean contains(T t){
return entries.containsKey(t);
}
public List<T> get(T t){
return entries.get(t);
}
public void add(T t){
if (!contains(t)) entries.put(t, new ArrayList<>());
entries.get(t).add(t);
size++;
}
public void remove(T t){
if (!contains(t)) return;
entries.get(t).remove(t);
if(entries.get(t).isEmpty()) entries.remove(t);
size--;
}
public int size(){
return size;
}
public boolean isEmpty(){
return size() == 0;
}
}
Add functionality to suit your needs.
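For the simpler case mentioned above, where only the number of occurrences matters, a HashMap<T, Integer> is enough. A minimal sketch, with the class name CountingBag and its methods chosen here purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: a bag (multiset) that stores one count per distinct element.
class CountingBag<T> {
    private final Map<T, Integer> counts = new HashMap<>();
    private int size;

    // Adds one occurrence of t.
    void add(T t) {
        counts.merge(t, 1, Integer::sum);
        size++;
    }

    // Removes one occurrence of t; returns false if t was not present.
    boolean remove(T t) {
        Integer c = counts.get(t);
        if (c == null) {
            return false;
        }
        if (c == 1) {
            counts.remove(t);
        } else {
            counts.put(t, c - 1);
        }
        size--;
        return true;
    }

    // Number of occurrences of t, or 0 if absent.
    int count(T t) {
        return counts.getOrDefault(t, 0);
    }

    boolean contains(T t) {
        return counts.containsKey(t);
    }

    // Total number of occurrences across all elements.
    int size() {
        return size;
    }
}
```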

How can I randomize the iteration sequence of a Set?

I need to use the Set collection.
Each time I start a jvm to run the program, I want to iterate through the items in the Set in a randomly decided sequence.
The iteration sequence has nothing to do with the sequence in which I placed them in the Set, right?
So, what to do? How can I randomize the iteration sequence in a Set?
Here is my method, and it does not randomize.
public static <T> void shuffle(Set<T> set) {
List<T> shuffleMe = new ArrayList<T>(set);
Collections.shuffle(shuffleMe);
set.clear();
set.addAll(shuffleMe);
}

What you need is a RandomizingIterator
Set is unordered, so randomizing an unordered Collection doesn't make any logical sense.
An ordered Set is ordered using a Comparator, which means it has a fixed order. You can't shuffle it; that has no meaning, as the order is determined by the Comparator or the compare() method.
Copying the Set into a List lets you shuffle the contents of the List; a custom RandomizingIterator can then iterate across the Set in that shuffled order.
Example Implementation :
Link to Gist on GitHub - TestRandomizingIterator.java
import org.junit.Test;
import javax.annotation.Nonnull;
import java.util.*;
public class TestRandomizingIterator
{
@Test
public void testRandomIteration()
{
final Set<String> set = new HashSet<String>()
{
/** Every call to iterator() will give a possibly unique iteration order, or not */
@Nonnull
@Override
public Iterator<String> iterator()
{
return new RandomizingIterator<String>(super.iterator());
}
class RandomizingIterator<T> implements Iterator<T>
{
final Iterator<T> iterator;
private RandomizingIterator(@Nonnull final Iterator<T> iterator)
{
List<T> list = new ArrayList<T>();
while(iterator.hasNext())
{
list.add(iterator.next());
}
Collections.shuffle(list);
this.iterator = list.iterator();
}
@Override
public boolean hasNext()
{
return this.iterator.hasNext();
}
@Override
public T next()
{
return this.iterator.next();
}
/**
* Modifying this makes no logical sense, so for simplicity sake, this implementation is Immutable.
* It could be done, but with added complexity.
*/
@Override
public void remove()
{
throw new UnsupportedOperationException("TestRandomizingIterator.RandomizingIterator.remove");
}
}
};
set.addAll(Arrays.asList("A", "B", "C"));
final Iterator<String> iterator = set.iterator();
while (iterator.hasNext())
{
System.out.println(iterator.next());
}
}
}
Notes:
This is a straw man example, but the intention is clear: use a custom Iterator to get custom iteration.
You can't get the normal iteration behavior back, but that doesn't seem to be a problem with your use case.
Passing super.iterator() to the facade is important; otherwise it throws a StackOverflowError, because passing this to .addAll() or the List constructor makes the call recursive.
HashSet may appear to be ordered, but it isn't guaranteed to stay ordered. The order depends on the hashCode of the objects, and adding a single object may change how the contents are ordered. The contract of the Set interface is that the order is undefined, and HashSet in particular is nothing more than a facade over a backing Map.keySet().
There are other, supposedly more lightweight but much more complex solutions that use the original Iterator and try to keep track of what has already been seen. Those solutions aren't improvements over this technique unless the data is excessively large, and then you are probably looking at on-disk structures anyway.

You could copy the contents of the Set into a List, shuffle the List, then return a new LinkedHashSet populated from the shuffled list. Nice thing about LinkedHashSet is that its iterators return elements in the order they were inserted.
public static <T> Set<T> newShuffledSet(Collection<T> collection) {
List<T> shuffleMe = new ArrayList<T>(collection);
Collections.shuffle(shuffleMe);
return new LinkedHashSet<T>(shuffleMe);
}

According to the docs for java.util.Set:
The elements are returned in no particular order (unless this set is an instance of some class that provides a guarantee).
When you insert the elements there is no guarantee about the order they will be returned to you. If you want that behavior you will need to use a data structure which supports stable iteration order, e.g. List.

Internally, HashSet arranges its elements according to their hashCode() value, AFAIR. So you should use other classes like SortedSet with a custom comparator. But remember that the whole idea of a Set is to find elements quickly, which is why it organizes elements internally. So you have to keep the comparison stable. Maybe you don't need a set after shuffling?

T[] toArray(T[] a) implementation

I am creating a SortedList class that implements List.
If I understand correctly, the method toArray(T[] a) takes an array of objects as a parameter and returns a sorted array of these objects.
In the java documentation we can read that if the Collection length is greater than the sortedList, a new array is created with the good size, and if the collection length is smaller than the sortedList, the object following the last object of the collection is set to null.
The project I am working on does not let me use null values in the sorted list, so I am implementing the method differently, using a new sortedList and the toArray() method:
public <T> T[] toArray(T[] a)
{
SortedList sort = new SortedList();
for(Object o : a)
{
sort.add(o);
}
return (T[])sort.toArray();
}
Would this be a good way to implement this method or should I expect errors using it like that?
Thank you for your time.

First a recommendation:
If you want SortedList to implement the List interface, it's a good idea to extend AbstractList instead of implementing List directly. AbstractList has already defined many of the necessary methods, including the one you're having problems with. Most List-implementations in the Java platform libraries also extend AbstractList.
If you still want to implement List directly, here is what the method is supposed to do:
Let a be the specified array.
If a is large enough, fill it with the elements from your SortedList (in the correct order) without caring about what was previously in a.
If there's room to spare in a after filling it, set a[size()] = null. Then the user will know where the list ends, unless the list contains null-elements.
If the list doesn't fit in a, create a new array of type T with the same size as the list, and fill the new one instead.
Return the array you filled. If you filled a, return a. If you made a new array, return the new array.
There are two reasons why this method is useful:
The array will not necessarily be of type Object, but of a type T decided by the user (as long as the type is valid).
The user may want to save memory and re-use an array instead of allocating more memory to make a new one.
Here is how the Java Docs describe the method.
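The steps listed above can be sketched as follows. This is a minimal illustration, not the JDK's own implementation; the class ToArraySketch and its internal ArrayList are stand-ins for whatever storage your SortedList really uses.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: only the parts needed to demonstrate toArray(T[] a).
class ToArraySketch<E> {
    // Stand-in for the list's internal storage, assumed to be kept in order already.
    private final List<E> elements = new ArrayList<>();

    void add(E e) {
        elements.add(e);
    }

    int size() {
        return elements.size();
    }

    @SuppressWarnings("unchecked")
    <T> T[] toArray(T[] a) {
        int size = size();
        if (a.length < size) {
            // Too small: allocate a new array with the same runtime component type as a.
            a = (T[]) Array.newInstance(a.getClass().getComponentType(), size);
        }
        for (int i = 0; i < size; i++) {
            // Overwrites whatever a held before; a wrong element type fails
            // with ArrayStoreException at this assignment.
            a[i] = (T) elements.get(i);
        }
        if (a.length > size) {
            a[size] = null; // marks the end of the list when there is room to spare
        }
        return a;
    }
}
```

A caller would typically write list.toArray(new String[0]) to get a correctly typed array of exactly the right size.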

If you are implementing a "SortedList" class, it's probably in your best interest to maintain a sorted list internally, rather than relying on the toArray() method to sort them on the way out. In other words, users of the class may not use the toArray() method, but may instead use listIterator() to return an Iterator that is supposed to iterate over the elements of the list in the proper order.

Are you sure you need to implement List? It is often sufficient just to implement Iterable and Iterator.
public class SortedList<S extends Comparable<S>> implements Iterable<S>, Iterator<S> {
private final Iterator<S> i;
// Iterator version.
public SortedList(Iterator<S> iter, Comparator<S> compare) {
// Roll the whole lot into a TreeSet to sort it.
Set<S> sorted = new TreeSet<S>(compare);
while (iter.hasNext()) {
sorted.add(iter.next());
}
// Use the TreeSet iterator.
i = sorted.iterator();
}
// Provide a default simple comparator.
public SortedList(Iterator<S> iter) {
this(iter, new Comparator<S>() {
public int compare(S p1, S p2) {
return p1.compareTo(p2);
}
});
}
// Also available from an Iterable.
public SortedList(Iterable<S> iter, Comparator<S> compare) {
this(iter.iterator(), compare);
}
// Also available from an Iterable.
public SortedList(Iterable<S> iter) {
this(iter.iterator());
}
// Give them the iterator directly.
public Iterator<S> iterator() {
return i;
}
// Proxy.
public boolean hasNext() {
return i.hasNext();
}
// Proxy.
public S next() {
return i.next();
}
// Proxy.
public void remove() {
i.remove();
}
}
You can then do stuff like:
for ( String s : new SortedList<String>(list) )
which is usually all that you want because TreeSet provides your sortedness for you.

Java: "cons" an item to a list

I have an Item which has a method List<Item> getChildren() (which returns an immutable list) and for each of the items I have, I need to create a list of the item followed by its children.
What's the quickest way to "cons" (in the Lisp/Scheme sense) my item to create a new immutable list? I can certainly do the following, but it seems wrong/wasteful:
public List<Item> getItemAndItsChildren(Item item)
{
if (item.getChildren().isEmpty())
return Collections.singletonList(item);
else
{
// would rather just "return cons(item, item.getChildren())"
// than do the following -- which, although straightforward,
// seems wrong/wasteful.
List<Item> items = new ArrayList<Item>();
items.add(item);
items.addAll(item.getChildren());
return Collections.unmodifiableList(items);
}
}

I'd change my requirements. In most cases, you don't need a List in your interface, an Iterable will do nicely. Here's the method:
public Iterable<Item> getItemWithChildren(Item item) {
return Iterables.unmodifiableIterable(
Iterables.concat(
Collections.singleton(item),
item.getChildren()
)
);
}
and here's the shortened version (with static imports):
return unmodifiableIterable(concat(singleton(item), item.getChildren()));

The ability to create a new immutable list by concatenating a head element to a tail that may be shared between other lists requires a singly-linked list implementation. Java doesn't provide anything like this out of the box, so your ArrayList solution is as good as anything.
It's also going to be relatively efficient, assuming that these lists are short-lived and you don't have tens of thousands of elements in the list. If you do, and if this operation is taking a significant portion of your execution time, then implementing your own singly-linked list might be worthwhile.
My one change to improve your existing efficiency: construct the new list with a capacity (1 + size of old list).
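For reference, a singly-linked list with a shared tail could look roughly like the sketch below. The class Cons and its methods are hypothetical names, not from any library. Note that it only pays off if the children are already stored as a Cons; converting from a java.util.List still costs a full copy.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class: an immutable singly-linked list in the Lisp "cons cell" style.
final class Cons<T> implements Iterable<T> {
    private static final Cons<Object> NIL = new Cons<>(null, null);

    final T head;
    final Cons<T> tail;

    private Cons(T head, Cons<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // The shared empty list.
    @SuppressWarnings("unchecked")
    static <T> Cons<T> empty() {
        return (Cons<T>) NIL;
    }

    // O(1): allocates one cell and reuses this list, unchanged, as the tail.
    Cons<T> prepend(T item) {
        return new Cons<>(item, this);
    }

    boolean isEmpty() {
        return this == NIL;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Cons<T> node = Cons.this;

            @Override
            public boolean hasNext() {
                return !node.isEmpty();
            }

            @Override
            public T next() {
                if (node.isEmpty()) {
                    throw new NoSuchElementException();
                }
                T value = node.head;
                node = node.tail;
                return value;
            }
        };
    }
}
```

Two lists built by prepending different items to the same children list share every cell of that tail, which is safe because no cell can be modified after construction.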

You shouldn't need to special case an Item with no children.
public List<Item> getItemAndItsChildren(Item item)
{
List<Item> items = new ArrayList<Item>();
items.add(item);
items.addAll(item.getChildren());
return Collections.unmodifiableList(items);
}
Also, if you are looking to use a language that isn't verbose, then Java is a poor choice. I'm sure you can do what you like in far less code in Groovy and Scala which both run on the JVM. (Not to mention JRuby or Jython.)

It sounds like you're looking for something like a CompositeList, similar to the Apache Commons' CompositeCollection. An implementation could be as naive as this:
public class CompositeList<T> extends AbstractList<T>{
private final List<T> first, second;
public CompositeList(List<T> first, List<T> second) {
this.second = second;
this.first = first;
}
@Override
public T get(int index) {
if ( index < first.size() ) {
return first.get(index);
} else {
return second.get(index - first.size());
}
}
@Override
public int size() {
return first.size() + second.size();
}
}
And you could use it like this:
public List<Item> getItemAndItsChildren(Item item)
{
return Collections.unmodifiableList(
new CompositeList<Item>(Collections.singletonList(item), item.getChildren()) );
}
But there are huge caveats that make such a class difficult to use...the main problem being that the List interface cannot itself mandate that it is unmodifiable. If you are going to use something like this you must ensure that clients of this code never modify the children!

I use these. (using guava's ImmutableList and Iterables)
/** Returns a new ImmutableList with the given element added */
public static <T> ImmutableList<T> add(final Iterable<? extends T> list, final T elem) {
return ImmutableList.copyOf(Iterables.concat(list, Collections.singleton(elem)));
}
/** Returns a new ImmutableList with the given elements added */
public static <T> ImmutableList<T> add(final Iterable<? extends T> list, final Iterable<? extends T> elems) {
return ImmutableList.copyOf(Iterables.concat(list, elems));
}
/** Returns a new ImmutableList with the given element inserted at the given index */
public static <T> ImmutableList<T> add(final List<? extends T> list, final int index, final T elem) {
return ImmutableList.copyOf(Iterables.concat(list.subList(0, index), Collections.singleton(elem), list.subList(index, list.size())));
}
/** Returns a new ImmutableList with the given elements inserted at the given index */
public static <T> ImmutableList<T> add(final List<? extends T> list, final int index, final Iterable<? extends T> elems) {
return ImmutableList.copyOf(Iterables.concat(list.subList(0, index), elems, list.subList(index, list.size())));
}
But none of them are efficient.
Example of prepending/consing an item to a list:
ImmutableList<String> letters = ImmutableList.of("a", "b", "c");
add(letters, 0, "d");
For more efficient immutable/persistent collections you should, as @eneveu points out, look at pcollections, although I have no idea what the quality of that library is.

pcollections is a persistent Java collection library you might be interested in. I bookmarked it a while ago, and haven't yet used it, but the project seems relatively active.
If you want to use Guava, you could use the unmodifiable view returned by Lists.asList(E first, E[] rest). It works with arrays, and its primary goal is to simplify the use of var-args methods. But I see no reason you couldn't use it in your case:
public List<Item> getItemAndItsChildren(Item item) {
return Lists.asList(item, item.getChildren().toArray(new Item[0]));
}
The List returned is an unmodifiable view, but it may change if the source array is modified. In your case, it's not a problem, since the getChildren() method returns an immutable list. Even if it were mutable, the toArray() method supposedly returns a "safe" array...
If you want to be extra safe, you could do:
public ImmutableList<Item> getItemAndItsChildren(Item item) {
return ImmutableList.copyOf(Lists.asList(item, item.getChildren().toArray(new Item[0])));
}
Note that Lists.asList() avoids un-necessary ArrayList instantiation, since it's a view. Also, ImmutableList.copyOf() would delegate to ImmutableList.of(E element) when the children list is empty (which, similarly to Collections.singletonList(), is space-efficient).

You should instantiate your list with the exact number you will be putting into it to eliminate expansion copies when you add more.
List<Item> items = new ArrayList<Item>();
should be
List<Item> items = new ArrayList<Item>(item.getChildren().size() + 1);
otherwise what you are doing is about as idiomatic Java as you can get.
Another thing: you might consider using Guava and its ImmutableList implementation rather than Collections.unmodifiableList().
Unlike Collections.unmodifiableList(java.util.List), which is a view
of a separate collection that can still change, an instance of
ImmutableList contains its own private data and will never change.
ImmutableList is convenient for public static final lists ("constant
lists") and also lets you easily make a "defensive copy" of a list
provided to your class by a caller.
