Java: How to get n elements from a set


I was trying to find the most elegant way to get n elements from a set, starting from position x. What I came up with uses streams:
Set<T> s;
Set<T> subS = s.stream().skip(x).limit(n).collect(Collectors.toSet());
Is this the best way to do it? Are there any drawbacks?

Similar to Steve Kuo's answer but also skipping the first x elements:
Iterables.limit(Iterables.skip(s, x), n);
Guava Iterables

Use Guava, Iterables.limit(s, 20).

Your code doesn’t work.
Set<T,C> s;
Set<T,C> subS = s.stream().skip(x).limit(n).collect(Collectors.toSet());
What is Set<T,C>? A Set contains elements of a given type so what are the two type parameters supposed to mean?
Further, if you have a Set<T>, you don’t have a defined order. “the n elements from a set starting from x” makes no sense in the context of a Set. There are some specialized Set implementations which have an order, e.g. they are sorted or retain insertion order, but since your code doesn’t declare such a prerequisite and seems to be meant to work on an arbitrary Set, it must be considered broken.
If you want to process a fraction of the Set according to an order, you have to freeze the order first:
Set<T> s;
List<T> frozenOrder=new ArrayList<>(s);
The list will have an order which will be the order of the Set, if there is any, or an arbitrary order, fixed at the creation time of the ArrayList, which will not change afterwards.
Then extracting a fragment of it is easy:
List<T> sub=frozenOrder.subList(x, Math.min(s.size(), x+n));
You may also convert it back to a Set, if you wish:
Set<T> subSet=new HashSet<>(sub);
That said, it’s rather unusual to process a part of a Set given by positional numbers.
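For reference, a minimal sketch of the freeze-then-slice idea as a single helper. The class and method names (FrozenSlice, slice) are hypothetical, and unlike the subList one-liner it also clamps x, so a start position past the end yields an empty list instead of an IndexOutOfBoundsException. It assumes x and n are non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class FrozenSlice {
    private FrozenSlice() {}

    // Returns up to n elements of s starting at position x, in the iteration
    // order captured at the moment the ArrayList copy is made.
    // Assumes x >= 0 and n >= 0.
    static <T> List<T> slice(Set<T> s, int x, int n) {
        List<T> frozenOrder = new ArrayList<>(s); // order is fixed from here on
        int from = Math.min(x, frozenOrder.size());
        // written this way so that from + n cannot overflow
        int to = from + Math.min(n, frozenOrder.size() - from);
        // subList is only a view, so copy it into an independent list
        return new ArrayList<>(frozenOrder.subList(from, to));
    }
}
```

With a LinkedHashSet of 1 to 10, slice(s, 2, 3) gives [3, 4, 5] and slice(s, 8, 5) gives [9, 10].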

The use of Stream is fine. The one drawback I can see is that not all implementations of Set are ordered, e.g. HashSet is not ordered but LinkedHashSet is. So you might get a different resulting set on different runs.
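If the source set does have a meaningful order (a LinkedHashSet or TreeSet, say), one way to keep that order in the result is to collect into a LinkedHashSet through Collectors.toCollection, because Collectors.toSet() makes no guarantee about the type or ordering of the Set it returns. A sketch; the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

final class OrderedSkipLimit {
    private OrderedSkipLimit() {}

    // Skips the first x elements and keeps the next n, preserving encounter order.
    // Only meaningful when 'source' has a defined iteration order
    // (e.g. LinkedHashSet or TreeSet); for a HashSet the result is arbitrary.
    static <T> Set<T> subset(Set<T> source, long x, long n) {
        return source.stream()
                .skip(x)
                .limit(n)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```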

You can just iterate over the set and collect the first n elements:
int n = 8; // number of elements to collect
int count = 0;
List<T> list = new ArrayList<>(n);
Iterator<T> iter = set.iterator();
while (count < n && iter.hasNext()) {
    T t = iter.next();
    list.add(t);
    count++;
}
The benefit is that it should be faster than more generic solutions.
The drawback is that it's more verbose than the one that you suggested.

A set, by design, does not keep its elements in any particular order, so you cannot start from element x. A SortedSet may be the "set" you want to use.
I'd convert it to a List first, like
new ArrayList<>(s).subList(<index of x>, <index of x + n>);
but it may have a very bad impact on performance. In this case the ArrayList would have to be stored to retrieve the next subList, because there is no explicit order, and the implicit order may change the next time new ArrayList(s) is called.

First, a set is not made for getting specific elements from it; you should use a SortedSet or an ArrayList instead.
But if you have to get the elements of the set, you can use the following code to iterate over the set:
int c = 0;
int n = 50; // number of elements to get
List<T> list = new ArrayList<>(n); // collects the result
Iterator<T> iter = set.iterator();
while (c < n && iter.hasNext()) {
    T t = iter.next();
    list.add(t);
    c++;
}

Related

How to add element to a HashSet while iterating this HashSet?

I have a use case like following:
SET is a Set of Integer with size N
for i in SET (meaning: only iterate over the N elements present at the start):
    if i + 7 not in SET:
        SET.add(i + 7)
return SET
How can I implement this using a Java HashSet without using an auxiliary list/set to store the elements that need to be inserted?

It is impossible to add something to the Set instance while iterating over its contents. When using the foreach loop (for( var e : set ) notation), no modification is allowed; when using an explicit iterator (for( var i = set.iterator(); i.hasNext(); ) … notation), you can call i.remove() to get rid of the current element. But adding new elements still does not work in this case.
This behaviour is shared by the general-purpose Java Collection classes, although List offers a special iterator class, ListIterator, that also allows adding entries (notation for( var i = list.listIterator(); i.hasNext(); ) …) by calling i.add(); thanks to @lucasvw for reminding me of that.
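A short sketch of the ListIterator route applied to the question's "add i + 7" rule. ListIterator.add inserts the new element immediately before the element that the next call to next() would return, so the loop never visits the values it added. This works on a List rather than a Set, the containment check is a linear scan, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

final class ListIteratorAdd {
    private ListIteratorAdd() {}

    // For every element present when the loop starts, inserts e + 7 right
    // after it, unless that value is already in the list.
    static List<Integer> addPlusSeven(List<Integer> values) {
        List<Integer> list = new ArrayList<>(values);
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int e = it.next();
            if (!list.contains(e + 7)) { // O(n) scan; acceptable for a sketch
                // inserted before the cursor: the next call to next() is unaffected
                it.add(e + 7);
            }
        }
        return list;
    }
}
```

For [1, 2, 3] this yields [1, 8, 2, 9, 3, 10]; for [1, 8] it yields [1, 8, 15], because 8 is already present.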

@lucasvw alluded to the underlying issue here: you need to somehow differentiate between the original values and the values you've added; otherwise, the loop will run indefinitely (or at least until the values overflow enough that they start repeating themselves).
The best way to do this is to indeed have an auxiliary set to hold all the values you want to add:
Set<Integer> aux = original.stream().map(i -> i + 7).collect(Collectors.toSet());
original.addAll(aux);

If you do not want to make a copy yourself, Java can do that for you: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArraySet.html. It will not be faster or anything magical, but you can iterate over "the original" and modify it at the same time.
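A minimal sketch of the CopyOnWriteArraySet route: its iterator works on a snapshot taken when the iterator was created, so values added inside the loop are neither visited nor cause a ConcurrentModificationException. Every successful add copies the backing array, so this suits small sets. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

final class SnapshotAdd {
    private SnapshotAdd() {}

    static Set<Integer> addPlusSeven(Collection<Integer> initial) {
        Set<Integer> set = new CopyOnWriteArraySet<>(initial);
        for (Integer i : set) {  // iterates over a snapshot of the original elements
            set.add(i + 7);      // add() does nothing if the value is already present
        }
        return set;
    }
}
```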
However, if you want something efficient, the way to go is presumably to create another Set with the new elements and addAll() them at the end. Depending on the size of the set, it may be faster to skip the containment check and leave it to the merge.
BitSet and its or() operation may also be worth a look if your numbers are nonnegative and of low magnitude.
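A sketch of the BitSet idea for nonnegative, reasonably small integers: build a second BitSet holding every value shifted by the offset, then merge the two with or(). BitSet has no shift method, so the shifted copy is filled with the nextSetBit loop. The class and method names are hypothetical.

```java
import java.util.BitSet;

final class BitSetShift {
    private BitSetShift() {}

    // Returns a new BitSet containing every i in 'bits' plus every i + offset.
    // Assumes offset >= 0 and that i + offset does not overflow int.
    static BitSet withShifted(BitSet bits, int offset) {
        BitSet shifted = new BitSet();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            shifted.set(i + offset);
        }
        BitSet result = (BitSet) bits.clone();
        result.or(shifted); // set union; values already present are unaffected
        return result;
    }
}
```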

Best way to remove one arraylist elements from another arraylist

What is the most performant way in Java (7, 8) to remove the integer elements of one ArrayList from another? All the elements are unique in both the first and the second list.
At the moment I know the API method removeAll and use it this way:
tempList.removeAll(tempList2);
The problem appears when I operate on ArrayLists with more than 10000 elements. For example, when I remove 65000 elements, the delay is about 2 seconds. But I need to operate on even larger lists, with more than 1000000 elements.
What is the strategy for this issue?
Could the new Stream API help?

tl;dr:
Keep it simple. Use
list.removeAll(new HashSet<T>(listOfElementsToRemove));
instead.
As Eran already mentioned in his answer: The low performance stems from the fact that the pseudocode of a generic removeAll implementation is
public boolean removeAll(Collection<?> c) {
    for (each element e of this) {
        if (c.contains(e)) {
            this.remove(e);
        }
    }
}
So the contains call that is done on the list of elements to remove will cause the O(n*k) performance (where n is the number of elements to remove, and k is the number of elements in the list that the method is called on).
Naively, one could imagine that the this.remove(e) call on a List might also have O(k), and this implementation would also have quadratic complexity. But this is not the case: You mentioned that the lists are specifically ArrayList instances. And the ArrayList#removeAll method is implemented to delegate to a method called batchRemove that directly operates on the underlying array, and does not remove the elements individually.
So all you have to do is to make sure that the lookup in the collection that contains the elements to remove is fast - preferably O(1). This can be achieved by putting these elements into a Set. In the end, it can just be written as
list.removeAll(new HashSet<T>(listOfElementsToRemove));
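A small self-contained sketch of the tl;dr: wrapping the elements to remove in a HashSet makes each contains check inside ArrayList.removeAll expected O(1). The helper name is hypothetical; the list is modified in place and the order of the remaining elements is preserved.

```java
import java.util.HashSet;
import java.util.List;

final class FastRemoveAll {
    private FastRemoveAll() {}

    // Removes from 'list' every element that occurs in 'toRemove'.
    static <T> void removeAllFast(List<T> list, List<T> toRemove) {
        // HashSet.contains is expected O(1), so for an ArrayList the call is
        // roughly O(list.size() + toRemove.size()) instead of their product
        list.removeAll(new HashSet<T>(toRemove));
    }
}
```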
Side notes:
The answer by Eran has IMHO two major drawbacks: first of all, it requires sorting the lists, which is O(n*logn), and that is simply not necessary. But more importantly (and obviously): sorting will likely change the order of the elements! What if this is simply not desired?
Remotely related: there are some other subtleties involved in the removeAll implementations. For example, the HashSet removeAll method is surprisingly slow in some cases. Although this also boils down to O(n*n) when the elements to be removed are stored in a list, the exact behavior may indeed be surprising in this particular case.

Well, since removeAll checks for each element of tempList whether it appears in tempList2, the running time is proportional to the size of the first list multiplied by the size of the second list, which means O(N^2) unless one of the two lists is very small and can be considered as "constant size".
If, on the other hand, you pre-sort the lists, and then iterate over both lists with a single iteration (similar to the merge step in merge sort), the sorting will take O(NlogN) and the iteration O(N), giving you a total running time of O(NlogN). Here N is the size of the larger of the two lists.
If you can replace the lists by a sorted structure (perhaps a TreeSet, since you said the elements are unique), you can implement removeAll in linear time, since you won't have to do any sorting.
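I haven't tested it, but something like this can work (assuming both tempList and tempList2 are sorted in ascending order):
Iterator<Integer> iter1 = tempList.iterator();
Iterator<Integer> iter2 = tempList2.iterator();
// current2 is the smallest value of tempList2 not yet passed (null once exhausted)
Integer current2 = iter2.hasNext() ? iter2.next() : null;
while (iter1.hasNext() && current2 != null) {
    Integer current = iter1.next();
    // skip values of tempList2 that are smaller than current
    while (current2 != null && current2 < current) {
        current2 = iter2.hasNext() ? iter2.next() : null;
    }
    // compare values, not references (== on Integer objects is unreliable)
    if (current2 != null && current2.intValue() == current.intValue()) {
        iter1.remove(); // note: on an ArrayList each remove() shifts the tail
    }
}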

I suspect removing from an ArrayList is a performance hit, since every element after the removed one has to be shifted to close the gap whenever an element in the middle is removed. It may be faster to do this:
Create a Set of the elements to be removed.
Create the new result ArrayList that you need; call it R. You can give it enough capacity at construction.
Iterate through the original list from which you want elements removed: if an element is found in the Set, don't add it to R; otherwise, add it.
This should be O(N), assuming that building the Set is linear and each lookup in it takes constant time.
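The three steps above, sketched as one method. The class and method names are hypothetical; because R is a new list, the original list is left untouched and the surviving elements keep their relative order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FilterIntoNewList {
    private FilterIntoNewList() {}

    static <T> List<T> without(List<T> original, Collection<T> toRemove) {
        Set<T> removeSet = new HashSet<>(toRemove);     // step 1: Set of elements to drop
        List<T> r = new ArrayList<>(original.size());   // step 2: R, presized
        for (T element : original) {                    // step 3: copy the survivors
            if (!removeSet.contains(element)) {
                r.add(element);
            }
        }
        return r;
    }
}
```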

Extract first k elements from a Set efficiently

Problem
I'm writing a simple Java program in which I have a TreeSet which contains Comparable elements (it's a class that I've written myself). In a specific moment I need to take only the first k elements from it.
What I've done
Currently I've found two different solutions for my problem:
Using a simple method written by me: it copies the first k elements from the initial TreeSet;
Using Google Guava's greatestOf method.
For the second option you need to call the method like this:
Ordering.natural().greatestOf(mySet, 80)
But I think that this kind of invocation is pointless, because the elements are already sorted. Am I wrong?
Question
What is a correct and, at the same time, efficient way to obtain a Collection that contains the first k elements of a TreeSet?
Additional information
Java version: >= 7

You could use Guava's Iterables#limit:
ImmutableList.copyOf(Iterables.limit(yourSet, 7))
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterables.html#limit(java.lang.Iterable, int)

I would suggest using a TreeSet<YourComparableClass> collection; it seems to be the solution you are looking for.
A TreeSet can return an iterator, and you can simply iterate K times, storing the objects the iterator returns: the elements will be returned in order.
Moreover, a TreeSet keeps your elements sorted at all times: when you add or remove elements, they are inserted and removed so that the structure remains ordered.
Here is a possible example:
public static ArrayList<YourComparableClass> getFirstK(TreeSet<YourComparableClass> set, int k) {
    Iterator<YourComparableClass> iterator = set.iterator();
    ArrayList<YourComparableClass> result = new ArrayList<>(k); // to store the first k items
    // the iterator returns items in ascending order; hasNext() guards against k > set.size()
    for (int i = 0; i < k && iterator.hasNext(); i++) {
        result.add(iterator.next());
    }
    return result;
}

The descendingIterator() method of java.util.TreeSet yields elements from greatest to least, so you can just step it however many times, inserting the elements into a collection. The running time is O(log n + k) where k is the number of elements returned, which is surely fast enough.
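A sketch of that approach, returning the k greatest elements from greatest to least and stopping early if the set has fewer than k elements. The class and method names are hypothetical, and k is assumed to be non-negative. For the first (smallest) k elements, the same loop works with iterator() in place of descendingIterator().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

final class GreatestK {
    private GreatestK() {}

    // Assumes k >= 0.
    static <E> List<E> greatest(TreeSet<E> set, int k) {
        List<E> result = new ArrayList<>(Math.min(k, set.size()));
        Iterator<E> it = set.descendingIterator(); // greatest to least
        while (result.size() < k && it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }
}
```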
If you're using a HashSet, on the other hand, then the elements in fact are not sorted, so you need to use the linear-time selection method that you indicated.

Java, multiple iterators on a set, removing proper subsets and ConcurrentModificationException

I have a set A = {(1,2), (1,2,3), (2,3,4), (3,4), (1)}
I want to turn it into A={(1,2,3), (2,3,4)}, remove proper subsets from this set.
I'm using a HashSet to implement the set, two iterators to run through the set and check all pairs for the proper-subset condition using containsAll(c), and the remove() method to remove proper subsets.
The code looks something like this:
HashSet<Integer> hs....
Set<Integer> c=hs.values();
Iterator<Integer> it= c.iterator();
while(it.hasNext())
{
    p=it.next();
    Iterator<Integer> it2= c.iterator();
    while(it2.hasNext())
    {
        q=it2.next();
        if q is a subset of p
            it2.remove();
        else if p is a subset of q
        {
            it.remove();
            break;
        }
    }
}
I get a ConcurrentModificationException the first time I come out of the inner while loop and do a
p=it.next();
The exception is for when modifying the Collection while iterating over it. But that's what .remove() is for.
I have used remove() when using just 1 iterator and encountered no problems there.
If the exception is because I'm removing an element from 'c' or 'hs' while iterating over it, then the exception should be thrown when it encounters the very next it2.next() command, but I don't see it then. I see it when it encounters the it.next() command.
I used the debugger, and the collections and iterators are in perfect order after the element has been removed. They contain and point to the proper updated set and element. it.next() contains the next element to be analyzed, it's not a deleted element.
Any ideas on how I can do what I'm trying to do without making a copy of the HashSet itself and using it as an intermediate before I commit the updates?
Thank you

You can't modify the collection with it2 and continue iterating it with it. Just as the exception says, it's concurrent modification, and it's not supported.
I'm afraid you're stuck with an intermediate collection.
Edit
Actually, your code doesn't seem to make sense: are you sure it's a collection of Integer and not of Set<Integer>? In your code p and q are Integers, so "if q is a subset of p" doesn't make much sense.
One obvious way to make this a little smarter: sort your sets by size first, and as you go from largest to smallest, add the ones you want to keep to a new list. You then only have to check each set against the keep list, not against the whole original collection.
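A minimal sketch of the sort-then-keep idea, assuming the elements really are Set<Integer> values. Because the input holds distinct sets and larger sets come first, any candidate contained in an already kept set must be a proper subset of it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class MaximalSets {
    private MaximalSets() {}

    static List<Set<Integer>> removeProperSubsets(Collection<Set<Integer>> sets) {
        List<Set<Integer>> bySize = new ArrayList<>(sets);
        // largest sets first (List.sort is stable for equal sizes)
        bySize.sort(Comparator.comparingInt((Set<Integer> s) -> s.size()).reversed());
        List<Set<Integer>> keep = new ArrayList<>();
        for (Set<Integer> candidate : bySize) {
            boolean isSubset = false;
            for (Set<Integer> kept : keep) { // only check against the keep list
                if (kept.containsAll(candidate)) {
                    isSubset = true;
                    break;
                }
            }
            if (!isSubset) {
                keep.add(candidate);
            }
        }
        return keep;
    }
}
```

For the question's A = {(1,2), (1,2,3), (2,3,4), (3,4), (1)} this keeps (1,2,3) and (2,3,4). Nothing is removed from a collection while it is being iterated, so no ConcurrentModificationException can occur.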

The idea behind the ConcurrentModificationException is to protect the internal state of the iterators. When you add or remove elements other than through the iterator you are currently using (for example, through a second iterator), that iterator throws an exception on its next use, even if nothing appears wrong. This fail-fast behavior saves you from coding errors that would otherwise surface as confusing failures, such as a NullPointerException, in otherwise mundane code. Unless you have very tight space constraints or an extremely large collection, you should just make a working copy that you can add to and delete from without worry.

How about creating another set subsetNeedRemoved containing all subsets you are going to remove? For each subset, if there is a proper superset, add the subset to subsetNeedRemoved. At the end, you can loop over subsetNeedRemoved and remove corresponding subsets in the original set.
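Sketched in code, assuming the elements are Set<Integer> values held in a mutable Set (subsetNeedRemoved follows the answer's naming; the class and method names are hypothetical). The nested loops only read the set, and the removal happens after iteration has finished, so no ConcurrentModificationException is possible.

```java
import java.util.HashSet;
import java.util.Set;

final class DeferredRemoval {
    private DeferredRemoval() {}

    static void removeProperSubsets(Set<Set<Integer>> sets) {
        Set<Set<Integer>> subsetNeedRemoved = new HashSet<>();
        for (Set<Integer> candidate : sets) {
            for (Set<Integer> other : sets) {
                // 'other' is a proper superset: larger and contains every element
                if (other.size() > candidate.size() && other.containsAll(candidate)) {
                    subsetNeedRemoved.add(candidate);
                    break;
                }
            }
        }
        sets.removeAll(subsetNeedRemoved); // modify only after iterating
    }
}
```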

I'd write something like this...
PriorityQueue<Set<Integer>> queue = new PriorityQueue<Set<Integer>>(16,
new Comparator<Set<Integer>>() {
public int compare(Set<Integer> a, Set<Integer> b) {
return b.size() - a.size(); // overflow-safe!
}
});
queue.addAll(sets); // we'll extract them in order from largest to smallest
List<Set<Integer>> result = new ArrayList<>();
while(!queue.isEmpty()) {
Set<Integer> largest = queue.poll();
result.add(largest);
Iterator<Set<Integer>> rest = queue.iterator();
while(rest.hasNext()) {
if(largest.containsAll(rest.next())) {
rest.remove();
}
}
}
Yeah, it consumes some extra memory, but it's idiomatic, straightforward, and possibly faster than the alternatives.

Sampling with no replacement in Java from an ArrayList

I have an ArrayList with 30 elements. I'd like to create many sublists of 15 elements from this list. What's an efficient way of doing so?
Right now I clone the ArrayList and use remove(random) to do it, but I am sure this is too clumsy. What should I do instead?
Does Java have a "sample" function like in R?
Clarification: by sampling without replacement I mean taking 15 unique elements at random from the 30 available in the original list. Moreover, I want to be able to do this repeatedly.

Use the Collections#shuffle method to shuffle your original list, and return a list with the first 15 elements.
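A minimal sketch of the shuffle approach: copy the list so the original keeps its order, shuffle the copy, and take a prefix. Each call gives a fresh sample, and passing a seeded Random makes runs reproducible. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffleSample {
    private ShuffleSample() {}

    // Returns k elements taken from k distinct positions, chosen uniformly at random.
    // Throws IndexOutOfBoundsException if k > original.size().
    static <T> List<T> sample(List<T> original, int k, Random random) {
        List<T> copy = new ArrayList<>(original); // leave the original order intact
        Collections.shuffle(copy, random);
        return new ArrayList<>(copy.subList(0, k));
    }
}
```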

Consider creating a new list and adding random elements from the current list, instead of copying all elements and removing them.
Another way to do this is to create some kind of view on top of the current list.
Implement the Iterator interface so that it randomly generates the index of an element during the next() operation and retrieves the element at that index from the current list.
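One way to build such a view without ever returning the same element twice is to keep an array of the indices not yet used and, on each next(), swap a randomly chosen one into place (a lazy Fisher-Yates shuffle over the indices). The no-repeat detail is an assumption added here, since a plain random index would allow duplicates; the class name is hypothetical, and the code targets Java 8 or later, where Iterator.remove has a default implementation. Only an int array of indices is allocated; no elements are copied.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

final class RandomSampleIterator<T> implements Iterator<T> {
    private final List<T> source;
    private final int[] indices;  // indices[0..returned) have already been used
    private final int limit;      // how many elements to return in total
    private final Random random;
    private int returned = 0;

    RandomSampleIterator(List<T> source, int limit, Random random) {
        if (limit < 0 || limit > source.size()) {
            throw new IllegalArgumentException("limit out of range: " + limit);
        }
        this.source = source;
        this.limit = limit;
        this.random = random;
        this.indices = new int[source.size()];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
    }

    @Override
    public boolean hasNext() {
        return returned < limit;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // choose uniformly among the unused indices and swap it into the used prefix
        int j = returned + random.nextInt(indices.length - returned);
        int chosen = indices[j];
        indices[j] = indices[returned];
        indices[returned] = chosen;
        returned++;
        return source.get(chosen);
    }
}
```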

No, Java does not have a sample function like in R. However, it is possible to write such a function:
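// Samples n elements from original without replacement, and returns that list
public static <T> List<T> sample(List<T> original, int n) {
    List<T> result = new ArrayList<T>(n);
    for (int i = 0; i < original.size(); i++) {
        if (result.size() == n)
            return result;
        int needed = n - result.size();      // elements still to pick
        int remaining = original.size() - i; // elements not yet examined
        if (needed >= remaining) {
            result.add(original.get(i));     // every remaining element is required
        } else if (Math.random() < (double) needed / remaining) {
            result.add(original.get(i));
        }
    }
    return result;
}
This function iterates through original and copies the current element to result with probability needed/remaining, i.e. the number of elements still to pick divided by the number not yet examined (selection sampling), which gives every element the same chance of being chosen. Once we are near enough to the end of original that all remaining elements are required, it copies them unconditionally (the second if statement in the loop).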

This is a basic combinatorics problem. You have 30 elements in your list, and you want to choose 15. If the order matters, you want a permutation, if it doesn't matter, you want a combination.
There are various Java combinatorics samples on the web, and they typically use combinadics. I don't know of any ready-made Java libraries, but Apache Commons Math has binomial coefficient support to help you implement combinadics if you go that route. Once you have a sequence of 15 indices from 0 to 29, I'd suggest creating a read-only iterator that you can read the elements from. That way you won't have to create any new lists or copy any references.
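To make the combinadics idea concrete, a sketch that maps a rank in [0, C(n, k)) to the k-combination of indices with that rank in the combinatorial number system; drawing the rank uniformly at random then yields a uniformly random combination. It uses its own small binomial helper instead of Apache Commons Math, and all names are hypothetical. For n = 30 and k = 15, the largest coefficient involved, C(30, 15) = 155117520, fits easily in a long.

```java
import java.util.concurrent.ThreadLocalRandom;

final class Combinadics {
    private Combinadics() {}

    // Binomial coefficient C(n, k); returns 0 when k < 0 or k > n.
    static long binomial(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i; // exact: result becomes C(n - k + i, i)
        }
        return result;
    }

    // Returns the k-combination of {0, ..., n-1} with the given rank,
    // as indices in decreasing order. Requires 0 <= rank < C(n, k).
    static int[] unrank(long rank, int n, int k) {
        int[] combination = new int[k];
        int c = n;
        for (int i = k; i >= 1; i--) {
            c--; // each index must be smaller than the previous one
            while (binomial(c, i) > rank) {
                c--; // terminates because C(i - 1, i) == 0 <= rank
            }
            combination[k - i] = c;
            rank -= binomial(c, i);
        }
        return combination;
    }

    // A uniformly random k-combination of {0, ..., n-1}.
    static int[] randomCombination(int n, int k) {
        long rank = ThreadLocalRandom.current().nextLong(binomial(n, k));
        return unrank(rank, n, k);
    }
}
```

For n = 5 and k = 2, rank 0 maps to {1, 0} and rank 9 maps to {4, 3}; the ten ranks 0 to 9 give the ten distinct pairs.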
