Cull all duplicates in a set

I'm using Set to isolate the unique values of a List (in this case, I'm getting a set of points):
Set<PVector> pointSet = new LinkedHashSet<PVector>(listToCull);
This will return a set of unique points, but for every item in listToCull, I'd like to test the following: if there is a duplicate, cull all of the duplicate items. In other words, I want pointSet to represent the set of items in listToCull which are already unique (every item in pointSet had no duplicate in listToCull). Any ideas on how to implement?
EDIT - I think my first question needs more clarification. Below is some code which will execute what I'm asking for, but I'd like to know if there is a faster way. Assuming listToCull is a list of PVectors with duplicates:
Set<PVector> pointSet = new LinkedHashSet<PVector>(listToCull);
List<PVector> uniqueItemsInListToCull = new ArrayList<PVector>();
for(PVector pt : pointSet){
int counter=0;
for(PVector ptCheck : listToCull){
if(pt.equals(ptCheck)){ // compare by value; == would only match the very same object
counter++;
}
}
if(counter<2){
uniqueItemsInListToCull.add(pt);
}
}
uniqueItemsInListToCull will be different from pointSet. I'd like to do this without loops if possible.

You will have to do some programming yourself: Create two empty sets; one will contain the unique elements, the other the duplicates. Then loop through the elements of listToCull. For each element, check whether it is in the duplicates set. If it is, ignore it. Otherwise, check whether it is in the unique elements set. If it is, remove it from there and add it to the duplicates set. Otherwise, add it to the unique elements set.
If your PVector class has a good hashCode() method, HashSets are quite efficient, so the performance of this will not be too bad.
Untested:
Set<PVector> uniques = new HashSet<>();
Set<PVector> duplicates = new HashSet<>();
for (PVector p : listToCull) {
if (!duplicates.contains(p)) {
if (uniques.contains(p)) {
uniques.remove(p);
duplicates.add(p);
}
else {
uniques.add(p);
}
}
}
Alternatively, you may use a third-party library which offers a Bag or MultiSet. This allows you to count how many occurrences of each element are in the collection, and then at the end discard all elements where the count is different than 1.

What you are looking for is the intersection:
Assuming that PVector (terrible name by the way) implements hashCode() and equals() correctly, a Set will eliminate duplicates.
If you want an intersection of the List and an existing Set, create a Set from the List, then use Sets.intersection() from Guava to get the elements common to both sets.
public static <E> Sets.SetView<E> intersection(Set<E> set1, Set<?> set2)
Returns an unmodifiable view of the intersection of two sets. The returned set contains all
elements that are contained by both backing sets. The iteration order
of the returned set matches that of set1. Results are undefined if
set1 and set2 are sets based on different equivalence relations (as
HashSet, TreeSet, and the keySet of an IdentityHashMap all are).
Note: The returned view performs slightly better when set1 is the
smaller of the two sets. If you have reason to believe one of your
sets will generally be smaller than the other, pass it first.
Unfortunately, since this method sets the generic type of the returned
set based on the type of the first set passed, this could in rare
cases force you to make a cast, for example:
Set<Object> aFewBadObjects = ...
Set<String> manyBadStrings = ...
// impossible for a non-String to be in the intersection
@SuppressWarnings("unchecked")
Set<String> badStrings = (Set) Sets.intersection(
    aFewBadObjects, manyBadStrings);
This is unfortunate, but should come up only very rarely.
You can also do union, complement, difference and cartesianProduct as well as filtering very easily.
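For readers without Guava on the classpath, here is a minimal sketch of the same idea using only java.util. It swaps Sets.intersection() for a copy plus retainAll(), so it returns a new set rather than a view. The class and method names are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IntersectionSketch {
    // Returns a new set holding the elements present in both inputs,
    // in the iteration order of 'first'. Neither input is modified.
    static <T> Set<T> intersection(Set<T> first, Set<?> second) {
        Set<T> result = new LinkedHashSet<>(first); // copy, so 'first' stays intact
        result.retainAll(second);                   // drop everything not in 'second'
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> fromList = new LinkedHashSet<>(List.of(1, 2, 3, 4));
        Set<Integer> existing = Set.of(2, 4, 6);
        System.out.println(intersection(fromList, existing)); // prints [2, 4]
    }
}
```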

So you want pointSet to hold the items in listToCull which have no duplicates? Is that right?
I would be inclined to create a Map, then iterate twice over the list, the first time putting a value of zero in for each PVector, the second time adding one to the value for each PVector, so at the end you have a map with counts. Now you're interested in the keys of the map for which the value is exactly equal to one.
It's not perfectly efficient - you're operating on list items more times than absolutely necessary - but it's quite clean and simple.
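A minimal sketch of the counting idea, assuming the element type has consistent equals() and hashCode(). It uses Map.merge to build the counts in a single pass instead of the two passes described above, and a LinkedHashMap so the result keeps first-seen order. The class and method names are hypothetical, and strings stand in for PVector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SingletonFilter {
    // Returns the elements that occur exactly once in the input, in first-seen order.
    static <T> List<T> occurringOnce(List<T> items) {
        Map<T, Integer> counts = new LinkedHashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        List<T> result = new ArrayList<>();
        for (Map.Entry<T, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> listToCull = List.of("a", "b", "a", "c", "b", "d");
        System.out.println(occurringOnce(listToCull)); // prints [c, d]
    }
}
```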

OK, here's the solution I've come up with, I'm sure there are better ones out there but this one's working for me. Thanks to all who gave direction!
To get the unique items, you can build a Set from the list, where listToCull is a list of PVectors with duplicates:
List<PVector> culledList = new ArrayList<PVector>();
Set<PVector> pointSet = new LinkedHashSet<PVector>(listToCull);
culledList.addAll(pointSet);
To go further, suppose you want a list where you've removed all items in listToCull which have a duplicate. You can iterate through the list and test, for each item, whether it has already been seen. This lets us do one loop, rather than a nested loop:
Set<PVector> pointSet = new HashSet<PVector>();//items seen so far; must start empty, otherwise every item counts as a duplicate
Set<PVector> removalList = new HashSet<PVector>();//items to remove
for (PVector pt : listToCull) {
if (pointSet.contains(pt)) {
removalList.add(pt);
}
else{
pointSet.add(pt);
}
}
pointSet.removeAll(removalList);
List<PVector> onlyUniquePts = new ArrayList<PVector>();
onlyUniquePts.addAll(pointSet);

Related

How to get first or last item from cqengine IndexedCollection with NavigableIndex

I have com.googlecode.cqengine.IndexedCollection object with NavigableIndex configured. I need to get first or last item from the index or iterator of the index in general.
I suppose this should be trivial. I know I can create Query object with queryOptions object, use it to retrieve iterator from IndexedCollection and get first object, but I'm not sure if it's optimal for performance. Surely it's not elegant.

With help of miradham I figured out that I need to keep my own reference to the index, since it's hard to pick out the right one if the collection has several. This only works with NavigableIndex; we can't iterate over the base Index type.
collection = new ConcurrentIndexedCollection<Data>();
indexUniqueTimestamp = NavigableIndex.onAttribute(Data.UNIQUE_TIMESTAMP);
collection.addIndex(indexUniqueTimestamp);
Then, once I have the index:
try (CloseableIterator<KeyValue<String, Data>> iterator = indexUniqueTimestamp.getKeysAndValuesDescending(null).iterator()) {
if (iterator.hasNext())
return iterator.next().getValue();
}
return null;

One trick to retrieve the min or max (i.e. first or last) object according to one of its attributes is to use an all() query (which matches all objects in the collection), and to request that results be returned in ascending or descending order of your attribute.
For example, if you had a collection of Car objects, you could use the following code to retrieve the car which has the highest (i.e. the max) price:
try (ResultSet<Car> results = cars.retrieve(
all(Car.class),
queryOptions(
orderBy(descending(Car.PRICE)),
applyThresholds(
threshold(INDEX_ORDERING_SELECTIVITY, 1.0)
)
))) {
results.stream()
.limit(1)
.forEach(System.out::println);
}
You can also change the limit to something other than 1, in case you want the top n most expensive cars to be returned.
The code above will work regardless of whether or not you actually have a NavigableIndex on the price. The bit about INDEX_ORDERING_SELECTIVITY is to actually request CQEngine to leverage the index (more details here).

"or iterator of the index in general"
You can use the getIndexes() method of the QueryEngine interface to retrieve the set of indexes.
Example code:
IndexedCollection<Car> indexedCollection = new ConcurrentIndexedCollection<Car>();
indexedCollection.addIndex(HashIndex.onAttribute(Car.CAR_ID), noQueryOptions());
List<Index<Car>> indexes = new ArrayList<Index<Car>>();
for (Index<Car> index : indexedCollection.getIndexes()) {
indexes.add(index);
}

NavigableIndex stores objects in a map, with the attribute value as the key and a set of objects as the value.
NavigableIndex does not maintain insertion order, so the first element of the index could be anything.
CQEngine is designed for random access to objects in a collection, not sequential access.
Ordinary Java collections are better suited to sequential access by index.
One elegant way of accessing the first element is to create a SequentialIndex class and add it to the concurrent collection, then retrieve the element using the index as a query.

Is there a LinkedHashSet / Map equivalent, with order preserved for inserted duplicates?

I was wondering two things.
1) What does a HashSet do with an added duplicate value? I believe it replaces the value?
If that is the case, what about for a LinkedHashSet? I'm pretty sure it doesn't change the order, so does it still replace the value? (Why would it?)
2) What if I wanted to use an ordered collection that doesn't allow duplicates, but does replace an existing value with its duplicate, thus re-ordering the position?
I.e. it would be like a LinkedHashSet, except duplicate values added would be replaced and their positions updated.
Is there a collection that might do this? Or will I have to write my own? I don't want to have to write my own!

1) Adding a duplicate into a Set does not do anything (it returns false immediately, and contents of the Set are not affected), regardless of particular implementation, be it a HashSet, a TreeSet or a LinkedHashSet.
2) Check out LinkedHashMap; it is probably the closest to what you want. It has a boolean argument to the constructor that lets you specify whether you want it to use "insertion-order" (false) or "access-order" (true). The former is the same as LinkedHashSet; the latter moves the key to the end of the iteration order (the most recently used position) if you re-insert it, and also if you look it up:
Map<String, Integer> map = new LinkedHashMap<>(10, 0.75f, true);
map.put("foo", 1); map.put("bar", 2);
System.out.println(map.keySet().iterator().next()); // prints "foo" (least recently accessed comes first)
map.put("foo", 1);
System.out.println(map.keySet().iterator().next()); // prints "bar"
map.get("bar");
System.out.println(map.keySet().iterator().next()); // prints "foo"
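If only re-insertion should change the position (and not lookups), one option is to stay with LinkedHashSet and remove the value before adding it again. The sketch below also shows the point from 1) that add() returns false and leaves the order alone. The helper name addOrMoveToEnd is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ReinsertionOrder {
    // Adds value so that it ends up last in iteration order, even if already present.
    static <T> void addOrMoveToEnd(Set<T> set, T value) {
        set.remove(value); // does nothing when the value is absent
        set.add(value);    // now always appended at the end
    }

    public static void main(String[] args) {
        Set<String> set = new LinkedHashSet<>();
        set.add("foo");
        set.add("bar");
        System.out.println(set.add("foo")); // prints false: duplicate ignored
        System.out.println(set);            // prints [foo, bar]: order unchanged
        addOrMoveToEnd(set, "foo");
        System.out.println(set);            // prints [bar, foo]
    }
}
```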

How To Prevent Duplicates In Object ArrayList

I have a custom object array list; the objects must be in an array list. However, I have some duplicates in the list and I want to perform a check before I add to the list. How can this be achieved? The victimSocialSecurityNumber is unique. Below is my code:
CODE
while (rs.next()){
Citizens victims = new Citizens();
victims.setSocialSecurityNumber(rs.getInt("victimSocialSecurityNumber"));
victims.setfName(rs.getString("victimFName"));
victims.setlName(rs.getString("victimLName"));
victims.setPhoto(rs.getString("victimPhoto"));
victims.setName(rs.getString("victimFName") +" "+ rs.getString("victimLName"));
crime.getVictims().add(victims);
}

You can convert the ArrayList to a Set and back to get rid of the duplicates, or directly use a structure that holds only unique elements in insertion order: LinkedHashSet

Assuming Citizens overrides equals, you can do it like this
if (!crime.getVictims().contains(victims)) {
crime.getVictims().add(victims);
}
though generally, when duplicates are not allowed, the solution is a Set.
If you have doubts about how to override equals / hashCode, read http://javarevisited.blogspot.com/2011/10/override-hashcode-in-java-example.html
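As a rough sketch of what overriding equals and hashCode could look like here: the nested Citizen class below is a hypothetical, trimmed-down stand-in for the Citizens class in the question, and it assumes two records are the same person exactly when their social security numbers match. The addIfAbsent helper is also made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class VictimListDemo {
    // Hypothetical stand-in for Citizens; identity is the social security number only.
    static final class Citizen {
        private final int socialSecurityNumber;
        private final String name;

        Citizen(int socialSecurityNumber, String name) {
            this.socialSecurityNumber = socialSecurityNumber;
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Citizen)) return false;
            return socialSecurityNumber == ((Citizen) other).socialSecurityNumber;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(socialSecurityNumber); // must agree with equals
        }

        @Override
        public String toString() {
            return name + " (" + socialSecurityNumber + ")";
        }
    }

    // Adds the candidate only if no equal citizen is already in the list.
    static boolean addIfAbsent(List<Citizen> victims, Citizen candidate) {
        if (victims.contains(candidate)) return false;
        victims.add(candidate);
        return true;
    }

    public static void main(String[] args) {
        List<Citizen> victims = new ArrayList<>();
        addIfAbsent(victims, new Citizen(111, "Ann Lee"));
        addIfAbsent(victims, new Citizen(222, "Bob Ray"));
        addIfAbsent(victims, new Citizen(111, "Ann Lee")); // same number, skipped
        System.out.println(victims.size()); // prints 2
    }
}
```

List.contains scans the whole list on every call, which is fine for small result sets; for large ones a Set with the same equals/hashCode avoids the repeated scan.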

You can add the objects to a HashSet and then convert it to an ArrayList. This keeps each victim unique, provided Citizens overrides equals() and hashCode().
CODE
Set<Citizens> hashset = new HashSet<>();
while (rs.next()){
Citizens victims = new Citizens();
victims.setSocialSecurityNumber(rs.getInt("victimSocialSecurityNumber"));
victims.setfName(rs.getString("victimFName"));
victims.setlName(rs.getString("victimLName"));
victims.setPhoto(rs.getString("victimPhoto"));
victims.setName(rs.getString("victimFName") +" "+ rs.getString("victimLName"));
hashset.add(victims);
}
List<Citizens> list = new ArrayList<>(hashset);

I could be completely wrong here, but wouldn't a for loop solve your problem? You could just compare what you are about to add to all the elements in the ArrayList, and add it if there are no matches; if there is a match, don't.

Adding elements into ArrayList at position larger than the current size

Currently I'm using an ArrayList to store a list of elements, whereby I will need to insert new elements at specific positions. There is a need for me to enter elements at a position larger than the current size. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add(3,"hi");
Now I already know there will be an IndexOutOfBoundsException. Is there another way or another object where I can do this while still keeping the order? This is because I have methods that find elements based on their index. For example:
ArrayList<String> arr = new ArrayList<String>();
arr.add("hi");
arr.add(0,"hello");
I would expect to find "hi" at index 1 instead of index 0 now.
So in summary, short of manually inserting null into the elements in-between, is there any way to satisfy these two requirements:
Insert elements into position larger than current size
Push existing elements to the right when I insert elements in the middle of the list
I've looked at Java ArrayList add item outside current size, as well as HashMap, but HashMap doesn't satisfy my second criterion. Any help would be greatly appreciated.
P.S. Performance is not really an issue right now.
UPDATE: There have been some questions on why I have these particular requirements, it is because I'm working on operational transformation, where I'm inserting a set of operations into, say, my list (a math formula). Each operation contains a string. As I insert/delete strings into my list, I will dynamically update the unapplied operations (if necessary) through the tracking of each operation that has already been applied. My current solution now is to use a subclass of ArrayList and override some of the methods. I would certainly like to know if there is a more elegant way of doing so though.

Your requirements are contradictory:
... I will need to insert new elements at specific positions.
There is a need for me to enter elements at a position larger than the current size.
These imply that positions are stable; i.e. that an element at a given position remains at that position.
I would expect to find "hi" at index 1 instead of index 0 now.
This states that positions are not stable under some circumstances.
You really need to make up your mind which alternative you need.
If you must have stable positions, use a TreeMap or HashMap. (A TreeMap allows you to iterate the keys in order, but at the cost of more expensive insertion and lookup ... for a large collection.) If necessary, use a "position" key type that allows you to "always" generate a new key that goes between any existing pair of keys.
If you don't have to have stable positions, use an ArrayList, and deal with the case where you have to insert beyond the end position using append.
I fail to see how it is sensible for positions to be stable if you insert beyond the end, and allow instability if you insert in the middle. (Besides, the latter is going to make the former unstable eventually ...)
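A small sketch of the stable-positions alternative, using a TreeMap keyed by position. The class name and its methods are hypothetical; the point is only that a position beyond the largest one in use is accepted, and that existing entries never shift.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class StablePositions {
    private final TreeMap<Integer, String> slots = new TreeMap<>();

    // Stores the value at the given position; other positions are untouched.
    void put(int position, String value) {
        slots.put(position, value);
    }

    // Returns the value at the position, or null if nothing is stored there.
    String get(int position) {
        return slots.get(position);
    }

    // Returns the stored values in ascending position order, skipping empty positions.
    List<String> inOrder() {
        return new ArrayList<>(slots.values());
    }

    public static void main(String[] args) {
        StablePositions positions = new StablePositions();
        positions.put(3, "hi");    // no padding needed for positions 0..2
        positions.put(0, "hello");
        System.out.println(positions.get(3));    // prints hi
        System.out.println(positions.inOrder()); // prints [hello, hi]
    }
}
```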

You can even use a TreeMap to maintain the order of the keys.

First and foremost, I would say use a Map instead of a List; I suspect your problem can be solved in a better way with a Map. But if you really want to do this with an ArrayList:
ArrayList<String> a = new ArrayList<String>(); // create an empty list
a.addAll(Arrays.asList(new String[100])); // add n null placeholders; n is 100 here, but you will have to pick a value that suits your requirements
a.add(7,"hello");
a.add(2,"hi");
a.add(1,"hi2");
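A variation on the same idea that pads only as far as needed, instead of preallocating a fixed number of nulls. The helper name insertAt is hypothetical. Note that it still inserts null placeholders, which the question hoped to avoid, so treat it as a sketch of the trade-off rather than a full answer.

```java
import java.util.ArrayList;
import java.util.List;

class PaddedInsert {
    // Inserts value at index, first padding the list with nulls if index is past the end.
    // Elements at or after index are shifted right, as with List.add(int, E).
    static <T> void insertAt(List<T> list, int index, T value) {
        while (list.size() < index) {
            list.add(null); // placeholder for a position that has no element yet
        }
        list.add(index, value);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        insertAt(list, 3, "hi");
        System.out.println(list); // prints [null, null, null, hi]
        insertAt(list, 0, "hello");
        System.out.println(list); // prints [hello, null, null, null, hi]
    }
}
```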

Use the Vector class to solve this issue.
Vector<String> vector = new Vector<>();
vector.setSize(100);
vector.set(98, "a");
When setSize is called with 100, all 100 elements are initialized to null.

For those who are still dealing with this, you may do it like this.
Object[] array= new Object[10];
array[0]="1";
array[3]= "3";
array[2]="2";
array[7]="7";
List<Object> list= Arrays.asList(array);
But the thing is you need to identify the total size first, this should be just a comment but I do not have much reputation to do that.

Question regarding Java's LinkedList class

I have a question regarding the LinkedList class in Java.
I have a scenario wherein I need to add or set an element at an index, based on whether the index exists in the linked list or not. A pseudo-code of what I want to achieve is --
if index a exists within the linkedlist ll
ll.set(a,"arbit")
else
ll.add(a,"arbit")
I did go through the Javadocs for the LinkedList class but did not come across anything relevant.
Any ideas ?
Thanks
p1ng

What about using a Map for this:
Map<Integer, String> map = new HashMap<Integer, String>();
// ...
int a = 5;
map.put(a, "arbit");
Even if a already exists, put will just replace the old String.

Searching in a linked list is not very efficient (O(n)). Have you considered using a different data structure, e.g. a HashMap, which would give you O(1) access time?

If you need sequential access as well as keyed access you might want to try a LinkedHashMap, available since Java 1.4
http://download.oracle.com/javase/1.4.2/docs/api/java/util/LinkedHashMap.html

Map<Integer, String> is definitely a good (the best?) way to go here.
Here's an option for keeping with LinkedList if that's for some bizarre reason a requirement. It has horrible runtime performance and disallows null, since null now becomes an indicator that an index isn't occupied.
String toInsert = "arbit";
int a = 5;
//grow the list to allow index a
while ( a >= ll.size() ) {
ll.add(null);
}
//set index a to the new value
ll.set(a, toInsert);
If you're going to take this gross road, you might be better off with an ArrayList.
Why is it so bad? Say you had only one element at index 100,000. This implementation would require 100,000 entries in the list pointing to null. This results in horrible runtime performance and memory usage.

LinkedList cannot have holes inside, so you can't have list [1,2,3,4] and then ll.add(10,10), so I think there's something wrong with your example. Use either Map or search for some other sparse array

It looks like you're trying to use a as a key, and don't state whether you have items at index i < a. If you run your code when ll.size() <= a, you can end up with an IndexOutOfBoundsException (set throws it when a >= ll.size(), add when a > ll.size()).
And if you add an item at index a the previous item at a will now be at a+1.
In this case it would be best to remove item at a first (if it exists) then add item "arbit" into a. Of course, the condition above re: ll.size() <=a still applies here.
If the order of the results is important, a different approach could use a HashMap<Integer,String> to create your dataset, then extract the keys using HashMap.keySet(), sort them in their natural order (they're numeric after all), and then extract the values from the map while iterating over the sorted keys. Nasty, but does what you want... Or create your own OrderedMap class that does the same...
Could you expand on why you need to use a LinkedList? Is ordering of the results important?
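For what it's worth, the set-or-add logic from the question, with the bounds discussed above made explicit, could be sketched like this. The helper name setOrAdd and its boolean return value are assumptions for illustration; it refuses to create a gap rather than padding the list.

```java
import java.util.LinkedList;
import java.util.List;

class SetOrAdd {
    // Replaces the element at index if it exists, appends if index == size,
    // and returns false (changing nothing) if the index would leave a hole.
    static <T> boolean setOrAdd(List<T> list, int index, T value) {
        if (index < 0 || index > list.size()) {
            return false; // a list cannot hold a gap before this index
        }
        if (index < list.size()) {
            list.set(index, value); // index exists: overwrite, nothing shifts
        } else {
            list.add(value);        // index == size: append at the end
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> ll = new LinkedList<>(List.of("x", "y"));
        System.out.println(setOrAdd(ll, 1, "arbit")); // prints true
        System.out.println(setOrAdd(ll, 2, "tail"));  // prints true
        System.out.println(setOrAdd(ll, 10, "far"));  // prints false
        System.out.println(ll);                       // prints [x, arbit, tail]
    }
}
```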
