Removing duplicates while keeping parallel lists in sync

This problem is driving me crazy. I have vectorA (float), vectorB (string1) and vectorC (string2), which are parallel, and I want to eliminate the duplicates in vectorA while keeping the three vectors in sync.
Any ideas?

Here's a single-pass, in-place algorithm:
Set<Float> seen = new HashSet<Float>();
int uniques = 0;
for (int i = 0; i < n; i++) {
if (seen.add(vectorA[i])) {
vectorA[uniques] = vectorA[i];
vectorB[uniques] = vectorB[i];
vectorC[uniques] = vectorC[i];
uniques++;
}
}
and then after you're done, ignore all elements after position uniques (or copy them all into new arrays).
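For reference, here is a self-contained sketch of the same single-pass idea with the final trimming step included. The class and method names (`ParallelDedup`, `dedupe`) and the sample data are made up for illustration, and the three arrays are assumed to have the same length.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ParallelDedup {
    // Compacts the three parallel arrays in place, keeping the first
    // occurrence of each value in a, and returns the number of rows kept.
    static int dedupe(float[] a, String[] b, String[] c) {
        Set<Float> seen = new HashSet<>();
        int uniques = 0;
        for (int i = 0; i < a.length; i++) {
            if (seen.add(a[i])) { // add() returns false for a repeated value
                a[uniques] = a[i];
                b[uniques] = b[i];
                c[uniques] = c[i];
                uniques++;
            }
        }
        return uniques;
    }

    public static void main(String[] args) {
        float[] a = {1.5f, 2.0f, 1.5f, 3.0f};
        String[] b = {"x1", "x2", "x3", "x4"};
        String[] c = {"y1", "y2", "y3", "y4"};
        int n = dedupe(a, b, c);
        // Copy the compacted prefix into right-sized arrays.
        System.out.println(Arrays.toString(Arrays.copyOf(a, n))); // [1.5, 2.0, 3.0]
        System.out.println(Arrays.toString(Arrays.copyOf(b, n))); // [x1, x2, x4]
        System.out.println(Arrays.toString(Arrays.copyOf(c, n))); // [y1, y2, y4]
    }
}
```

Everything at index n and beyond is stale after the call, which is why the example copies only the first n entries.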

Create a Set<Float> for the items you have seen, scan through vectorA recording the indexes of duplicates, then delete the marked indexes from all three vectors, working backwards from the end.
Set<Float> seen = new HashSet<Float>();
List<Integer> del = new ArrayList<Integer>();
for (int i = 0 ; i != vectorA.size() ; i++) {
// add() returns false when the value is already in the set, i.e. a duplicate
if (!seen.add(vectorA.get(i))) {
del.add(i);
}
}
for (int i = del.size()-1 ; i >= 0 ; i--) {
// unbox to int so that remove(int index) is called rather than remove(Object)
int index = del.get(i);
vectorA.remove(index);
vectorB.remove(index);
vectorC.remove(index);
}
Going back is important, because otherwise your indexes will get out of sync.

Create a class that combines the three values and overrides equals and hashCode. Add these instances to a single list instead of three parallel lists. Once you're ready to remove duplicates (assuming you need to keep them around first and remove them at a later point), add them to a LinkedHashSet and back to an ArrayList. LinkedHashSet will maintain insertion order (if that's not important use a standard HashSet) while removing duplicates.
class Triple {
float num;
String a;
String b;
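public boolean equals(Object o) {
if (!(o instanceof Triple)) // also false when o is null
return false;
// Float.compare agrees with the bit-based hashCode below (NaN equals NaN, 0.0f differs from -0.0f)
return Float.compare(num, ((Triple)o).num) == 0;
}
public int hashCode() {
return Float.floatToIntBits(num);
}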
}
List<Triple> removeDuplicates(List<Triple> items) {
return new ArrayList<Triple>(new LinkedHashSet<Triple>(items));
}
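A usage sketch of this approach, to show that the first row for each number survives and that the original order is kept. `Row` is a hypothetical stand-in for the Triple class with a constructor and toString added; the sample values are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class Row {
    final float num;
    final String a;
    final String b;

    Row(float num, String a, String b) {
        this.num = num;
        this.a = a;
        this.b = b;
    }

    // Two rows count as duplicates when their float values match.
    @Override
    public boolean equals(Object o) {
        return o instanceof Row && Float.compare(num, ((Row) o).num) == 0;
    }

    @Override
    public int hashCode() {
        return Float.hashCode(num);
    }

    @Override
    public String toString() {
        return num + ":" + a + ":" + b;
    }
}

class RowDedupDemo {
    static List<Row> removeDuplicates(List<Row> items) {
        // LinkedHashSet drops repeats and remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<Row>(items));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(2.5f, "a", "x"),
                new Row(1.0f, "b", "y"),
                new Row(2.5f, "c", "z"),
                new Row(3.0f, "d", "w"));
        System.out.println(removeDuplicates(rows));
        // prints [2.5:a:x, 1.0:b:y, 3.0:d:w]
    }
}
```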

Related

How would I check if all elements in an array are present in another array with no duplicates?

I've run into a bit of trouble with my code. I want to check that every card in sharedMemory is present in the entire deck (it is) and that no card appears more than once. I am getting false with this code and I don't really know why; any help would be appreciated. rank and suit together form a Card object. rank and suit are enums with values. Note: when I use 'return Arrays.asList(entireDeck).containsAll(sharedMemory)' on its own it does return true, but the messy if part is trying to check for duplicates.
public static boolean isFull(){
Card[] entireDeck = Deck.fillDeck();
sharedMemory =Arrays.asList(Deck.fillDeck());
int i=0,duplicates=0, position =0, original;
for(Card c:entireDeck){
Card f = new Card(c.rank, c.suit);
original=i;
if (f.rank.equals(sharedMemory.get(i).rank)&&f.suit.equals(sharedMemory.get(i).suit)){
duplicates+=1;
position=i;
for(i=0; i<position;i++) {
if (f.rank.equals(sharedMemory.get(i).rank)&&f.suit.equals(sharedMemory.get(i).suit))
{
duplicates += 1;
return false;
}
}
for (i=position+1; i<52; i++){
if (f.rank.equals(sharedMemory.get(i).rank)&&f.suit.equals(sharedMemory.get(i).suit))
{
duplicates += 1;
return false;
}
}
if(duplicates>1){
return false;
}
else{
i=original;
}
}
i=original;
i++;
}
return Arrays.asList(entireDeck).containsAll(sharedMemory);
}
Thanks :)

The easiest way is to use a Set<Card>. If the return value of adding an object to a set is false, then it means the set already contains it, thus a duplicate. For this to work the Card class must override both hashCode and equals.
Set<Card> set = new HashSet<>();
for(Card card : cardArray) {
if (!set.add(card)) { // if false then !false is true so signal duplicate.
System.out.println("Duplicate of " + card + " found");
break;
}
}
You can always sort the Cards and then do a one to one comparison to see if they are equal. Comparing adjacent cards of a sorted deck can also detect duplicates.
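The sort-then-compare-neighbours idea could be sketched as follows. `SimpleCard` with int rank and suit fields is a simplified, hypothetical stand-in for the real Card class, and the ordering (rank first, then suit) is an arbitrary choice; any total order that treats equal cards as equal would do.

```java
import java.util.Arrays;
import java.util.Comparator;

class SimpleCard {
    final int rank;
    final int suit;

    SimpleCard(int rank, int suit) {
        this.rank = rank;
        this.suit = suit;
    }
}

class SortedDuplicateCheck {
    // Orders cards by rank, then by suit.
    static final Comparator<SimpleCard> ORDER =
            Comparator.comparingInt((SimpleCard c) -> c.rank)
                      .thenComparingInt(c -> c.suit);

    static boolean hasDuplicates(SimpleCard[] cards) {
        SimpleCard[] sorted = cards.clone(); // leave the caller's array untouched
        Arrays.sort(sorted, ORDER);
        // After sorting, equal cards are adjacent.
        for (int i = 1; i < sorted.length; i++) {
            if (ORDER.compare(sorted[i - 1], sorted[i]) == 0) {
                return true;
            }
        }
        return false;
    }
}
```

This costs O(n log n) for the sort instead of the expected O(n) of the set approach, but it does not need hashCode or equals on the card class.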

Here is one example of how this could be achieved using an integer array. For your use case you would want to implement a Comparator for sorting and ordering.
I decided to show an approach that modifies the input arrays during the check (by sorting them on each run). This allows for moderate performance improvements, although I doubt that your use case involves a dataset large enough for that to matter.
// Checks if a is a subset of b and does not contain duplicates
public static boolean subsetOf(int[] a, int[] b) {
Arrays.sort(a);
Arrays.sort(b);
int aIdx = 0;
int bIdx = 0;
while (aIdx < a.length && bIdx < b.length) {
if (a[aIdx] == b[bIdx]) {
aIdx++;
bIdx++;
// Check for a duplicate: a is sorted, so equal values sit next to each other
if (aIdx < a.length && a[aIdx] == a[aIdx - 1]) {
return false;
}
} else if (b[bIdx] < a[aIdx]) {
// We need to keep moving through b to find next a
bIdx++;
} else {
// We missed an element in a
return false;
}
}
// Verify that we found all elements in a
return aIdx == a.length;
}
For your use case, I would recommend something like this for readability.
import java.util.Arrays;
import java.util.Set;
import java.util.HashSet;
public static boolean subsetOf(Card[] a, Card[] b) {
Set<Card> aSet = new HashSet<>();
Set<Card> bSet = new HashSet<>();
aSet.addAll(Arrays.asList(a));
bSet.addAll(Arrays.asList(b));
return aSet.size() == a.length && bSet.containsAll(aSet);
}
In order to use this method, make sure to implement hashCode and equals in Card. You don't need to do anything fancy or import any extra libraries. Equal cards must return the same hash code; returning a different number for every rank/suit combination, as below, also keeps the set lookups fast.
public class Card {
// Gives a unique number for each card
@Override
public int hashCode() {
return 4 * rank + suitNum;
}
// Check if this card is the same as another object
@Override
public boolean equals(Object other) {
if (other instanceof Card) {
Card otherCard = (Card) other;
return rank == otherCard.rank && suitNum == otherCard.suitNum;
}
return false;
}
}
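Since the question describes rank and suit as enumerated values, an enum-based variant might look like the sketch below. The enum names and constants are invented and cut down to a few values, and `Objects.hash` is used here in place of the arithmetic hash; it is one reasonable choice, not the only one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EnumCardDemo {
    enum Rank { ACE, KING, QUEEN }
    enum Suit { HEARTS, SPADES }

    static final class Card {
        final Rank rank;
        final Suit suit;

        Card(Rank rank, Suit suit) {
            this.rank = rank;
            this.suit = suit;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Card)) {
                return false;
            }
            Card other = (Card) o;
            return rank == other.rank && suit == other.suit; // enums compare by identity
        }

        @Override
        public int hashCode() {
            return Objects.hash(rank, suit); // equal cards always hash alike
        }
    }

    // True when a has no repeated cards and every card of a occurs in b.
    static boolean subsetOf(Card[] a, Card[] b) {
        Set<Card> aSet = new HashSet<>(Arrays.asList(a));
        Set<Card> bSet = new HashSet<>(Arrays.asList(b));
        return aSet.size() == a.length && bSet.containsAll(aSet);
    }
}
```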

How to add an integer to a set while iterating?

I have a set of sets of integers: Set<Set<Integer>>.
I need to add integers to the set of sets as if it were a double array. So add(2,3) would have to add integer 3 to the 2nd set.
I know a set is not very suitable for this operation but it's not my call.
The commented line below clearly does not work but it shows the intention.
My question is how to add an integer to a set while iterating?
If it's necessary to identify each set, how would one do this?
@Override
public void add(int a, int b) {
if (!isValidPair(a, b)) {
throw new IllegalStateException("!isValidPair does not hold for (a,b)");
}
Iterator<Set<Integer>> it = relation.iterator();
int i = 0;
while (it.hasNext() && i <= a) {
//it.next().add(b);
i++;
}
}

One fundamental thing you should be aware of, and which breaks all the existing answers to this question:
Once an object is added in a Set (similarly, as key in Map), it is not supposed to change (at least not in aspects that will change its equals() and hashCode()). The "Uniqueness" checking is done only when you add the object into the Set.
For example
Set<Set<Integer>> bigSet = new HashSet<>();
Set<Integer> v1 = new HashSet<>(Arrays.asList(1,2));
bigSet.add(v1);
System.out.println("contains " + bigSet.contains(new HashSet<>(Arrays.asList(1,2)))); // True
v1.add(3);
System.out.println("contains " + bigSet.contains(new HashSet<>(Arrays.asList(1,2)))); // False!!
System.out.println("contains " + bigSet.contains(new HashSet<>(Arrays.asList(1,2,3)))); // False!!
You can see how the set is corrupted. It contains [1,2,3], but contains() fails for both [1,2] and [1,2,3].
Another fundamental point is that your so-called '2nd set' may not make sense. Set implementations like HashSet keep their values in arbitrary order.
So, with these in mind, what you can do is:
1) Find the n-th value set and remove it
2) Add the value to the removed value set
3) Re-add the value set.
Something like this (pseudo code again):
int i = 0;
Set<Integer> setToAdd = null;
for (Iterator<Set<Integer>> itr = bigSet.iterator(); itr.hasNext(); ++i) {
Set<Integer> s = itr.next();
if (i == inputIndex) {
// remove the set first
itr.remove();
setToAdd = s;
break;
}
}
// update the set and re-add it back
if (setToAdd != null) {
setToAdd.add(inputNumber);
bigSet.add(setToAdd);
}
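Wrapped into a method, the remove, update and re-add steps might look like this. The names `NestedSetUpdate` and `addAt` are hypothetical, and the position is simply the iteration position of the outer set, so it is only stable for an ordered implementation such as LinkedHashSet.

```java
import java.util.Iterator;
import java.util.Set;

class NestedSetUpdate {
    // Adds value to the inner set found at iteration position index.
    // Returns false when the outer set has no element at that position.
    static boolean addAt(Set<Set<Integer>> bigSet, int index, int value) {
        Set<Integer> target = null;
        int i = 0;
        for (Iterator<Set<Integer>> it = bigSet.iterator(); it.hasNext(); i++) {
            Set<Integer> s = it.next();
            if (i == index) {
                it.remove(); // take it out while its hash code is still valid
                target = s;
                break;
            }
        }
        if (target == null) {
            return false;
        }
        target.add(value);
        bigSet.add(target); // stored again under its new hash code
        return true;
    }
}
```

Two side effects are worth knowing about. If the updated inner set now equals another inner set, the final add is a no-op and the outer set ends up one element smaller. With a LinkedHashSet as the outer set, the re-added inner set moves to the end of the iteration order, so positions shift after each call.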

Use a for-each loop and make your life easier.
public boolean add(int index, int value) {
// because a and b suck as variable names
if (index < 0 || index >= relation.size()) {
return false;
}
int iter = 0;
for (Set<Integer> values : relation) {
if (iter++ == index) {
return values.add(value);
}
}
}
return false;
}
Now all you have to figure out is what to do if relation is unordered, as a Set or a relation are, because in that case a different Set<Integer> could match the same index each time the loop executes.

You can use the Iterators class from the Guava library like this:
@Override
public void add(int a, int b) {
if (!isValidPair(a, b)) {
throw new IllegalStateException("!isValidPair does not hold for (a,b)");
}
Iterators.get(relation.iterator(), a).add(b);
}
Edit : without Guava:
Iterator<Set<Integer>> iterator = relation.iterator();
for(int i = 0; i < a && iterator.hasNext(); ++i) {
iterator.next();
}
if(iterator.hasNext()) {
iterator.next().add(b);
}

creating java generic data structure

I am building a data structure to learn more about Java. I understand this program might be useless.
Here's what I want: a data structure that stores the smallest 3 values. If a value is too high, it is ignored. When storing values, I also want to put them in the correct place so I don't have to sort them later. I enter values by calling the add method.
So let's say I add 20, 10, 40, 30; then the result will be [10,20,30]. Note that it can only hold the 3 smallest values, and it keeps them in order as I add them.
I also understand that there are a lot of better ways of doing this, but again this is just for learning purposes.
Question: I need help creating the add method. I wrote some code but I am getting stuck on the add method. Please help.
My thinking: we might have to use an Iterator in the add method?
public class MyJavaApp {
public static void main(String[] args){
MyClass<Integer> m = new MyClass<Integer>(3);
m.add(10);
m.add(20);
m.add(30);
m.add(40);
}
}
public class MyClass<V extends Comparable<V>> {
private V v[];
public MyClass(int s){
this.v = (V[])new Object[s];
}
public void add(V a){
}
}

Here is a rough sketch of the add method you have to implement.
You have to use the appropriate implementation of the compareTo method when comparing elements.
public void add(V a){
for (int i = 0; i < v.length; i++) {
if (v[i] == null) {
// empty slot: everything before it is smaller or equal, so a belongs here
v[i] = a;
return;
}
// compareTo only promises a negative, zero or positive result, so test the sign
if (a.compareTo(v[i]) < 0) {
/*
keeping the old v[i] in a temp variable since it could still be one of
the smallest values.
Therefore call add method again to assign it to the correct
position.
*/
V temp = v[i];
v[i] = a;
add(temp);
return;
}
}
// a is not smaller than any stored value and the array is full: ignore it
}
Therefore the v array will contain the lowest elements.
Hope this helps.

A naive, inefficient approach would be (as you suggest) to iterate through the values and add / remove based on what you find:
public void add(Integer a)
{
// If fewer than 3 elements in the list, add and we're done.
if (m.size() < 3)
{
m.add(a);
return;
}
// If there's 3 elements, find the maximum.
int max = Integer.MIN_VALUE;
int index = -1;
for (int i=0; i<3; i++) {
int v = m.get(i);
if (v > max) {
max = v;
index = i;
}
}
// If a is less than the max, we need to add it and remove the existing max.
if (a < max) {
m.remove(index);
m.add(a);
}
}
Note: this has been written for Integer, not a generic type V. You'll need to generalise. It also doesn't keep the list sorted - another of your requirements.

Here's an implementation of that algorithm. It consists of looking for the right place to insert. Then it can be optimized for your requirements:
Don't bother looking past the size you want
Don't add more items than necessary
Here's the code. I added the toString() method for convenience. Only the add() method is interesting. Also this implementation is a bit more flexible as it respects the size you give to the constructor and doesn't assume 3.
I used a List rather than an array because it makes dealing with generics a lot easier. You'll find that using an array of generics makes using your class a bit more ugly (i.e. you have to deal with type erasure by providing a Class<V>).
import java.util.*;
public class MyClass<V extends Comparable<V>> {
private int s;
private List<V> v;
public MyClass(int s) {
this.s = s;
this.v = new ArrayList<V>(s);
}
public void add(V a) {
int i=0;
int l = v.size();
// Find the right index
while(i<l && v.get(i).compareTo(a) < 0) i++;
if(i<s) {
v.add(i, a);
// Truncate the list to make sure we don't store more values than needed
if(v.size() > s) v.remove(v.size()-1);
}
}
public String toString() {
StringBuilder result = new StringBuilder();
for(V item : v) {
result.append(item).append(',');
}
return result.toString();
}
}
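For comparison, the array-based variant that the answer mentions could look roughly like this. `SmallestK` is a made-up name, and passing a `Class<V>` token to the constructor is the assumption that makes the generic array possible. The cast `(V[]) new Object[s]` from the question compiles but fails at run time with a ClassCastException, because V erases to Comparable and an Object[] is not a Comparable[].

```java
import java.lang.reflect.Array;
import java.util.Arrays;

class SmallestK<V extends Comparable<V>> {
    private final V[] items; // sorted ascending; only the first count slots are used
    private int count;

    @SuppressWarnings("unchecked")
    SmallestK(Class<V> type, int capacity) {
        // Array.newInstance creates an array whose runtime type really is V[].
        this.items = (V[]) Array.newInstance(type, capacity);
    }

    void add(V a) {
        // Find the insertion point among the stored values.
        int i = 0;
        while (i < count && items[i].compareTo(a) <= 0) {
            i++;
        }
        if (i == items.length) {
            return; // the array is full and a is not smaller than any stored value
        }
        // Shift the tail one slot right; the largest value falls off when full.
        int last = Math.min(count, items.length - 1);
        for (int j = last; j > i; j--) {
            items[j] = items[j - 1];
        }
        items[i] = a;
        if (count < items.length) {
            count++;
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(Arrays.copyOf(items, count));
    }
}
```

With `new SmallestK<>(Integer.class, 3)` and the values 20, 10, 40, 30 added in that order, toString() gives [10, 20, 30].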

Removing a redundant value in an array

I am not sure why my removeDuplicates method refuses to actually get rid of non-unique values. I am not sure if the problem is with the size incrementation or my method call.
// post: places the value in the correct place based on ascending order
public void add(int value) {
size++;
if (size == 1) {
elementData[0] = value;
} else {
int position = Arrays.binarySearch(elementData, 0, size - 1, value);
if (position < 0 ) {
position = (-position) - 1;
}
for (int i = size - 1; i > position; i--) {
elementData[i] = elementData[i - 1];
}
elementData[position] = value;
}
if (unique) {
removeDuplicates();
}
}
//post: removes any duplicate values from the list
private void removeDuplicates() {
for(int i = size - 1; i > 0; i--) {
if (elementData[i] == elementData[i - 1]){
remove(i - 1);
}
}
}

@user98643 -
Jano's suggestion is spot-on: the best solution is to simply use the appropriate data structure, for example a TreeSet.
SUGGESTIONS:
1) In general, always consider using a container such as "List<>" in preference to an array
2) In general, look for the container that already has most of the properties you need
3) In this case, A) you want all elements sorted, and B) each element must be unique.
A TreeSet fits the bill beautifully.
IMHO..
http://docs.oracle.com/javase/7/docs/api/java/util/TreeSet.html
http://math.hws.edu/javanotes/c10/s2.html
http://www.mkyong.com/java/what-is-the-different-between-set-and-list/
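A short illustration of what the TreeSet gives you for free. The class and method names are hypothetical; the point is only that add() ignores repeats and iteration is always in ascending order.

```java
import java.util.TreeSet;

class SortedUniqueDemo {
    static TreeSet<Integer> sortedUnique(int... values) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int v : values) {
            set.add(v); // returns false and changes nothing for a repeated value
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(sortedUnique(5, 1, 3, 1, 5, 2)); // [1, 2, 3, 5]
    }
}
```

No separate removeDuplicates step and no manual shifting of array elements is needed with this structure.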

Try this..
// Convert it to list as we need the list object to create a
// set object. A set is a collection object that cannot have
// a duplicate values, so by converting the array to a set
// the duplicate value will be removed.
List<String> list = Arrays.asList(data);
Set<String> set = new HashSet<String>(list);
System.out.print("Remove duplicate result: ");
//
// Create an array to convert the Set back to array.
// The Set.toArray() method copy the value in the set to the
// defined array.
//
String[] result = new String[set.size()];
set.toArray(result);
for (String s : result) {
System.out.print(s + ", ");
}

Compare one element from multiple 2D arrays (java)

I have multiple 2D arrays of Strings that are laid out a little like this
Array 1
[0] = ["01/01/01","Bill","17","0.86"]
[1] = ["02/01/01","Bill","12","0.84"]
[2] = ["03/01/01","Bill","15","0.85"]
Array 2
[0] = ["01/01/01","Joe","14","0.81"]
[1] = ["02/01/01","Joe","15","0.83"]
[2] = ["04/01/01","Joe","19","0.85"]
I'm trying to compare only data from the same days, so what I need to do is search both arrays for dates that are in one but not the other and then remove them. So in the above example I would remove [2] from both of the arrays. Is there a way of doing this using List/Collection retainAll or will I have to write a loop? Oh I'm using Java.

There is no direct way of removing items from the arrays using a collection. But if both arrays are sorted by date, you can compare the data without removing the missing dates from either array.
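A sketch of that comparison: walk both sorted arrays with one index each and only pair up rows whose dates match. The date pattern dd/MM/yy is an assumption based on the sample data, the names `SortedDateMerge` and `matchByDate` are invented, and each array is assumed to hold at most one row per date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class SortedDateMerge {
    // Assumption: column 0 holds day/month/two-digit-year dates.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd/MM/yy");

    // Returns {rowFromA, rowFromB} pairs for every date present in both arrays.
    // Both inputs must already be sorted by date.
    static List<String[][]> matchByDate(String[][] a, String[][] b) {
        List<String[][]> pairs = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            LocalDate da = LocalDate.parse(a[i][0], FMT);
            LocalDate db = LocalDate.parse(b[j][0], FMT);
            int cmp = da.compareTo(db);
            if (cmp == 0) {
                pairs.add(new String[][] {a[i], b[j]});
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // this date exists only in a, skip it
            } else {
                j++; // this date exists only in b, skip it
            }
        }
        return pairs;
    }
}
```

For the two sample arrays this yields two pairs, for 01/01/01 and 02/01/01, and skips the rows dated 03/01/01 and 04/01/01 without modifying either array.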

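Well, I wouldn't use arrays for this problem. Removing elements from an array is a bad idea. You might try a linked list. Something like this:
// Drop the rows of array1List whose date does not appear in array2List.
// Iterate backwards so that a removal does not shift the indexes still to visit.
for (int i = array1List.size() - 1; i >= 0; i--) {
String date = array1List.get(i)[0];
boolean found = false;
for (int j = 0; j < array2List.size(); j++) {
if (array2List.get(j)[0].equals(date)) {
found = true;
break;
}
}
if (!found) array1List.remove(i);
}
// Then repeat with the two lists swapped to drop the unmatched rows of array2List.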

To use a Collection to do this you will have to put each array entry into an object. Something like:
class DayInfo {
String date;
String name;
...
public DayInfo(String[] arrayData) {
this.date = arrayData[0];
this.name = arrayData[1];
...
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof DayInfo))
return false;
if (date == null) {
return ((DayInfo)obj).date == null;
} else {
return date.equals(((DayInfo)obj).date);
}
}
@Override
public int hashCode() {
if (date == null)
return 0;
else
return date.hashCode();
}
}
Then if you load both of your arrays into DayInfo collections:
Set<DayInfo> dayInfos1 = new HashSet<DayInfo>(array1.length);
for (String[] arrayEntry : array1)
dayInfos1.add(new DayInfo(arrayEntry));
Set<DayInfo> dayInfos2 = new HashSet<DayInfo>(array2.length);
for (String[] arrayEntry : array2)
dayInfos2.add(new DayInfo(arrayEntry));
Now you can use the retainAll in both directions:
// remove everything from set #1 that doesn't have a date in set #2
dayInfos1.retainAll(dayInfos2);
// remove everything from set #2 that doesn't have a date in set #1
dayInfos2.retainAll(dayInfos1);
I think that would work.
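If the rows should stay as String[] and keep their original order, the same retainAll idea can be applied to the date strings alone. This is a sketch under the assumption that the data lives in mutable lists (for example ArrayList) with the date in column 0; `CommonDates` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonDates {
    // Collects the date column of every row.
    static Set<String> dates(List<String[]> rows) {
        Set<String> out = new HashSet<>();
        for (String[] row : rows) {
            out.add(row[0]);
        }
        return out;
    }

    // Removes from both lists every row whose date is missing from the other list.
    static void keepCommonDates(List<String[]> rows1, List<String[]> rows2) {
        Set<String> common = dates(rows1);
        common.retainAll(dates(rows2)); // set intersection of the two date columns
        rows1.removeIf(row -> !common.contains(row[0]));
        rows2.removeIf(row -> !common.contains(row[0]));
    }
}
```

Unlike a Set<DayInfo>, which can hold only one entry per date because equals looks at the date alone, this keeps every row for a shared date.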
