How to implement a key-value pair with variability in the key - java

I'm writing some code to de-duplicate data based on 2 fields:
A string of characters, we'll call this the UMI
An array of integers
I've created a POJO to hold this data and act as the key for a TreeMap. The full set of data is held in the value, so I only keep relevant data in memory.
However, the next requirement is to allow variability in both the UMI and the integers. For example, the following two pieces of data would be considered duplicates, given an allowed UMI variability (mismatch) of 1:
a. "AAA", [200,300]
b. "ABA", [200,300]
Similarly, the following would be considered duplicates based on the integer array, given a mismatch allowance of 2.
a. "AAA", [201,300]
b. "AAA", [203,300]
My current attempt has been to make this POJO implement the Comparable interface, and attempt to work the compareTo method to take into account the variability:
public class UMIPrimoKey implements Comparable<UMIPrimoKey> {
private final String UMI;
private final int[] ints;
private final int umiMisMatch;
private final int posMisMatch;
public UMIPrimoKey(String UMI, int[] ints, int umiMisMatch, int posMisMatch) {
this.UMI = UMI;
this.ints = ints;
this.umiMisMatch = umiMisMatch;
this.posMisMatch = posMisMatch;
}
@Override
public int compareTo(UMIPrimoKey o) {
if (!Arrays.equals(ints, o.ints)) {
if (ints.length == o.ints.length) {
for (int i = 0; i < ints.length; i++) {
if (Math.abs(ints[i] - o.ints[i]) > posMisMatch) {
return -1;
}
}
} else {
return -1;
}
}
if (XsamStringUtils.numberOfDifferences(UMI, o.UMI) <= umiMisMatch) {
return 0;
}
return 1;
}
}
XsamStringUtils.numberOfDifferences is just a simple static method to count the number of differences between the two UMIs.
I return -1 if the arrays have different lengths, or if any pair of corresponding integers differs by more than the allowed mismatch (posMisMatch). 0 is returned if the integers are within tolerance and the number of mismatches in the UMI is at most the allowed amount, specified by umiMisMatch.
Otherwise, 1 is returned, because the UMIs don't match.
I've then used this class as the key of a TreeMap, which relies on the compareTo method to order and look up keys.
This works in my unit tests with small numbers of UMIPrimoKeys, but I'm getting some strange results when running the completed program. It's probably due to the rules for the method outlined here: https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html but I'm finding it hard to adapt the code to follow those rules.
Any direction is appreciated, thanks for reading!

According to the docs of compareTo:
The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)
The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.
Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
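Your compareTo does not satisfy these rules. For example, if two keys' integer arrays differ by more than posMisMatch, both x.compareTo(y) and y.compareTo(x) return -1, which breaks the first rule. A tolerance-based "equal" is also not transitive: with one allowed mismatch, "AAA" matches "ABA" and "ABA" matches "ABB", but "AAA" does not match "ABB". That breaks the third rule. A TreeMap relies on these rules to decide which branch of the tree to search, so get can fail to find an entry that is actually present.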
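A tolerance-based match cannot be turned into a valid ordering, because it is not transitive. That makes a TreeMap the wrong tool for fuzzy keys.

One possible alternative, not part of the answer above and only a sketch, is to keep a list of representative keys and test each new record against them with a plain boolean method. The following names and assumptions are introduced here:
- `Key`, `matches` and `dedup` are hypothetical names.
- `numberOfDifferences` is an assumed Hamming-style stand-in for `XsamStringUtils.numberOfDifferences`.
- The record syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzyDedup {
    // Hypothetical stand-in for UMIPrimoKey: plain data, no compareTo.
    public record Key(String umi, int[] ints) {}

    // Assumed Hamming-style count: position-by-position differences plus any length difference.
    public static int numberOfDifferences(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int diff = Math.abs(a.length() - b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) diff++;
        }
        return diff;
    }

    // Symmetric "is a duplicate of" test; deliberately a boolean, not an ordering.
    public static boolean matches(Key a, Key b, int umiMisMatch, int posMisMatch) {
        if (a.ints().length != b.ints().length) return false;
        for (int i = 0; i < a.ints().length; i++) {
            if (Math.abs(a.ints()[i] - b.ints()[i]) > posMisMatch) return false;
        }
        return numberOfDifferences(a.umi(), b.umi()) <= umiMisMatch;
    }

    // Greedy pass: a key is kept only if no previously kept key matches it.
    public static List<Key> dedup(List<Key> input, int umiMisMatch, int posMisMatch) {
        List<Key> kept = new ArrayList<>();
        for (Key k : input) {
            boolean duplicate = false;
            for (Key r : kept) {
                if (matches(k, r, umiMisMatch, posMisMatch)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) kept.add(k);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Key> in = List.of(
            new Key("AAA", new int[]{200, 300}),
            new Key("ABA", new int[]{200, 300}),   // 1 UMI mismatch: duplicate
            new Key("AAA", new int[]{201, 300}),   // ints within 2: duplicate
            new Key("CCC", new int[]{500, 600}));  // distinct
        System.out.println(dedup(in, 1, 2).size()); // 2
    }
}
```

This is O(n·k) for n records and k surviving keys. Because matching is not transitive, the result can depend on input order; if that matters, a proper clustering step would be needed instead.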

Custom Java Comparator With Pre Defined Top Result

I want to sort a list of field names alphabetically, but I need to include a condition in the doCompare method of the comparator so that the field name "pk" is always sorted to the top of the list. What I have is below, but I'm not sure if I'm taking the right approach, particularly with the return value of -1000. Any advice on this would be much appreciated.
@Override
public int doCompare(Object firstRec, Object secondRec)
{
MyField firstField = (MyField) firstRec;
MyField secondField = (MyField ) secondRec;
if(firstField.name() == "pk")
{
return -1000;
}
return StringUtils.compareStrings(firstField.name().toLowerCase(), secondField.name().toLowerCase());
}
The requirements of a Comparator (and, by extension, methods which are supposed to act like Comparator.compare) are described in the Javadoc:
The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)
The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.
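Assuming StringUtils.compareStrings correctly implements these requirements, what you've got wrong is the first requirement. You also need to handle the case where secondField is "pk" (return a positive value) and the case where both are "pk" (return 0). Separately, firstField.name() == "pk" compares string references rather than contents; use "pk".equals(firstField.name()) instead.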
The general pattern for writing correct Comparators is:
int firstComparison = /* compare something about firstField and secondField */;
if (firstComparison != 0) {
return firstComparison;
}
int secondComparison = /* compare something else about firstField and secondField */;
if (secondComparison != 0) {
return secondComparison;
}
// ...
return 0;
Applying that here:
int pkComparison = Boolean.compare(secondField.name().equals("pk"), firstField.name().equals("pk"));
if (pkComparison != 0) {
return pkComparison;
}
int compareStringsComparison = StringUtils.compareStrings(firstField.name().toLowerCase(), secondField.name().toLowerCase());
if (compareStringsComparison != 0) {
return compareStringsComparison;
}
return 0;
Obviously, the last if statement is redundant, because you always return compareStringsComparison whether or not it is zero; so you could write simply:
return StringUtils.compareStrings(firstField.name().toLowerCase(), secondField.name().toLowerCase());
I would recommend sticking to the compare/check and return/finally return 0 pattern, because it's easier to slot in additional conditions later. But it's not terrible either way.
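Here is a self-contained version of the compare/check-and-return pattern above. It rests on two assumptions:
- `MyField` is a hypothetical record standing in for the question's type.
- `String.CASE_INSENSITIVE_ORDER` is an assumed substitute for `StringUtils.compareStrings` on lowercased names. The two are close but not guaranteed identical for every character.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PkFirstDemo {
    // Hypothetical stand-in for the question's MyField type.
    record MyField(String name) {}

    static final Comparator<MyField> PK_FIRST = (first, second) -> {
        // Arguments are swapped so that "is pk" (true) sorts before "not pk" (false).
        int pkComparison = Boolean.compare(second.name().equals("pk"), first.name().equals("pk"));
        if (pkComparison != 0) {
            return pkComparison;
        }
        // Assumed substitute for StringUtils.compareStrings on lowercased names.
        int nameComparison = String.CASE_INSENSITIVE_ORDER.compare(first.name(), second.name());
        if (nameComparison != 0) {
            return nameComparison;
        }
        return 0;
    };

    // Wraps names in MyField, sorts them with PK_FIRST and unwraps the result.
    static List<String> sortedNames(List<String> names) {
        List<MyField> fields = new ArrayList<>();
        for (String n : names) fields.add(new MyField(n));
        fields.sort(PK_FIRST);
        List<String> out = new ArrayList<>();
        for (MyField f : fields) out.add(f.name());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(List.of("zeta", "pk", "Alpha", "beta")));
        // [pk, Alpha, beta, zeta]
    }
}
```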
The new static methods of class Comparator available since Java 8 are very handy to create a multi-criteria Comparator like in your case.
You could try something like this:
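List<String> list = ... ;
list.sort(
    // Boolean keys sort false before true, so the negated test puts "pk" first.
    Comparator.comparing((String s) -> !s.equals("pk"))
              .thenComparing(String.CASE_INSENSITIVE_ORDER)
);
Note that Comparator has no comparingBoolean method. Comparator.comparing with a Boolean key works because Boolean is Comparable. Using "pk"::equals without the negation would put "pk" last, which is why the key is negated here.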
The great advantage of Comparator.comparing / Comparator.comparingXXX is that you don't need to twist your mind to get the correct behavior when to return a positive, negative or 0 value.
Comparator.thenComparing does proper chaining, i.e. it checks further criteria only when needed: only when the previous comparisons returned 0.
If your list may contain null values, there are also methods to handle them properly. This isn't the case in this short example.
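Below is a runnable sketch of the combinator approach, including the null handling mentioned above. It uses `Comparator.comparing` with a Boolean key and `Comparator.nullsFirst`, both standard since Java 8. `String.CASE_INSENSITIVE_ORDER` is an assumed stand-in for the question's lowercase comparison, and `PkFirstCombinators` is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PkFirstCombinators {
    // false < true for Boolean, so the key "is not pk" places "pk" first.
    static final Comparator<String> PK_FIRST =
        Comparator.comparing((String s) -> !s.equals("pk"))
                  .thenComparing(String.CASE_INSENSITIVE_ORDER);

    // nullsFirst wraps the comparator so null entries sort ahead of everything else.
    static final Comparator<String> PK_FIRST_NULLS_FIRST = Comparator.nullsFirst(PK_FIRST);

    public static void main(String[] args) {
        // Arrays.asList (unlike List.of) accepts null elements.
        List<String> names = new ArrayList<>(Arrays.asList("zeta", null, "pk", "Alpha", "beta"));
        names.sort(PK_FIRST_NULLS_FIRST);
        System.out.println(names); // [null, pk, Alpha, beta, zeta]
    }
}
```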

Implementing an equals() method to compare contents of two 'bag' objects

I am working on a school assignment. The objective is to practice GUIs, clone() methods, and using/modifying existing code. I am trying to write an equals method the way the instructor wants: by using a clone of the object and removing items from the bag (remove returns a boolean indicating success or failure).
The bag is represented in an array, and should return true in cases such as {1,2,3} and {3,2,1}, ie order does not matter, only the number of each number present in the arrays.
Here is the issue
It works in most cases, however there is a bug in cases where the bags contain numbers as such: {1,1,2} and {1,2,2} and other similar iterations. It is returning true instead of false.
I believe it has something to do with the remove() method we are supposed to use. If I understand it correctly, it is supposed to put the value at the 'end' of the array and decrease the manyItems counter. manyItems holds the number of items in the bag, since the constructor sets data.length to 10 regardless of how many items are stored.
The code is largely written by another person. We had to import the existing files and write new methods to complete the task. I have the GUI part done, so I will not include that class, only the methods used in the IntArrayBag class.
A second pair of eyes would be helpful. Thanks.
public class IntArrayBag implements Cloneable
{
// Invariant of the IntArrayBag class:
// 1. The number of elements in the bag is in the instance variable
// manyItems, which is no more than data.length.
// 2. For an empty bag, we do not care what is stored in any of data;
// for a non-empty bag, the elements in the bag are stored in data[0]
// through data[manyItems-1], and we don't care what's in the
// rest of data.
private int[ ] data;
private int manyItems;
public IntArrayBag( )
{
final int INITIAL_CAPACITY = 10;
manyItems = 0;
data = new int[INITIAL_CAPACITY];
}
public IntArrayBag clone( )
{ // Clone an IntArrayBag object.
IntArrayBag answer;
try
{
answer = (IntArrayBag) super.clone( );
}
catch (CloneNotSupportedException e)
{ // This exception should not occur. But if it does, it would probably
// indicate a programming error that made super.clone unavailable.
// The most common error would be forgetting the "Implements Cloneable"
// clause at the start of this class.
throw new RuntimeException
("This class does not implement Cloneable");
}
answer.data = data.clone( );
return answer;
}
public int size( )
{
return manyItems;
}
public boolean remove(int target)
{
int index; // The location of target in the data array.
// First, set index to the location of target in the data array,
// which could be as small as 0 or as large as manyItems-1; If target
// is not in the array, then index will be set equal to manyItems;
for (index = 0; (index < manyItems) && (target != data[index]); index++)
// No work is needed in the body of this for-loop.
;
if (index == manyItems)
// The target was not found, so nothing is removed.
return false;
else
{ // The target was found at data[index].
// So reduce manyItems by 1 and copy the last element onto data[index].
manyItems--;
data[index] = data[manyItems];
return true;
}
}
//I added extra variables that are not needed, to try to increase readability
//and because I was using them while debugging the code originally
public boolean equals(Object obj){
if (obj instanceof IntArrayBag){
IntArrayBag canidate = (IntArrayBag) obj; // I know this can be changed; this was required
IntArrayBag canidateTest = (IntArrayBag) canidate.clone(); //this was created
//as a clone because it was otherwise referring to the same memory address
//this caused items to be removed from bags when testing for equality
IntArrayBag test = (IntArrayBag) this.clone();
//fast check to see if the two objects have the same number of items;
//if they don't, return false and skip the item-by-item checking
if (test.size() != canidateTest.size())
return false;
//the loop will go through every element in the test bag it will
//then remove the value that is present at the first index of the test bag
for (int i = 0; (i < (test.size()) || i < (canidateTest.size())); i++){
int check = test.data[i];
//remove() returns a boolean so if the value is not present in each bag
//then the conditional will be met and the method will return false
boolean test1 = test.remove(check);
boolean test2 = canidateTest.remove(check);
if (test1 != test2)
return false;
}//end for loop
// if the loop goes through every element
//and finds every value was true it will return true
return true;
}//end if
else
return false;
}//end equals
}
I can't see the big picture, as I haven't coded GUIs in Java before. However, for comparing two int[] arrays, I would sort them before the comparison (if sorting is allowed). That eliminates problem cases like the one you describe. In IntArrayBag, only the first manyItems entries of data are meaningful, so sort and compare just that part. Then apply something like:
while (index < array_1.length && array_1[index] == array_2[index])
{ index++; }
and check the final value of index to see where the loop stopped: if it equals array_1.length, every element matched.
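The sort-then-compare idea can be sketched as a standalone helper that takes the raw `data` array and the `manyItems` count of each bag. `SortedBagCompare` and `sameContents` are hypothetical names. The helper copies the used slots, so the bags themselves are not reordered.

```java
import java.util.Arrays;

public class SortedBagCompare {
    // data/manyItems mirror IntArrayBag's fields: only data[0..manyItems-1] is meaningful.
    static boolean sameContents(int[] data1, int many1, int[] data2, int many2) {
        if (many1 != many2) return false;
        // Copy only the used slots so the original arrays are left untouched.
        int[] a = Arrays.copyOf(data1, many1);
        int[] b = Arrays.copyOf(data2, many2);
        Arrays.sort(a);
        Arrays.sort(b);
        int index = 0;
        // Bounds check first, so the loop never reads past the end.
        while (index < a.length && a[index] == b[index]) {
            index++;
        }
        // The loop reaches the end only when every position matched.
        return index == a.length;
    }

    public static void main(String[] args) {
        int[] bag1 = {1, 1, 2, 0, 0, 0, 0, 0, 0, 0}; // manyItems = 3, rest is unused capacity
        int[] bag2 = {1, 2, 2, 0, 0, 0, 0, 0, 0, 0};
        int[] bag3 = {2, 1, 1, 9, 9, 9, 9, 9, 9, 9};
        System.out.println(sameContents(bag1, 3, bag2, 3)); // false
        System.out.println(sameContents(bag1, 3, bag3, 3)); // true
    }
}
```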
Is it explicitly required to use clone? If not, you can implement this more easily by overriding equals() together with hashCode(), comparing sorted copies of the bags' contents.
You can override them for this class as follows:
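@Override
public int hashCode() {
    /* Sort a copy of the used slots only: sorting this.data in place would
       reorder the bag, and the slots after manyItems are garbage */
    int[] sorted = Arrays.copyOf(this.data, this.manyItems);
    Arrays.sort(sorted);
    /* Calculate hash (same multiply-and-add scheme, via Arrays.hashCode) */
    return Arrays.hashCode(sorted);
}
@Override
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj == null || this.getClass() != obj.getClass()) {
        return false;
    }
    IntArrayBag other = (IntArrayBag) obj;
    if (this.manyItems != other.manyItems) return false;
    /* Equal bags have identical sorted contents */
    int[] mine = Arrays.copyOf(this.data, this.manyItems);
    int[] theirs = Arrays.copyOf(other.data, other.manyItems);
    Arrays.sort(mine);
    Arrays.sort(theirs);
    return Arrays.equals(mine, theirs);
}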
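If you want to keep your own equals implementation that compares test and canidateTest, you can also compute a hash of each bag as a quick pre-check. Hashes are not unique: different hashes prove the bags differ, but equal hashes do not prove they are equal.
Here is the code snippet:
/* Assuming the size comparison above has already passed,
   so the two bags hold the same number of items */
/* This polynomial hash depends on element order, so sort the clones' used slots first */
Arrays.sort(test.data, 0, test.size());
Arrays.sort(canidateTest.data, 0, canidateTest.size());
final int prime = 31;
int testResult = 1;
int candidateTestResult = 1;
for (int i = 0; i < test.size(); i++) {
    testResult = prime * testResult + test.data[i];
    candidateTestResult = prime * candidateTestResult + canidateTest.data[i];
}
/* Different hashes prove the bags differ; equal hashes still need an exact check */
if (testResult != candidateTestResult) {
    return false;
}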
I believe the problem is in this line:
for (int i = 0; (i < (test.size()) || i < (canidateTest.size())); i++){
The problem here is that test and canidateTest are the clones you made, and you are removing elements from those bags. Every time you remove an element, the size decreases (because you decrement manyItems, and size() returns manyItems). This means you only go through about half the array. Suppose the original size is 4. The first time through the loop, i==0 and test.size()==4; the second time, i==1 and test.size()==3; the third time, i==2 and test.size()==2, so you exit the loop. You don't look at all 4 elements, only at 2.
You'll need to decide: do you want to go through the elements of the original array, or the elements of the clone? If you go through the elements of the clone, you actually never need to increment i. You can always look at test.data[0], since once you look at it, you remove it, so you know test.data[0] will be replaced with something else. In fact, you don't need i at all. Just loop until the bag size is 0, or until you determine that the bags aren't equal. On the other hand, if you go through the elements of this.data (i.e. look at this.data[i] or just data[i]), then make sure i goes all the way up to this.size().
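Following the "go through the elements of the clone" option, equals can always look at `data[0]` and loop until the clone is empty. This is a sketch, not the assignment's reference solution. `add(int)` is a hypothetical helper, because the real add method is not shown in the question. `hashCode` is overridden as well, because equals is.

```java
import java.util.Arrays;

public class IntArrayBag implements Cloneable {
    private int[] data = new int[10];
    private int manyItems = 0;

    // Hypothetical add(): the assignment's real add() is not shown in the question.
    public void add(int element) {
        if (manyItems == data.length) data = Arrays.copyOf(data, data.length * 2);
        data[manyItems++] = element;
    }

    public int size() { return manyItems; }

    @Override
    public IntArrayBag clone() {
        try {
            IntArrayBag answer = (IntArrayBag) super.clone();
            answer.data = data.clone(); // copy the array so removals don't affect the original
            return answer;
        } catch (CloneNotSupportedException e) {
            throw new RuntimeException("This class does not implement Cloneable");
        }
    }

    // Same algorithm as the question's remove(): move the last element into the hole.
    public boolean remove(int target) {
        for (int index = 0; index < manyItems; index++) {
            if (data[index] == target) {
                manyItems--;
                data[index] = data[manyItems];
                return true;
            }
        }
        return false;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof IntArrayBag)) return false;
        IntArrayBag test = this.clone();
        IntArrayBag candidateTest = ((IntArrayBag) obj).clone();
        if (test.size() != candidateTest.size()) return false;
        // Always look at data[0]: after remove(), a different element (or none) is there.
        while (test.size() > 0) {
            int check = test.data[0];
            test.remove(check); // always succeeds, since check came from test
            if (!candidateTest.remove(check)) return false;
        }
        // Both clones started the same size and shrank in lockstep, so both are empty.
        return true;
    }

    // equals() is overridden, so hashCode() must agree: hash the sorted used slots.
    @Override
    public int hashCode() {
        int[] sorted = Arrays.copyOf(data, manyItems);
        Arrays.sort(sorted);
        return Arrays.hashCode(sorted);
    }
}
```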
(One more small point: the correct spelling is "candidate".)
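Maybe you should try the Set interface;
see this for details: http://www.tutorialspoint.com/java/java_set_interface.htm
A Set cannot contain duplicate elements, which can make it simpler than building your own class. Be aware, though, that a Set therefore ignores how many times each value occurs.
For example, with [1,1,2] and [1,2,2]
you can use the following to check whether every element of arr2 also appears in arr1. For this particular pair it prints nothing, because both arrays contain only the values 1 and 2: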
int[] arr1 = {1, 1, 2};
int[] arr2 = {1, 2, 2};
Set<Integer> set = new HashSet<Integer>();
for (int i : arr1) { // build the set of arr1's distinct values
    if (set.contains(i) == false) {
        set.add(i);
    }
}
for (int i : arr2) { // only distinct values are checked; counts are ignored
    if (set.contains(i) == false) {
        System.out.println("not equal");
        break;
    }
}
Hope this is helpful.
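A bag needs the count of each value, which a plain Set throws away. A common fix, swapped in here and not what the answer above uses, is to count occurrences in a `HashMap<Integer, Integer>` and compare the two maps. `BagCounts` and `sameBag` are hypothetical names. `Map.merge` is standard since Java 8.

```java
import java.util.HashMap;
import java.util.Map;

public class BagCounts {
    // Count how many times each value occurs, keeping the multiplicity a Set discards.
    static Map<Integer, Integer> counts(int[] values) {
        Map<Integer, Integer> result = new HashMap<>();
        for (int v : values) {
            result.merge(v, 1, Integer::sum);
        }
        return result;
    }

    // Two bags are equal when every value occurs the same number of times in both.
    static boolean sameBag(int[] arr1, int[] arr2) {
        return arr1.length == arr2.length && counts(arr1).equals(counts(arr2));
    }

    public static void main(String[] args) {
        System.out.println(sameBag(new int[]{1, 1, 2}, new int[]{1, 2, 2})); // false
        System.out.println(sameBag(new int[]{1, 2, 3}, new int[]{3, 2, 1})); // true
    }
}
```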

How to implement efficient hash cons with java HashSet

I am trying to implement a hash cons in Java, comparable to what String.intern does for strings. I.e., I want a class that stores all distinct values of a data type T in a set and provides a T intern(T t) method that checks whether t is already in the set. If so, the instance in the set is returned; otherwise t is added to the set and returned. The point is that the resulting values can be compared using reference equality, since two equal values returned from intern are guaranteed to be the same instance.
Of course, the most obvious candidate data structure for a hash cons is java.util.HashSet<T>. However, it seems that its interface is flawed and does not allow efficient insertion, because there is no method to retrieve an element that is already in the set or insert one if it is not in there.
An algorithm using HashSet would look like this:
class HashCons<T>{
HashSet<T> set = new HashSet<>();
public T intern(T t){
if(set.contains(t)) {
return ???; // <----- PROBLEM
} else {
set.add(t); // <--- Inefficient, second hash lookup
return t;
}
}
}
As you see, the problem is twofold:
This solution would be inefficient, since I would access the hash table twice: once for contains and once for add. That said, this may not be a big performance hit, since the correct bucket will be in the cache after contains, so add is unlikely to trigger a cache miss and will be quite fast.
I cannot retrieve an element already in the set (see line flagged PROBLEM). There is just no method to retrieve the element in the set. So it is just not possible to implement this.
Am I missing something here? Or is it really impossible to build a usual hash cons with java.util.HashSet?
I don't think it's possible using HashSet. You could use some kind of Map instead, with your value as both the key and the value. java.util.concurrent.ConcurrentMap also happens to possess the quite convenient method
putIfAbsent(K key, V value)
which returns the existing value if the key is already present, and null if the key was absent (in which case the new value has been inserted). However, I don't know about the performance of this method compared to checking "manually" on non-concurrent implementations of Map.
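Here is a minimal sketch of an intern method built on putIfAbsent, assuming a ConcurrentHashMap as the backing map. `ConcurrentHashCons` is a hypothetical name. Note that ConcurrentHashMap rejects null keys, so `intern(null)` throws a NullPointerException.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentHashCons<T> {
    private final ConcurrentMap<T, T> map = new ConcurrentHashMap<>();

    // putIfAbsent returns the value already stored for an equal key,
    // or null if t was just inserted.
    public T intern(T t) {
        T existing = map.putIfAbsent(t, t);
        return existing == null ? t : existing;
    }

    public static void main(String[] args) {
        ConcurrentHashCons<String> cons = new ConcurrentHashCons<>();
        String a = cons.intern(new String("abc"));
        String b = cons.intern(new String("abc"));
        System.out.println(a == b); // true: the second call returns the first instance
    }
}
```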
Here is how you would do it using a HashMap:
class HashCons<T>{
Map<T,T> map = new HashMap<T,T>();
public T intern(T t){
if (!map.containsKey(t))
map.put(t,t);
return map.get(t);
}
}
I think the reason it is not possible with HashSet is quite simple. To the set, contains(t) being true only means that t equals one of the elements t' already in the set. From the set's point of view there is no reason to be able to return t', since you already have an equal object.
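Since Java 8, `Map.computeIfAbsent` can do the lookup-or-insert in one call. It returns the value already mapped to an equal key, or stores the result of the mapping function (here the key itself) and returns that. In OpenJDK, HashMap overrides computeIfAbsent so that it locates the bucket once, instead of running separate containsKey, put and get calls. `MapHashCons` is a hypothetical class name used for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class MapHashCons<T> {
    private final Map<T, T> map = new HashMap<>();

    // Returns the stored instance equal to t, inserting t itself if none exists yet.
    public T intern(T t) {
        return map.computeIfAbsent(t, k -> k);
    }

    public int size() { return map.size(); }

    public static void main(String[] args) {
        MapHashCons<String> cons = new MapHashCons<>();
        String a = cons.intern(new String("abc"));
        String b = cons.intern(new String("abc"));
        System.out.println(a == b);      // true
        System.out.println(cons.size()); // 1
    }
}
```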
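In OpenJDK, HashSet is implemented as a wrapper around a HashMap, so a HashSet would not save memory compared with the Map solution suggested by aRestless.
Here is a 10-minute sketch of a custom open-addressing (linear probing) table, which finds or inserts the key in a single probe sequence: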
class HashCons<T> {
T[] table;
int size;
int sizeLimit;
HashCons(int expectedSize) {
init(Math.max(Integer.highestOneBit(expectedSize * 2) * 2, 16));
}
private void init(int capacity) {
table = (T[]) new Object[capacity];
size = 0;
sizeLimit = (int) (capacity * 2L / 3);
}
T cons(@Nonnull T key) {
int mask = table.length - 1;
int i = key.hashCode() & mask;
do {
if (table[i] == null) break;
if (key.equals(table[i])) return table[i];
i = (i + 1) & mask;
} while (true);
table[i] = key;
if (++size > sizeLimit) rehash();
return key;
}
private void rehash() {
T[] table = this.table;
if (table.length == (1 << 30))
throw new IllegalStateException("HashCons is full");
init(table.length << 1);
for (T key : table) {
if (key != null) cons(key);
}
}
}

Data structure to check for pairs?

Say I have objects A, B, C, D. They can contain references to one another; for example, A might reference B and C, and C might reference A. I want to create segments but don't want to create them twice, so I don't want both segment A C and segment C A, just one of them. So I want to keep a list of created segments (e.g. A C), check whether I already have an A C or C A, and skip it if so.
Is there a data structure that can do this?
Thanks
if (list.contains(a, b))
{
//dont add
}
You could introduce something like this:
class PairKey<T extends Comparable<T>> {
    final T fst, snd;
    // Store the two elements in a canonical order, so PairKey(a, b) and PairKey(b, a) are identical.
    public PairKey(T a, T b) {
        if (a.compareTo(b) <= 0) {
            fst = a;
            snd = b;
        } else {
            fst = b;
            snd = a;
        }
    }
    @Override
    public int hashCode() {
        // Combine both stored fields (a and b are constructor parameters, not fields).
        return 31 * fst.hashCode() + snd.hashCode();
    }
    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof PairKey)) return false;
        PairKey<?> obj = (PairKey<?>) other;
        return obj.fst.equals(fst) && obj.snd.equals(snd);
    }
}
Then you can put edges into a HashSet<PairKey<V>> (where V is your vertex type) and check whether a given pair is already there.
You will need to make your vertices Comparable, so that PairKey(A,B) can be treated as equal to PairKey(B,A).
HashSet will then do the rest for you, e.g. you will be able to query
pairs.contains(new PairKey<>(A, B));
and if pairs contains either PairKey(A,B) or PairKey(B,A), it will return true.
The hashCode implementation can differ; your IDE can generate a more sophisticated one.
Hope that helps.
I would use an object called Pair that would look something like this:
class Pair
{
Node start;
Node end;
public Pair(Node start, Node end)
{
this.start=start;
this.end=end;
}
public Pair reverse()
{
return new Pair(end,start);
}
}
Now you can do something like this:
if (pairs.contains(currentPair) || pairs.contains(currentPair.reverse()))
{
continue;
} else{
pairs.add(currentPair);
}
As pointed out in the comments, you will need to implement equals and hashCode. However, making equals also match the reversed segment is bad practice in a pure OO sense: implementing equals in the way described in the comments would tie Pair to your application and make it less reusable.
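As a sketch of that advice, a Java record (Java 16+) generates an order-sensitive equals and hashCode from its components, so the contains-or-reverse check works as-is. `SegmentDedup`, `Pair` and `addSegment` are hypothetical names, and the node type is left generic.

```java
import java.util.HashSet;
import java.util.Set;

public class SegmentDedup {
    // Records generate equals()/hashCode() from their components, so
    // Pair(A, C) equals another Pair(A, C) but not Pair(C, A).
    record Pair<N>(N start, N end) {
        Pair<N> reverse() { return new Pair<>(end, start); }
    }

    // Adds the segment unless it, or its reversal, is already present.
    static <N> boolean addSegment(Set<Pair<N>> pairs, N a, N b) {
        Pair<N> current = new Pair<>(a, b);
        if (pairs.contains(current) || pairs.contains(current.reverse())) {
            return false;
        }
        return pairs.add(current);
    }

    public static void main(String[] args) {
        Set<Pair<String>> pairs = new HashSet<>();
        System.out.println(addSegment(pairs, "A", "C")); // true
        System.out.println(addSegment(pairs, "C", "A")); // false: reverse already stored
        System.out.println(pairs.size());               // 1
    }
}
```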
You can use a set of sets of objects.
Set<Set<MyObjectType>> segments = new HashSet<Set<MyObjectType>>();
Then you can add two-element sets representing pairs of MyObject. Since sets are unordered, if segments contains a set with A and B, attempting to add a set containing B and A will treat it as already present in segments.
Set<MyObjectType> segment = new HashSet<MyObjectType>();
segment.add(A); // A and B are instances of MyObjectType
segment.add(B);
segments.add(segment);
segment = new HashSet<MyObjectType>();
segment.add(B);
segment.add(A);
segments.add(segment);
System.out.println("Number of segments: " + segments.size()); // prints 1
Your problem is related to graph theory.
What you can try is to remove the internal lists and create an incidence matrix shared by all your objects.
The final solution mostly depends on the task goal and the available structure, so it is hard to choose the best solution for your problem from the description you have provided.
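In the same graph-theory spirit, here is a sketch using a closely related structure: an adjacency matrix, with one cell per pair of objects, swapped in for the incidence matrix because it answers "does segment X-Y already exist?" directly. The sketch assumes each object has already been given an index from 0 to n-1, which is a hypothetical setup step. It uses O(n²) memory, so it suits small, fixed object sets.

```java
public class SegmentMatrix {
    // created[lo][hi] is true once the undirected segment {lo, hi} exists.
    private final boolean[][] created;

    public SegmentMatrix(int n) {
        created = new boolean[n][n];
    }

    // Records the undirected segment {i, j}; returns false if it already existed.
    public boolean addSegment(int i, int j) {
        // Normalize so (i, j) and (j, i) use the same cell.
        int lo = Math.min(i, j);
        int hi = Math.max(i, j);
        if (created[lo][hi]) {
            return false;
        }
        created[lo][hi] = true;
        return true;
    }

    public static void main(String[] args) {
        SegmentMatrix m = new SegmentMatrix(4); // A=0, B=1, C=2, D=3
        System.out.println(m.addSegment(0, 2)); // true  (A C)
        System.out.println(m.addSegment(2, 0)); // false (C A is the same segment)
    }
}
```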
Use java.util.Set / java.util.HashSet and keep adding the references you find, e.g.
Set<MyObjectType> set1 = new HashSet<>();
set1.add(A); set1.add(C); set1.add(C); // the second add(C) has no effect
You can add this finding to an external set, as finalSet.add(set1):
Set<Set<MyObjectType>> finalSet = new HashSet<>();
finalSet.add(set1);
This filters out the duplicates automatically, and in the end you will be left with A and C only.

Keeping a SortedSet of objects based on a property

I have an object, Test, that has two properties, double x and double y. I want to add these objects to a SortedSet, keeping the set sorted in ASC order on x of Test. If two instances of Test have the same x values, I want them to be sorted within the set by their y values.
I thought the following would do the trick:
private SortedSet<Test> tests = new TreeSet<Test>(new Comparator<Test>() {
@Override
public int compare(Test o1, Test o2) {
if (o1.getXpos() < o2.getXpos()) {
return -1;
}
if (o1.getXpos() > o2.getXpos()) {
return 1;
}
if (o1.getXpos() == o2.getXpos()) {
if (o1.getYpos() < o2.getYpos()) {
return -1;
}
if (o1.getYpos() > o2.getYpos()) {
return 1;
}
if (o1.getYpos() == o2.getYpos()) {
return 0;
}
}
return 0;
}
});
Instead, this appears to reorder the actual x and y values, i.e.
testA: x=200, y=200,
testB: x=200, y=400
After inserting into tests:
testA: x=200, y=200,
testB: x=400, y=200
rather than reordering the instances within tests.
Your comparator is correct. You've got bigger problems, though, if adding your Test objects to the set changes their member variables, e.g. "testB: x=200, y=400" -> "testB: x=400, y=200". I would guess your problem lies in code you have not included (maybe a botched constructor?).
Have you tried with more than two elements? More than once I've simply sorted things backwards without realizing it until later.
My guess is that comparing the doubles for exact equality using == is potentially the issue. See What's wrong with using == to compare floats in Java?
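If == on doubles is a concern, the comparator can be built from `Comparator.comparingDouble` and `thenComparingDouble`, which use `Double.compare`. That method orders NaN and -0.0 consistently. By contrast, in the question's comparator a NaN coordinate makes every test fall through to `return 0`, so the TreeSet would treat it as a duplicate.

`Test` below is a hypothetical minimal version of the question's class, and `TestSetDemo` is a hypothetical wrapper.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

public class TestSetDemo {
    // Hypothetical minimal version of the question's Test class (immutable fields).
    static final class Test {
        private final double x, y;
        Test(double x, double y) { this.x = x; this.y = y; }
        double getXpos() { return x; }
        double getYpos() { return y; }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    // Sort by x, then by y for equal x; both steps use Double.compare internally.
    static final Comparator<Test> BY_X_THEN_Y =
        Comparator.comparingDouble(Test::getXpos).thenComparingDouble(Test::getYpos);

    public static void main(String[] args) {
        SortedSet<Test> tests = new TreeSet<>(BY_X_THEN_Y);
        tests.add(new Test(200, 400));
        tests.add(new Test(100, 900));
        tests.add(new Test(200, 200));
        System.out.println(tests); // [(100.0, 900.0), (200.0, 200.0), (200.0, 400.0)]
    }
}
```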
