How to get intersection of two hashmaps with tolerance in Java?

In my Java code, I have two hashmaps, and I want to express their intersection as a single value. Each key is the ARGB value of a color (an integer) and each value is that color's frequency (an integer). Each hashmap was generated from an image.
I want to determine a value that represents how close the maps are to each other: the higher the value, the closer the two maps are. It can't be perfectly strict, because in real life two colors can look the same but have slightly different ARGB values, which is where the tolerance part comes in.
So far I got this:
private int colorCompare(Result otherResult) {
HashMap<Integer, Integer> colorMap1 = getColorMap();
HashMap<Integer, Integer> colorMap2 = otherResult.getColorMap();
int sum = 0;
for (Map.Entry<Integer, Integer> entry : colorMap1.entrySet()) {
Integer key = entry.getKey();
Integer value = entry.getValue();
if (colorMap2.containsKey(key)) {
sum += value + colorMap2.get(key);
}
}
return sum;
}
public double CloseTo(Pixel otherpixel) {
Color mycolor = getColor();
Color othercolor = otherpixel.getColor();
double rmean = (mycolor.getRed() + othercolor.getRed()) / 2.0; // 2.0 avoids integer division
int r = mycolor.getRed() - othercolor.getRed();
int g = mycolor.getGreen() - othercolor.getGreen();
int b = mycolor.getBlue() - othercolor.getBlue();
double weightR = 2 + rmean/256;
double weightG = 4.0;
double weightB = 2 + (255-rmean)/256;
return Math.sqrt(weightR*r*r + weightG*g*g + weightB*b*b);
}
Does anyone know how to incorporate the tolerance part into it as I have no idea...
Thanks

I was unsure what the intersection of two maps would be, but it sounds as though you want to compute a distance of some sort based on the histograms of two images. One classic approach to this problem is Earth mover's distance (EMD). Assume for the moment that the images have the same number of pixels. The EMD between these two images is determined by the one-to-one correspondence between the pixels of the first image and the pixels of the second that minimizes the sum over all paired pixels of the distance between their colors. The EMD can be computed in polynomial time using the Hungarian algorithm.
If the images are of different sizes, then we have to normalize the frequencies and swap out the Hungarian algorithm for one that can solve a more general minimum-cost flow problem.
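A full EMD needs an assignment or flow solver. As a much simpler stand-in (this is not EMD), the sketch below scores the overlap of two color histograms while treating colors within a given RGB distance as matching. The class name, the tolerance parameter, and the plain Euclidean RGB distance are assumptions made for illustration; the weighted distance from the question could be substituted.

```java
import java.util.HashMap;
import java.util.Map;

final class ColorOverlap {
    // Squared RGB distance between two packed ARGB ints (alpha is ignored).
    static int distSq(int c1, int c2) {
        int dr = ((c1 >> 16) & 0xFF) - ((c2 >> 16) & 0xFF);
        int dg = ((c1 >> 8) & 0xFF) - ((c2 >> 8) & 0xFF);
        int db = (c1 & 0xFF) - (c2 & 0xFF);
        return dr * dr + dg * dg + db * db;
    }

    // Sums min(freqA, freqB) over all pairs of colors whose distance is at most `tolerance`.
    static long overlap(Map<Integer, Integer> a, Map<Integer, Integer> b, int tolerance) {
        int tolSq = tolerance * tolerance;
        long sum = 0;
        for (Map.Entry<Integer, Integer> ea : a.entrySet()) {
            for (Map.Entry<Integer, Integer> eb : b.entrySet()) {
                if (distSq(ea.getKey(), eb.getKey()) <= tolSq) {
                    sum += Math.min(ea.getValue(), eb.getValue());
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> first = new HashMap<>();
        first.put(0xFF0000, 10);
        Map<Integer, Integer> second = new HashMap<>();
        second.put(0xFE0000, 4); // almost the same red
        second.put(0x0000FF, 7); // blue, far away
        System.out.println(overlap(first, second, 2));
    }
}
```

Because one color can match several neighbors, this is only a rough similarity score, and the nested loop is quadratic in the number of distinct colors.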

Related

Shuffling through all the points in a 3-dimensional space without storing all possible coordinates

I'm programming a 3-dimensional cellular automaton. The way I'm iterating through it right now in each generation is:
1. Create a list of all possible coordinates in the 3D space.
2. Shuffle the list.
3. Iterate through the list until all coordinates have been visited.
4. Go to 2.
Here's the code:
I've a simple 3 integer struct
public class Coordinate
{
public int x;
public int y;
public int z;
public Coordinate(int x, int y, int z) {this.x = x; this.y = y; this.z = z;}
}
then at some point I do this:
List<Coordinate> all_coordinates = new ArrayList<>();
[...]
for(int z=0 ; z<length ; z++)
{
for(int x=0 ; x<diameter ; x++)
{
for(int y=0 ; y<diameter ; y++)
{
all_coordinates.add(new Coordinate(x,y,z));
}
}
}
and then in the main algorithm I do this:
private void next_generation()
{
Collections.shuffle(all_coordinates);
for (int i=0 ; i < all_coordinates.size() ; i++)
{
[...]
}
}
The problem is, once the automaton gets too large, the list containing all possible points gets huge. I need a way to shuffle through all the points without having to actually store all the possible points in memory. How should I go about this?
One way to do this is to start by mapping your three dimensional coordinates into a single dimension. Let's say that your three dimensions' sizes are X, Y, and Z. So your x coordinate goes from 0 to X-1, etc. The full size of your space is X*Y*Z. We'll call that S.
To map any coordinate in 3-space to 1-space, you use the formula (x*Y*Z) + (y*Z) + z.
Of course, once you generate the numbers, you have to convert back to 3-space. That's a simple matter of reversing the conversion above. Assuming that coord is the 1-space coordinate:
x = coord/(Y*Z)
coord = coord % (Y*Z)
y = coord/Z
z = coord % Z
Now, with a single dimension to work with, you've simplified the problem to one of generating all the numbers from 0 to S-1 in pseudo-random order, without duplication.
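A minimal sketch of that conversion, using the row-major formula index = x*(Y*Z) + y*Z + z and its inverse. The class and method names are made up for illustration.

```java
final class GridIndex {
    final int sizeX, sizeY, sizeZ;

    GridIndex(int sizeX, int sizeY, int sizeZ) {
        this.sizeX = sizeX;
        this.sizeY = sizeY;
        this.sizeZ = sizeZ;
    }

    // Total number of cells, S = X*Y*Z.
    long size() {
        return (long) sizeX * sizeY * sizeZ;
    }

    // (x, y, z) -> x*(Y*Z) + y*Z + z
    long toIndex(int x, int y, int z) {
        return ((long) x * sizeY + y) * sizeZ + z;
    }

    // Inverse of toIndex: peel off z, then y, then x.
    int[] toCoordinate(long index) {
        int z = (int) (index % sizeZ);
        long rest = index / sizeZ;
        int y = (int) (rest % sizeY);
        int x = (int) (rest / sizeY);
        return new int[] {x, y, z};
    }

    public static void main(String[] args) {
        GridIndex grid = new GridIndex(3, 4, 5);
        long index = grid.toIndex(2, 3, 4);
        int[] c = grid.toCoordinate(index);
        System.out.println(index + " -> " + c[0] + "," + c[1] + "," + c[2]);
    }
}
```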
I know of at least three ways to do this. The simplest uses a multiplicative inverse, as I showed here: Given a number, produce another random number that is the same every time and distinct from all other results.
When you've generated all of the numbers, you "re-shuffle" the list by picking different x and m values for the multiplicative inverse calculations.
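The linked answer is not reproduced here. As a rough sketch of a closely related idea (an assumption, not necessarily the exact method of that answer), multiplying the step counter by a constant that is coprime to S, modulo S, visits every number in 0..S-1 exactly once using constant state. The class name and the way the multiplier is chosen are illustrative, and the code assumes S is below 2^31 so the product fits in a long.

```java
import java.util.Random;

final class StridePermutation {
    private final long size;
    private final long step;   // coprime to size, so i -> i*step mod size is a bijection
    private final long offset; // random starting point

    StridePermutation(long size, Random rnd) {
        this.size = size;
        long candidate;
        do {
            candidate = 1 + (long) (rnd.nextDouble() * (size - 1)); // in 1..size-1 (1 if size is 1)
        } while (gcd(candidate, size) != 1);
        this.step = candidate;
        this.offset = (long) (rnd.nextDouble() * size);
    }

    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // The i-th element of the permutation, for i in 0..size-1.
    long get(long i) {
        return (offset + i * step) % size;
    }

    public static void main(String[] args) {
        StridePermutation p = new StridePermutation(10, new Random());
        for (long i = 0; i < 10; i++) {
            System.out.print(p.get(i) + " ");
        }
        System.out.println();
    }
}
```

Creating a new instance re-shuffles. The order is an arithmetic progression modulo S, so it is far less random-looking than a real shuffle; whether that matters depends on the automaton.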
Another way of creating a non-repeating pseudo-random sequence in a particular range is with a linear feedback shift register. I don't have a ready example, but I have used them. To change the order, (i.e. re-shuffle), you re-initialize the generator with different parameters.
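For illustration, here is a small sketch of the LFSR idea, assuming a range of at most 65535 values. It uses a 16-bit Galois LFSR with the commonly cited maximal-length tap mask 0xB400 and simply skips states that fall outside the range. The class name and the skipping strategy are assumptions, not taken from any answer mentioned here.

```java
final class LfsrSequence {
    private static final int TAPS = 0xB400; // x^16 + x^14 + x^13 + x^11 + 1
    private final int limit;
    private int state;

    // limit: number of values wanted (1..65535); seed: any int, picks the starting point.
    LfsrSequence(int limit, int seed) {
        if (limit < 1 || limit > 0xFFFF) {
            throw new IllegalArgumentException("limit must be in 1..65535");
        }
        this.limit = limit;
        int s = seed & 0xFFFF;
        this.state = (s == 0) ? 1 : s; // the all-zero state would never change
    }

    // Returns the next value in 0..limit-1; any `limit` consecutive calls return each value once.
    int next() {
        do {
            int lsb = state & 1;
            state >>>= 1;
            if (lsb != 0) {
                state ^= TAPS;
            }
        } while (state > limit); // skip states outside 1..limit
        return state - 1;
    }

    public static void main(String[] args) {
        LfsrSequence seq = new LfsrSequence(10, 12345);
        for (int i = 0; i < 10; i++) {
            System.out.print(seq.next() + " ");
        }
        System.out.println();
    }
}
```

A different seed only changes where the cycle starts, so to get a genuinely different order you would switch to another tap mask or register width.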
You might also be interested in the answers to this question: Unique (non-repeating) random numbers in O(1)?. That user was only looking for 1,000 numbers, so he could use a table, and the accepted answer reflects that. Other answers cover the LFSR, and a Linear congruential generator that is designed with a specific period.
None of the methods I mentioned require that you maintain much state. The amount of state you need to maintain is constant, whether your range is 20 or 20,000,000.
Note that all of the methods I mentioned above give pseudo-random sequences. They will not be truly random, but they'll likely be close enough to random to fit your needs.

Using random function in selecting an object if two same distance values

I have an ArrayList unsolvedOutlets containing object Outlet that has attributes longitude and latitude.
Using the longitude and latitude of Outlet objects in ArrayList unsolvedOutlets, I need to find the smallest distance in that list using the distance formula : SQRT(((X2 - X1)^2)+(Y2-Y1)^2), wherein (X1, Y1) are given. I use Collections.min(list) in finding the smallest distance.
My problem is if there are two or more values with the same smallest distance, I'd have to randomly select one from them.
Code:
ArrayList<Double> distances = new ArrayList<Double>();
Double smallestDistance = 0.0;
for (int i = 0; i < unsolvedOutlets.size(); i++) {
distances.add(Math.sqrt(
(unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())*
(unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())+
(unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude())*
(unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude())));
distances.add(0.0); //added this to test
distances.add(0.0); //added this to test
smallestDistance = Collections.min(distances);
System.out.println(smallestDistance);
}
The console prints 0.0, but it won't stop. Is there a way to know whether there are multiple values with the same smallest value? Then I'd incorporate the Random function. Did that make sense? If anyone has the logic for that, it would be really helpful!
Thank you!
Keep track of the indices with min distance in your loop and after the loop choose one at random:
Random random = ...
...
List<Integer> minDistanceIndices = new ArrayList<>();
double smallestDistance = Double.POSITIVE_INFINITY; // start above any real distance so the first outlet becomes the minimum
for (int i = 0; i < unsolvedOutlets.size(); i++) {
double newDistance = Math.sqrt(
(unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())*
(unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())+
(unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude())*
(unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude()));
distances.add(newDistance);
if (newDistance < smallestDistance) {
minDistanceIndices.clear();
minDistanceIndices.add(i);
smallestDistance = newDistance;
} else if (newDistance == smallestDistance) {
minDistanceIndices.add(i);
}
}
if (!unsolvedOutlets.isEmpty()) {
int index = minDistanceIndices.get(random.nextInt(minDistanceIndices.size()));
Object chosenOutlet = unsolvedOutlets.get(index);
System.out.println("chosen outlet: "+ chosenOutlet);
}
As Jon Skeet mentioned you don't need to take the square root to compare the distances.
Also if you want to use distances on a sphere your formula is wrong:
With your formula you'll get the same distance for (0° N, 180° E) to (0° N, 0° E) as for (90° N, 180° E) to (90° N, 0° E), but while you need to travel around half the earth to travel from the first to the second, the last 2 coordinates both denote the north pole.
Note: I believe fabian's solution is superior to this, but I've kept it around to demonstrate that there are many different ways of implementing this...
I would probably:
1. Create a new type which contains the distance from the outlet as well as the outlet (or just the square of the distance), or use a generic Pair type for the same purpose
2. Map (using Stream.map) the original list to a list of these pairs
3. Order by the distance or square-of-distance
4. Look through the sorted list until you find a distance which isn't the same as the first one in the list
You then know how many - and which - outlets have the same distance.
Another option would be to simply shuffle the original collection, then sort the result by distance, then take the first element - that way even if multiple of them do have the same distance, you'll be taking a random one of those.
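The shuffle-then-sort option can be sketched as follows. It relies on List.sort being stable, so elements with equal distance keep their shuffled order. Points are represented as plain {latitude, longitude} arrays standing in for the Outlet class, which is an assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class NearestWithRandomTie {
    // Squared distance is enough for comparisons, so no square root is taken.
    static double distSq(double[] p, double lat, double lon) {
        double dLat = p[0] - lat;
        double dLon = p[1] - lon;
        return dLat * dLat + dLon * dLon;
    }

    // Returns a nearest point, chosen at random among ties. Assumes a non-empty list.
    static double[] pickNearest(List<double[]> points, double lat, double lon, Random rnd) {
        List<double[]> copy = new ArrayList<>(points); // leave the caller's list untouched
        Collections.shuffle(copy, rnd);
        // Stable sort: points with equal distance stay in their shuffled order.
        copy.sort(Comparator.comparingDouble((double[] p) -> distSq(p, lat, lon)));
        return copy.get(0);
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0, 1});
        points.add(new double[] {1, 0});
        points.add(new double[] {5, 5});
        double[] chosen = pickNearest(points, 0, 0, new Random());
        System.out.println(chosen[0] + ", " + chosen[1]);
    }
}
```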
JB Nizet's option of "find the minimum, then perform a second scan to find all those with that distance" would be fine too - and quite possibly simpler :) Lots of options...

Unique Computational value for an array

I have been thinking about this but have run out of ideas. I have 10 arrays, each holding 18 double values. These 18 values are features of an image. Now I have to apply k-means clustering to them.
To implement k-means clustering I need a unique computational value for each array. Is there any mathematical, statistical, or other logic that would help me create a computational value for each array that is unique to it, based on the values inside it? Thanks in advance.
Here is an example of one of my arrays. I have 10 more.
[0.07518284315321135
0.002987851573676068
0.002963866526639678
0.002526139418225552
0.07444872939213325
0.0037219653347541617
0.0036979802877177715
0.0017920256571474585
0.07499695903867931
0.003477831820276616
0.003477831820276616
0.002036159171625004
0.07383539747505984
0.004311312204791184
0.0043352972518275745
0.0011786937400740452
0.07353130134299131
0.004339580295941216]
Did you check Arrays.hashCode in Java 7?
/**
* Returns a hash code based on the contents of the specified array.
* For any two <tt>double</tt> arrays <tt>a</tt> and <tt>b</tt>
* such that <tt>Arrays.equals(a, b)</tt>, it is also the case that
* <tt>Arrays.hashCode(a) == Arrays.hashCode(b)</tt>.
*
* <p>The value returned by this method is the same value that would be
* obtained by invoking the {@link List#hashCode() <tt>hashCode</tt>}
* method on a {@link List} containing a sequence of {@link Double}
* instances representing the elements of <tt>a</tt> in the same order.
* If <tt>a</tt> is <tt>null</tt>, this method returns 0.
*
* @param a the array whose hash value to compute
* @return a content-based hash code for <tt>a</tt>
* @since 1.5
*/
public static int hashCode(double a[]) {
if (a == null)
return 0;
int result = 1;
for (double element : a) {
long bits = Double.doubleToLongBits(element);
result = 31 * result + (int)(bits ^ (bits >>> 32));
}
return result;
}
I don't understand why @Marco13 mentioned that "this is not returning unique for arrays".
UPDATE
See @Marco13's comment for the reason why it cannot be unique.
UPDATE
If we draw a graph of your input points (18 elements), it shows one spike followed by three low values, and then the pattern repeats.
If that is true, you can compute the mean of the peaks (indices 0, 4, 8, 12 and 16) and the mean of the remaining low values.
That gives you a peak mean and a low mean. You can then find a unique number that represents these two values, and still preserves them, using the bijective algorithm described here.
This algorithm also provides formulas for the reverse direction, i.e. recovering the peak mean and low mean from the unique value.
To find the unique pair value: <x, y> = x + (y + (x + 1)/2)^2
Also refer to Exercise 1 on page 2 of the PDF to recover x and y.
Code for finding the means and the pairing value:
public static double mean(double[] array) {
    double peakSum = 0;
    double lowSum = 0;
    int peakCount = 0;
    int lowCount = 0;
    for (int i = 0; i < array.length; i++) {
        if (i % 4 == 0) { // indices 0, 4, 8, 12, 16 hold the peaks
            peakSum += array[i];
            peakCount++;
        } else {
            lowSum += array[i];
            lowCount++;
        }
    }
    double peakMean = peakSum / peakCount;
    double lowMean = lowSum / lowCount;
    return bijective(lowMean, peakMean);
}
public static double bijective(double x, double y) {
    double tmp = y + ((x + 1) / 2);
    return x + (tmp * tmp);
}
For testing:
public static void main(String[] args) {
    // All 18 values from the question
    double[] arrays = {0.07518284315321135, 0.002987851573676068, 0.002963866526639678, 0.002526139418225552,
        0.07444872939213325, 0.0037219653347541617, 0.0036979802877177715, 0.0017920256571474585,
        0.07499695903867931, 0.003477831820276616, 0.003477831820276616, 0.002036159171625004,
        0.07383539747505984, 0.004311312204791184, 0.0043352972518275745, 0.0011786937400740452,
        0.07353130134299131, 0.004339580295941216};
    System.out.println(mean(arrays));
}
You can use the peak and low values to find similar images.
You can simply sum the values using double precision; the resulting value will be unique most of the time. On the other hand, if the position of each value is relevant, you can compute a sum that uses the index as a multiplier.
The code could be as simple as:
public static double sum(double[] values) {
double val = 0.0;
for (double d : values) {
val += d;
}
return val;
}
public static double hash_w_order(double[] values) {
double val = 0.0;
for (int i = 0; i < values.length; i++) {
val += values[i] * (i + 1);
}
return val;
}
public static void main(String[] args) {
double[] myvals =
{ 0.07518284315321135, 0.002987851573676068, 0.002963866526639678, 0.002526139418225552, 0.07444872939213325, 0.0037219653347541617, 0.0036979802877177715, 0.0017920256571474585, 0.07499695903867931, 0.003477831820276616,
0.003477831820276616, 0.002036159171625004, 0.07383539747505984, 0.004311312204791184, 0.0043352972518275745, 0.0011786937400740452, 0.07353130134299131, 0.004339580295941216 };
System.out.println("Computed value based on sum: " + sum(myvals));
System.out.println("Computed value based on values and its position: " + hash_w_order(myvals));
}
The output for that code, using your list of values is:
Computed value based on sum: 0.41284176550504803
Computed value based on values and its position: 3.7396448842464496
Well, here's a method that works for any number of doubles.
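// requires: import java.math.BigInteger;
public BigInteger uniqueID(double[] array) {
    // 2^64, the number of distinct 64-bit patterns
    final BigInteger twoToTheSixtyFour = BigInteger.ONE.shiftLeft(64);
    BigInteger count = BigInteger.ZERO;
    for (double d : array) {
        long bitRepresentation = Double.doubleToRawLongBits(d);
        // Interpret the bits as an unsigned digit in [0, 2^64)
        BigInteger digit = BigInteger.valueOf(bitRepresentation);
        if (bitRepresentation < 0) {
            digit = digit.add(twoToTheSixtyFour);
        }
        count = count.multiply(twoToTheSixtyFour).add(digit);
    }
    return count;
}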
Explanation
Each double is a 64-bit value, which means there are 2^64 different possible double values. Since a long is easier to work with for this sort of thing, and it's the same number of bits, we can get a 1-to-1 mapping from doubles to longs using Double.doubleToRawLongBits(double).
This is awesome, because now we can treat this like a simple combinations problem. You know how you know that 1234 is a unique number? There's no other number with the same value. This is because we can break it up by its digits like so:
1234 = 1 * 10^3 + 2 * 10^2 + 3 * 10^1 + 4 * 10^0
The powers of 10 would be "basis" elements of the base-10 numbering system, if you know linear algebra. In this way, base-10 numbers are like arrays consisting of only values from 0 to 9 inclusively.
If we want something similar for double arrays, we can discuss the base-(2^64) numbering system. Each double value would be a digit in a base-(2^64) representation of a value. If there are 18 digits, there are (2^64)^18 unique values for a double[] of length 18.
That number is gigantic, so we're going to need to represent it with a BigInteger data-structure instead of a primitive number. How big is that number?
(2^64)^18 = 61172327492847069472032393719205726809135813743440799050195397570919697796091958321786863938157971792315844506873509046544459008355036150650333616890210625686064472971480622053109783197015954399612052812141827922088117778074833698589048132156300022844899841969874763871624802603515651998113045708569927237462546233168834543264678118409417047146496
There are that many unique configurations of 18-length double arrays and this code lets you uniquely describe them.
I'm going to suggest three methods, with different pros and cons which I will outline.
1. Hash Code
This is the obvious "solution", though it has been correctly pointed out that it will not be unique. However, it will be very unlikely that any two arrays will have the same value.
2. Weighted Sum
Your elements appear to be bounded; perhaps they range from a minimum of 0 to a maximum of 1. If this is the case, you can multiply the first number by N^0, the second by N^1, the third by N^2 and so on, where N is some large number (ideally the inverse of your precision). This is easily implemented, particularly if you use a matrix package, and very fast. We can make this unique if we choose.
3. Euclidean Distance from Mean
Subtract the mean of your arrays from each array, square the results, sum the squares. If you have an expected mean, you can use that. Again, not unique, there will be collisions, but you (almost) can't avoid that.
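As a rough sketch of the third method, assuming "the mean of your arrays" means the element-wise mean over all arrays and that every array has the same length (names are illustrative):

```java
final class DistanceFromMean {
    // Returns, for each array, the sum of squared differences from the element-wise mean.
    static double[] squaredDistancesFromMean(double[][] arrays) {
        int n = arrays.length;
        int m = arrays[0].length;
        double[] mean = new double[m];
        for (double[] a : arrays) {
            for (int j = 0; j < m; j++) {
                mean[j] += a[j] / n;
            }
        }
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double diff = arrays[i][j] - mean[j];
                result[i] += diff * diff;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0, 0.0}, {2.0, 2.0}, {1.0, 1.0}};
        for (double d : squaredDistancesFromMean(data)) {
            System.out.println(d);
        }
    }
}
```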
The difficulty of uniqueness
It has already been explained that hashing will not give you a unique solution. A unique number is possible in theory, using the Weighted Sum, but we have to use numbers of a very large size. Let's say your numbers are 64 bits in memory. That means that there are 2^64 possible numbers they can represent (slightly less using floating point). Eighteen such numbers in an array could represent 2^(64*18) different numbers. That's huge. If you use anything less, you will not be able to guarantee uniqueness due to the pigeonhole principle.
Let's look at a trivial example. If you have four letters, a, b, c and d, and you have to number them each uniquely using the numbers 1 to 3, you can't. That's the pigeonhole principle. You have 2^(18*64) possible numbers. You can't number them uniquely with less than 2^(18*64) numbers, and hashing doesn't give you that.
If you use BigDecimal, you can represent (almost) arbitrarily large numbers. If the largest element you can get is 1 and the smallest 0, then you can set N = 1/(precision) and apply the Weighted Sum mentioned above. This will guarantee uniqueness. The precision for doubles in Java is Double.MIN_VALUE. Note that the array of weights needs to be stored in BigDecimals!
That satisfies this part of your question:
create a computational value for each array, which is unique to it
based upon values inside it
However, there is a problem:
1 and 2 suck for K Means
I am assuming from your discussion with Marco 13 that you are performing the clustering on the single values, not the length 18 arrays. As Marco has already mentioned, Hashing sucks for K means. The whole idea is that the smallest change in the data will result in a large change in Hash Values. That means that two images which are similar, produce two very similar arrays, produce two very different "unique" numbers. Similarity is not preserved. The result will be pseudo random!!!
Weighted Sums are better, but still bad. It will basically ignore all the elements except for the last one, unless the last element is the same. Only then will it look at the next to last, and so on. Similarity is not really preserved.
Euclidean distance from the mean (or at least some point) will at least group things together in a sort of sensible way. Direction will be ignored, but at least things that are far from the mean won't be grouped with things that are close. Similarity of one feature is preserved, the other features are lost.
In summary
1 is very easy, but is not unique and doesn't preserve similarity.
2 is easy, can be unique and doesn't preserve similarity.
3 is easy, but is not unique and preserves some similarity.
Implementation of Weighted Sum. Not really tested.
public class Array2UniqueID {
private final double min;
private final double max;
private final double prec;
private final int length;
/**
* Used to provide a {@code BigDecimal} that is unique to the given array.
* <p>
* This uses weighted sum to guarantee that two IDs match if and only if
* every element of the array also matches. Similarity is not preserved.
*
* @param min smallest value an array element can possibly take
* @param max largest value an array element can possibly take
* @param prec smallest difference possible between two array elements
* @param length length of each array
*/
public Array2UniqueID(double min, double max, double prec, int length) {
this.min = min;
this.max = max;
this.prec = prec;
this.length = length;
}
/**
* A convenience constructor which assumes the array consists of doubles of
* full range.
* <p>
* This will result in very large IDs being returned.
*
* @see Array2UniqueID#Array2UniqueID(double, double, double, int)
* @param length length of each array
*/
public Array2UniqueID(int length) {
this(-Double.MAX_VALUE, Double.MAX_VALUE, Double.MIN_VALUE, length);
}
public BigDecimal createUniqueID(double[] array) {
// Validate the data
if (array.length != length) {
throw new IllegalArgumentException("Array length must be "
+ length + " but was " + array.length);
}
for (double d : array) {
if (d < min || d > max) {
throw new IllegalArgumentException("Each element of the array"
+ " must be in the range [" + min + ", " + max + "]");
}
}
double range = max - min;
/* maxNums is the maximum number of numbers that could possibly exist
* between max and min.
* The ID will be in the range 0 to maxNums^length.
* maxNums = range / prec + 1
* Stored as a BigDecimal for convenience, but is an integer
*/
BigDecimal maxNums = BigDecimal.valueOf(range)
.divide(BigDecimal.valueOf(prec))
.add(BigDecimal.ONE);
// For convenience
BigDecimal id = BigDecimal.valueOf(0);
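// Weighted sum: id = sum over i of ((array[i] - min) / prec) * maxNums^i
for (int i = 0; i < array.length; i++) {
BigDecimal num = BigDecimal.valueOf(array[i])
.subtract(BigDecimal.valueOf(min))
.divide(BigDecimal.valueOf(prec))
.multiply(maxNums.pow(i)); // raise only the weight to the power i
id = id.add(num);
}
return id;
}
}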
As I understand it, you are going to do k-means clustering based on the double values.
Why not just wrap each double value in an object together with array and position identifiers, so you know which cluster it ended up in?
Something like:
public class Element {
final public double value;
final public int array;
final public int position;
public Element(double value, int array, int position) {
this.value = value;
this.array = array;
this.position = position;
}
}
If you need to cluster each array as a whole:
You can transform the original arrays of length 18 into arrays of length 19, with the last or first element being a unique id that you ignore during clustering but can refer to after clustering has finished. This has a small memory footprint (8 additional bytes per array) and gives an easy association with the original values.
If space is absolutely a problem, and all values of an array are less than 1, you can add a unique id greater than or equal to 1 to each array and cluster based on the remainder of division by 1: 0.07518284315321135 stays 0.07518284315321135 for the 1st array, and 0.07518284315321135 becomes 1.07518284315321135 for the 2nd, although this increases the complexity of computation during clustering.
First of all, let's try to understand what you need mathematically:
Uniquely mapping an array of m real numbers to a single number is in fact a bijection between R^m and R, or at least N.
Since floating-point values are in fact rational numbers, your problem is to find a bijection between Q^m and N, which can be reduced to a mapping from N^m to N, because you know your values will always be greater than 0 (just multiply your values by the precision).
Thus you need to map N^m to N. Take a look at the Cantor Pairing Function for some ideas
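One way to sketch this is to fold the standard Cantor pairing function pi(a, b) = (a + b)(a + b + 1)/2 + b over the tuple, using BigInteger so intermediate values cannot overflow. The class name and the left-to-right folding order are assumptions; the inputs are taken to be non-negative integers already (for example, values divided by the precision).

```java
import java.math.BigInteger;

final class CantorTuple {
    // Cantor pairing: pi(a, b) = (a + b)(a + b + 1)/2 + b, a bijection from N x N to N.
    static BigInteger pair(BigInteger a, BigInteger b) {
        BigInteger s = a.add(b);
        return s.multiply(s.add(BigInteger.ONE)).shiftRight(1).add(b);
    }

    // Encodes a non-empty tuple of non-negative longs as pair(...pair(pair(v0, v1), v2)..., vm).
    static BigInteger encode(long[] values) {
        BigInteger acc = BigInteger.valueOf(values[0]);
        for (int i = 1; i < values.length; i++) {
            acc = pair(acc, BigInteger.valueOf(values[i]));
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(encode(new long[] {1, 1, 1}));
    }
}
```

The encoded numbers grow very quickly with the tuple length, and nearby tuples do not get nearby codes, so this gives uniqueness but nothing useful as a clustering distance.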
A guaranteed way to generate a unique result based on the array is to convert it to one big string, and use that for your computational value.
It may be slow, but it will be unique based on the array's values.
Implementation examples:
Best way to convert an ArrayList to a string

Hash function for 2D point in limited Euclidean space

I am storing a lot of objects with geographic positions as 2D points (x,y) at a granularity of meters. To represent the world I am using a grid divided into cells of 1 square km. Currently I am using HashMap<Position, Object> for this. Any other map or appropriate data structure is fine, but the solution works, so I am only interested in solving the details.
I have been reading a lot about making good hash functions, specifically for 2D points. So far, no solutions have been really good (rated in terms of being as collision-free as possible).
To test some ideas I wrote a very simple java program to generate hash codes for points from an arbitrary number (-1000,-1000) to (1000, 1000) (x1, y1 -> x2,y2) and storing them in a HashSet<Integer> and this is my result:
# java HashTest
4000000 number of unique positions
test1: 3936031 (63969 buckets, 1,60%) collisions using Objects.hash(x,y)
test2: 0 (4000000 buckets, 100,00%) collisions using (x << 16) + y
test3: 3998000 (2000 buckets, 0,05%) collisions using x
test4: 3924037 (75963 buckets, 1,90%) collisions using x*37 + y
test5: 3996001 (3999 buckets, 0,10%) collisions using x*37 + y*37
test6: 3924224 (75776 buckets, 1,89%) collisions using x*37 ^ y
test7: 3899671 (100329 buckets, 2,51%) collisions using x*37 ^ y*37
test8: 0 (4000000 buckets, 100,00%) collisions using PerfectlyHashThem
test9: 0 (4000000 buckets, 100,00%) collisions using x << 16 | (y & 0xFFFF)
Legend: number of collisions , buckets(collisions), perc(collisions)
Most of these hash functions perform really badly. In fact, the only good solution is the one that shifts x to the first 16 bits of the integer. The limitation, I guess, is that the two most distant points must not be more than the square root of Integer.MAX_VALUE apart, i.e. area must be less than 46 340 square km.
This is my test function (just copied for each new hash function):
public void test1() {
HashSet<Integer> hashCodes = new HashSet<Integer>();
int collisions = 0;
for (int x = -MAX_VALUE; x < MAX_VALUE; ++x) {
for (int y = -MAX_VALUE; y < MAX_VALUE; ++y) {
final int hashCode = Objects.hash(x,y);
if (hashCodes.contains(hashCode))
collisions++;
hashCodes.add(hashCode);
}
}
System.console().format("test1: %1$s (%2$s buckets, %3$.2f%%) collisions using Objects.hash(x,y)\n", collisions, buckets(collisions), perc(collisions));
}
Am I thinking wrong here? Should I fine-tune the primes to get better results?
Edits:
Added more hash functions (test8 and test9). test8 comes from the response by @nawfal in Mapping two integers to one, in a unique and deterministic way (converted from short to int).
public void test1() {
int MAX_VALUE = 1000;
HashSet<Integer> hashCodes = new HashSet<Integer>();
int collisions = 0;
for (int x = -MAX_VALUE; x < MAX_VALUE; ++x) {
for (int y = -MAX_VALUE; y < MAX_VALUE; ++y) {
final int hashCode = ((x+MAX_VALUE)<<16)|((y+MAX_VALUE)&0xFFFF);
if (hashCodes.contains(hashCode))
collisions++;
hashCodes.add(hashCode);
}
}
System.out.println("Collisions: " + collisions + " // Buckets: " + hashCodes.size());
}
Prints: Collisions: 0 // Buckets: 4000000
There is a similar question whose answer is to use a Cantor pairing function. Here:
Mapping two integers to one, in a unique and deterministic way.
The Cantor pairing function can be used for negative integers as well, by first mapping them to non-negative integers with a bijection.
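A minimal sketch of that combination, assuming the usual zigzag bijection (0, -1, 1, -2, ... maps to 0, 1, 2, 3, ...) followed by the Cantor pairing function. The names are illustrative. The result is a long rather than an int hash code, and the arithmetic is only safe for coordinates of magnitude up to roughly 2^29, which comfortably covers the ±1000 range tested in the question.

```java
final class SignedCantor {
    // Maps all integers onto the non-negative integers: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    static long zigzag(int n) {
        return n >= 0 ? 2L * n : -2L * n - 1;
    }

    // Cantor pairing of the zigzag-mapped coordinates; distinct (x, y) give distinct results.
    static long pair(int x, int y) {
        long a = zigzag(x);
        long b = zigzag(y);
        return (a + b) * (a + b + 1) / 2 + b;
    }

    public static void main(String[] args) {
        System.out.println(pair(-1000, 1000));
    }
}
```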

How to set edge's length proportional to edge's value

Using JDK 1.7+Jung2.
I have a similarity matrix and want to analyze it graphically using jung2 graphs. My dataset is composed of data like:
object1 object2 0.54454
object1 object3 0.45634
object2 object3 0.90023
[..]
For each line, the value represents the similarity between the previous objects (i.e.: object1 has 0.54454 similarity with object2)
I want to create a graph where the distance between vertices is proportional to their edge value.
For the example above, object1 would be placed closer to object2 than to object3, because sim(object1,object2) > sim(object1,object3).
How can I achieve this using Jung2? The default layouts don't seem to do this.
This depends on the layout that you intend to use. For the SpringLayout, you can pass a Transformer to the constructor as the length_function parameter, that you can simply implement as
class EdgeLengthTransformer implements Transformer<Edge, Integer> {
@Override
public Integer transform(Edge edge) {
int minLength = 100; // Length for similarity 1.0
int maxLength = 500; // Length for similarity 0.0
Vertex v0 = graph.getSource(edge);
Vertex v1 = graph.getDest(edge);
float similarity = obtainSimilarityFromYourDataset(v0, v1);
int length = (int)(minLength + (1.0 - similarity) * (maxLength - minLength));
return length;
}
}
You'll always have to take into account that, depending on the structure of the graph, it might simply not be possible to lay out the vertices as desired. For example, if the similarities do not obey the triangle inequality (http://en.wikipedia.org/wiki/Triangle_inequality), then there is no suitable embedding of these similarities into the 2D space.
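To make the last point concrete, here is a small library-independent sketch. It reuses the length mapping from the transformer and checks three pairwise similarities against the triangle inequality, under the assumption that distance is taken as 1 - similarity. The class and method names are hypothetical and nothing here calls JUNG.

```java
final class SimilarityLayoutCheck {
    // Same linear mapping as in the transformer: similarity 1.0 -> minLength, 0.0 -> maxLength.
    static int edgeLength(double similarity, int minLength, int maxLength) {
        return (int) (minLength + (1.0 - similarity) * (maxLength - minLength));
    }

    // Treats 1 - similarity as a distance and checks all three triangle inequalities.
    static boolean satisfiesTriangleInequality(double simAB, double simAC, double simBC) {
        double ab = 1.0 - simAB;
        double ac = 1.0 - simAC;
        double bc = 1.0 - simBC;
        return ab <= ac + bc && ac <= ab + bc && bc <= ab + ac;
    }

    public static void main(String[] args) {
        // The three similarities from the question's sample data
        System.out.println(satisfiesTriangleInequality(0.54454, 0.45634, 0.90023));
        System.out.println(edgeLength(0.54454, 100, 500));
    }
}
```

If the check fails for some triple, no layout can honor all three target lengths exactly, and a spring layout will settle on a compromise.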
