Using JDK 1.7 and JUNG2.
I have a similarity matrix and want to analyze it graphically using JUNG2 graphs. My dataset is composed of data like:
object1 object2 0.54454
object1 object3 0.45634
object2 object3 0.90023
[..]
For each line, the value represents the similarity between the two preceding objects (i.e., object1 has similarity 0.54454 with object2).
I want to create a graph where the distance between vertices reflects the value of the edge between them: the higher the similarity, the closer the vertices.
For the example above, object1 would be placed closer to object2 than to object3, because sim(object1, object2) > sim(object1, object3).
How can I achieve this using JUNG2? The default layouts don't seem to do this.
This depends on the layout that you intend to use. For the SpringLayout, you can pass a Transformer to the constructor as the length_function parameter, which you could implement, for example, like this:
class EdgeLengthTransformer implements Transformer<Edge, Integer> {
    @Override
    public Integer transform(Edge edge) {
        int minLength = 100; // length for similarity 1.0
        int maxLength = 500; // length for similarity 0.0
        // 'graph' and 'obtainSimilarityFromYourDataset' are placeholders
        // that you have to supply from your own code
        Vertex v0 = graph.getSource(edge);
        Vertex v1 = graph.getDest(edge);
        float similarity = obtainSimilarityFromYourDataset(v0, v1);
        return (int) (minLength + (1.0 - similarity) * (maxLength - minLength));
    }
}
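For illustration, a sketch of how such a transformer could be plugged into the layout (class names come from the edu.uci.ics.jung packages; the two-argument SpringLayout constructor taking a length function is assumed here, so check the signature for your JUNG version):

    Graph<Vertex, Edge> graph = new SparseMultigraph<Vertex, Edge>();
    // ... add your vertices and edges from the similarity matrix ...
    // EdgeLengthTransformer needs access to the graph and your similarity data
    Layout<Vertex, Edge> layout =
            new SpringLayout<Vertex, Edge>(graph, new EdgeLengthTransformer());
    VisualizationViewer<Vertex, Edge> vv = new VisualizationViewer<Vertex, Edge>(layout);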
You'll always have to take into account that - depending on the structure of the graph - it might simply not be possible to lay out the vertices as desired. For example, if the similarities do not obey the triangle inequality (http://en.wikipedia.org/wiki/Triangle_inequality), then there is no suitable embedding of these similarities into 2D space.
I have a distance matrix and I want to use that distance matrix when clustering my data.
I've read the ELKI documentation and it states that I can overwrite the distance method when extending the AbstractNumberVectorDistanceFunction class.
The distance method, however, works on coordinates, i.e., it computes the distance from coordinate x to coordinate y. This is troublesome because my distance matrix is filled only with distance values, and I use indexes to look up the distance from index x to index y. Here's the code from the documentation:
public class TutorialDistanceFunction extends AbstractNumberVectorDistanceFunction {
    @Override
    public double distance(NumberVector o1, NumberVector o2) {
        double dx = o1.doubleValue(0) - o2.doubleValue(0);
        double dy = o1.doubleValue(1) - o2.doubleValue(1);
        return dx * dx + Math.abs(dy);
    }
}
My question is how to correctly use the distance matrix when clustering with ELKI.
AbstractNumberVectorDistanceFunction is the appropriate parent class only if your input data are number vectors. If your data are just abstract object identifiers, subclass AbstractDBIDRangeDistanceFunction instead. You then have to implement
double distance(int i1, int i2);
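For illustration, a minimal sketch of such a subclass backed by an in-memory matrix (the class and field names here are hypothetical, and depending on your ELKI version there may be additional methods to implement):

    public class MatrixDistanceFunction extends AbstractDBIDRangeDistanceFunction {
        // precomputed distances, indexed by the DBID range offsets i1 and i2
        private final double[][] matrix;

        public MatrixDistanceFunction(double[][] matrix) {
            this.matrix = matrix;
        }

        @Override
        public double distance(int i1, int i2) {
            return matrix[i1][i2];
        }
    }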
There are already different implementations of a distance function for precomputed distances, for example DiskCacheBasedDoubleDistanceFunction that memory-maps a distance matrix stored on disk. We should add a DoubleMatrixDistanceFunction though, for direct use from Java (in the next version, all class names and package names will be shortened, btw).
See also: https://elki-project.github.io/howto/precomputed_distances
in particular the section titled "Using without primary data", which explains how to set up a database with no primary data when you only have a distance matrix.
In the EuclidianHashFamily class of a Java LSH library, I do not understand the parameter w, or whether there is a relation between the choice of the value of w and the number of dimensions.
public EuclidianHashFamily(int w, int dimensions) {
    this.dimensions = dimensions;
    this.w = w;
}
I want to know what these integers mean, so that I know what values to assign to them.
public EuclideanHash(int dimensions, int w) {
    Random rand = new Random();
    this.w = w;
    this.offset = rand.nextInt(w);
    randomProjection = new Vector(dimensions);
    for (int d = 0; d < dimensions; d++) {
        // mean 0, standard deviation 1.0
        double val = rand.nextGaussian();
        randomProjection.set(d, val);
    }
}
A bit of web searching leads me to quote Wikipedia:
One of the main applications of LSH is to provide a method for efficient approximate nearest neighbor search algorithms. Consider an LSH family 𝓕. The algorithm has two main parameters: the width parameter k and the number of hash tables L.
So, w is the width.
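To make that concrete, here is the textbook p-stable (Euclidean) LSH hash, which is how w is typically used; this is a sketch of the general scheme, not necessarily this library's exact code. Larger w means wider buckets, so more distant points end up colliding; w is usually chosen relative to the typical distances in your data rather than to the number of dimensions.

    // h(v) = floor((v . a + b) / w), with a a random Gaussian vector and b drawn from [0, w)
    static int hash(double[] vector, double[] randomProjection, double offset, int w) {
        double dot = 0.0;
        for (int d = 0; d < vector.length; d++) {
            dot += vector[d] * randomProjection[d];
        }
        return (int) Math.floor((dot + offset) / w);
    }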
I have an ArrayList unsolvedOutlets containing Outlet objects that have longitude and latitude attributes.
Using the longitude and latitude of the Outlet objects in unsolvedOutlets, I need to find the smallest distance in that list using the distance formula SQRT((X2 - X1)^2 + (Y2 - Y1)^2), where (X1, Y1) is given. I use Collections.min(list) to find the smallest distance.
My problem is if there are two or more values with the same smallest distance, I'd have to randomly select one from them.
Code:
ArrayList<Double> distances = new ArrayList<Double>();
Double smallestDistance = 0.0;
for (int i = 0; i < unsolvedOutlets.size(); i++) {
    distances.add(Math.sqrt(
            (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude()) *
            (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude()) +
            (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude()) *
            (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude())));
    distances.add(0.0); // added this to test
    distances.add(0.0); // added this to test
    smallestDistance = Collections.min(distances);
    System.out.println(smallestDistance);
}
The outcome in the console prints 0.0, but it won't stop. Is there a way to know whether there are multiple values with the same smallest value? Then I'd incorporate the Random function. Did that make sense? If anyone has the logic for that, it would be really helpful!
Thank you!
Keep track of the indices with min distance in your loop and after the loop choose one at random:
Random random = ...
...
List<Integer> minDistanceIndices = new ArrayList<>();
// start "infinitely far away" so the first outlet always becomes the current minimum
double smallestDistance = Double.POSITIVE_INFINITY;
for (int i = 0; i < unsolvedOutlets.size(); i++) {
    double newDistance = Math.sqrt(
            (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude()) *
            (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude()) +
            (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude()) *
            (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude()));
    distances.add(newDistance);
    if (newDistance < smallestDistance) {
        // strictly smaller distance found: forget the previous candidates
        minDistanceIndices.clear();
        minDistanceIndices.add(i);
        smallestDistance = newDistance;
    } else if (newDistance == smallestDistance) {
        // tie with the current minimum: remember this index as well
        minDistanceIndices.add(i);
    }
}
if (!unsolvedOutlets.isEmpty()) {
    // pick one of the minimum-distance outlets at random
    int index = minDistanceIndices.get(random.nextInt(minDistanceIndices.size()));
    Outlet chosenOutlet = unsolvedOutlets.get(index);
    System.out.println("chosen outlet: " + chosenOutlet);
}
As Jon Skeet mentioned, you don't need to take the square root to compare the distances.
Also, if you want to use distances on a sphere, your formula is wrong:
With your formula you'll get the same distance from (0° N, 180° E) to (0° N, 0° E) as from (90° N, 180° E) to (90° N, 0° E), but while you need to travel around half the earth to get from the first point to the second, the last two coordinates both denote the north pole.
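If you actually need distances on a sphere, the haversine formula is the usual choice; here is a small sketch (latitudes and longitudes in degrees, 6371 km as the mean Earth radius):

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a));
    }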
Note: I believe fabian's solution is superior to this, but I've kept it around to demonstrate that there are many different ways of implementing this...
I would probably:
- Create a new type which contains the distance from the outlet as well as the outlet itself (or just the square of the distance), or use a generic Pair type for the same purpose
- Map (using Stream.map) the original list to a list of these pairs
- Order by the distance or square-of-distance
- Look through the sorted list until you find a distance which isn't the same as the first one in the list
You then know how many - and which - outlets have the same distance.
Another option would be to simply shuffle the original collection, then sort the result by distance, then take the first element - that way even if multiple of them do have the same distance, you'll be taking a random one of those.
JB Nizet's option of "find the minimum, then perform a second scan to find all those with that distance" would be fine too - and quite possibly simpler :) Lots of options...
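For illustration, a sketch of the shuffle-then-sort idea with streams (the Outlet getters are the ones from the question; comparing squared distances avoids the square root):

    Comparator<Outlet> bySquaredDistance = Comparator.comparingDouble(o -> {
        double dLat = o.getLatitude() - currSolved.getLatitude();
        double dLon = o.getLongitude() - currSolved.getLongitude();
        return dLat * dLat + dLon * dLon; // squared distance is enough for comparison
    });
    List<Outlet> shuffled = new ArrayList<>(unsolvedOutlets);
    Collections.shuffle(shuffled);        // random order breaks ties up front
    Outlet nearest = shuffled.stream()
            .sorted(bySquaredDistance)    // stable sort keeps the shuffled order among ties
            .findFirst()
            .orElseThrow(() -> new IllegalStateException("no outlets"));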
In my Java code, I have two HashMaps, and I want to compute a value from their intersection. The keys are the ARGB values of a color (Integer) and each value is that color's frequency (Integer). Basically, each HashMap was generated from an image.
I want to determine a value that represents how close the two maps are to each other: the higher the value, the closer they are. Of course it can't be perfectly strict, because in real life two colors can look the same but have slightly different ARGB values, which is where the tolerance part comes in.
So far I have this:
private int colorCompare(Result otherResult) {
    HashMap<Integer, Integer> colorMap1 = getColorMap();
    HashMap<Integer, Integer> colorMap2 = otherResult.getColorMap();
    int sum = 0;
    // sum the frequencies of the colors that occur in both maps
    for (Map.Entry<Integer, Integer> entry : colorMap1.entrySet()) {
        Integer key = entry.getKey();
        Integer value = entry.getValue();
        if (colorMap2.containsKey(key)) {
            sum += value + colorMap2.get(key);
        }
    }
    return sum;
}
// weighted ("redmean") color distance between this pixel's color and another pixel's color
public double CloseTo(Pixel otherpixel) {
    Color mycolor = getColor();
    Color othercolor = otherpixel.getColor();
    double rmean = (mycolor.getRed() + othercolor.getRed()) / 2;
    int r = mycolor.getRed() - othercolor.getRed();
    int g = mycolor.getGreen() - othercolor.getGreen();
    int b = mycolor.getBlue() - othercolor.getBlue();
    double weightR = 2 + rmean / 256;
    double weightG = 4.0;
    double weightB = 2 + (255 - rmean) / 256;
    return Math.sqrt(weightR * r * r + weightG * g * g + weightB * b * b);
}
Does anyone know how to incorporate the tolerance part into this? I have no idea...
Thanks
I was unsure what the intersection of two maps would be, but it sounds as though you want to compute a distance of some sort based on the histograms of two images. One classic approach to this problem is Earth mover's distance (EMD). Assume for the moment that the images have the same number of pixels. The EMD between these two images is determined by the one-to-one correspondence between the pixels of the first image and the pixels of the second that minimizes the sum over all paired pixels of the distance between their colors. The EMD can be computed in polynomial time using the Hungarian algorithm.
If the images are of different sizes, then we have to normalize the frequencies and swap out the Hungarian algorithm for one that can solve a more general minimum-cost flow problem.
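As a hypothetical first step (not the full EMD computation), one could build the matrix of ground distances between the distinct colors of the two histograms, for example using the weighted color distance from the question; an assignment or min-cost-flow solver would then operate on this matrix together with the normalized frequencies:

    Integer[] colors1 = colorMap1.keySet().toArray(new Integer[0]);
    Integer[] colors2 = colorMap2.keySet().toArray(new Integer[0]);
    double[][] cost = new double[colors1.length][colors2.length];
    for (int i = 0; i < colors1.length; i++) {
        for (int j = 0; j < colors2.length; j++) {
            // colorDistance is a hypothetical helper, e.g. the weighted
            // red/green/blue formula from the question applied to two ARGB ints
            cost[i][j] = colorDistance(colors1[i], colors2[j]);
        }
    }
    // the solver step (Hungarian algorithm / min-cost flow) is not shown here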
I'm doing a project in Java which includes (x,y) coordinates.
I have created a Cell class which has protected integers X and Y.
Upon initialization, I run a for loop which builds an array of cells whose size is the product of the X and Y given by the user; say, if X = 10 and Y = 10, I create an array of cells[100].
However, how can I search the array fast, without doing a for loop and checking each individual value every time?
Say I'm looking for the object that contains X = 5 and Y = 3.
I know I can go through with a for loop looking for the object with values x and y, but I was wondering if there is a way to do a binary search and find "a bit faster" the object[i] that contains X = 5 and Y = 3.
Thank you very much.
The way to do this is to arrange the Cell objects in the array so that there is a simple mapping from an X,Y coordinate to the Cell's index in the array.
For example, let's assume that X and Y go from 1 to 10. Suppose that we then arrange the Cells so that:
array[0] = Cell(1, 1);
array[1] = Cell(1, 2);
...
array[9] = Cell(1, 10);
array[10] = Cell(2, 1);
array[11] = Cell(2, 2);
...
array[99] = Cell(10, 10);
It should be easy to see that we can calculate the index of Cell(i,j) in the array and fetch the cell as follows:
public Cell getCell(Cell[] array, int i, int j) {
    int index = (10 * (i - 1)) + (j - 1);
    return array[index];
}
This is the approach that programming languages that support N-dimensional array types typically use to implement them.
This can be trivially modified to deal with cases where:
- the constant 10 is something else,
- the matrix is not square,
- the matrix has more than two dimensions,
- indexes run from 0 to N - 1 instead of 1 to N,
- etcetera.
There are various other ways that you could represent 2-D matrices in Java. The simplest one is just using a Cell[][] cells which allows you to access cells as (for example) cells[i-1][j-1]. More complicated representations can be designed that use less space if the matrix is sparse (i.e. cells are missing) at the cost of more complex code and slower access times.
It sounds like (if you want to use binary search, anyway) you're setting element 0 to the Cell with x = 0, y = 0; element 1 to x = 0, y = 1, etc. If so you should be able to trivially compute the exact index of a given Cell:
// contains the Cell with x = desiredX, y = desiredY
yourArray[desiredX * Y + desiredY]; // Y = the number of distinct y values per row
If this is what you're doing, however, it'd probably be simpler to just make a 2-dimensional array:
yourArray = new Cell[X][Y];
...
yourArray[desiredX][desiredY];
The above two answers show the trivial method for getting the array index fast. I'd like to propose an alternative: use a HashMap with key/value pairings, where the values are the Cell objects. Accessing HashMap elements runs in expected constant time.
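A sketch of that idea (the Cell(x, y) constructor and the width/height variables are assumptions here; a dedicated immutable key class or java.awt.Point with proper equals/hashCode would also work instead of the string key used for brevity):

    Map<String, Cell> cells = new HashMap<>();
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            cells.put(x + "," + y, new Cell(x, y));
        }
    }
    Cell c = cells.get(5 + "," + 3); // expected O(1) lookup for X = 5, Y = 3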