Graph Theory: Find the Jordan center? - java

I'm trying to find the set of vertices that minimizes their distance to other vertices on a weighted graph. Based on a cursory Wikipedia search, I think that this is called the Jordan center. What are some good algorithms for finding it?
Right now, my plan is to get a list of the weights of each branch emanating from a given vertex. The vertices whose weights have the smallest relative difference will be the central ones. Any other ideas?
I'm using Java, but helpful answers don't necessarily need to be Java specific.

I would first use Dijkstra's algorithm (it has to be run once per vertex) to compute the shortest distances between all pairs of vertices; there are also dedicated all-pairs algorithms for that, such as Floyd-Warshall. Then, for each vertex V, find Vm, the largest distance from V to any other vertex among the distances returned by Dijkstra's algorithm. The vertices with the smallest Vm are the ones in the graph center. Pseudocode:
int n = numberOfVertices;
int[][] D = runDijkstraOrFloydWarshall();
// D[a][b] = length of the shortest path from a to b
int[] Vm = new int[n];          // Vm[i] = eccentricity of vertex i
for (int i = 0; i < n; i++) {
    Vm[i] = 0;
    for (int j = 0; j < n; j++) {
        if (Vm[i] < D[i][j]) Vm[i] = D[i][j];
    }
}
int minVm = Integer.MAX_VALUE;  // smallest eccentricity found so far
for (int i = 0; i < n; i++) {
    if (Vm[i] < minVm) minVm = Vm[i];
}
for (int i = 0; i < n; i++) {
    if (Vm[i] == minVm) {
        // the graph center contains vertex i
    }
}
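For the all-pairs step itself, here is a minimal Floyd-Warshall sketch in Java, assuming the weighted graph is given as an adjacency matrix w with INF marking missing edges (these names are illustrative, not from the original post):
static final int INF = Integer.MAX_VALUE / 2;   // "no edge"; halved to avoid overflow when adding

// w[i][j] = weight of edge (i, j), INF if absent, 0 on the diagonal
static int[][] allPairsShortestPaths(int[][] w) {
    int n = w.length;
    int[][] d = new int[n][n];
    for (int i = 0; i < n; i++) d[i] = w[i].clone();
    for (int k = 0; k < n; k++)             // allow vertex k as an intermediate stop
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
    return d;
}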

Three algorithms for the graph center problem are presented in this MSc thesis: A distributed algorithm for the graph center problem.

Starting from JGraphT version 1.1.0, you can simply use the method GraphMeasurer.getGraphCenter(). The underlying code uses a shortest path algorithm; the user can choose which one to use, depending on the characteristics of the graph (e.g. sparse/dense/...).
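A minimal sketch of how that call might be used (the class and package names are taken from my reading of the JGraphT docs, so verify them against the version you depend on):
import java.util.Set;
import org.jgrapht.alg.shortestpath.GraphMeasurer;
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;

public class CenterDemo {
    public static void main(String[] args) {
        SimpleWeightedGraph<String, DefaultWeightedEdge> g =
                new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
        g.addVertex("a");
        g.addVertex("b");
        g.addVertex("c");
        g.setEdgeWeight(g.addEdge("a", "b"), 2.0);
        g.setEdgeWeight(g.addEdge("b", "c"), 5.0);

        // getGraphCenter() returns the vertices of minimum eccentricity
        GraphMeasurer<String, DefaultWeightedEdge> measurer = new GraphMeasurer<>(g);
        Set<String> center = measurer.getGraphCenter();
        System.out.println(center);   // expected: [b]
    }
}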

Related

How to graphically simulate the result of the K-Means Algorithm?

I'm a computer science student in Paris. In mathematics this year we have to use the k-means algorithm to solve a problem (the Clustered Capacitated Vehicle Routing Problem, applied to the resupplying of self-service bicycle stations). Here is my algorithm:
public void run() {
    boolean hasConverged = false;
    List<Integer> nearestClusters = null;
    // A list used to check whether the nearestClusters list has changed;
    // if it has not, the algorithm has finished
    List<Integer> previousList = new ArrayList<Integer>();
    // Random initialization of the clusters' centroids
    for (int i = 0; i < clustersNumber; ++i) {
        clusters.add(ClusterGenerator.Generate(stationsList, colorList.get(i), latMin, latMax, lngMin, lngMax));
    }
    while (!hasConverged) {
        if (nearestClusters != null) {
            previousList.clear();
            previousList.addAll(nearestClusters);
        }
        nearestClusters = new ArrayList<Integer>();
        // Each point is assigned to its nearest cluster
        for (int j = 0; j < stationsList.size(); ++j) {
            nearestClusters.add(getIndexOfTheNearestCluster(stationsList.get(j)));
        }
        // Move each cluster's centroid to the center of the points assigned to it
        for (int k = 0; k < clusters.size(); ++k) {
            clusters.get(k).setCentre(stationsCenters(getStationsOfCluster(clusters.get(k), nearestClusters)));
        }
        if (!nearestClusters.isEmpty() && previousList.equals(nearestClusters))
            hasConverged = true;
    }
}
Yet, I wanted to show the result of my algorithm with the clusters formed, and I found this project on the Internet: https://github.com/ertugrulozcan/K-Means-Simulation
I imported into my project the class ClusterGenerator, which creates clusters along with random elements, the class Item, the class Graphic (I didn't touch anything there) and the class MainWindow, which initializes all the graphic elements.
I did not manage to display the plots, and there are no errors in Eclipse that could give me any clue.
Can someone please explain to me where the problem is?
Thanks
The problem was that my algorithm was generating clusters for the stations, but I had not configured the class Graphic (which I understood later is very important for the display) to render my points correctly. Since I used latitude and longitude as coordinates for my stations, I had to scale these coordinates to the window. Here is how I did that (using cross-multiplication): I calculate the "gap" between two units in the graph and add an adjustment because I don't start at zero.
double gapX = (this.getWidth() - 2 * edgeSpace) / (topX-bottomX+1);
int adjustmentX =(int) (-bottomX*gapX);
(getWidth() gives the actual width of the panel where the graph is drawn, edgeSpace is the padding between the graph and the edge of the panel, topX is the maximum value of a coordinate and bottomX the minimum value)
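To illustrate how those two factors would then be applied, here is a small sketch of mapping one coordinate to a pixel position; the stationX variable is hypothetical and just stands for one station's longitude (the Y axis works the same way with latitude):
// map a data coordinate (e.g. a longitude) to a horizontal pixel position inside the panel
double stationX = 2.3522;   // hypothetical coordinate of one station
int pixelX = (int) (stationX * gapX) + adjustmentX + edgeSpace;
// bottomX maps to edgeSpace and topX stays inside getWidth() - edgeSpace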

Android- HOG descriptors distance

I want to compare 2 HOG descriptors in an Android application using OpenCV. I'm having trouble computing the Euclidean distance between the two vectors, which are of type MatOfFloat. Do you have an example of code that could help me?
The function that computes the HOG descriptors is mHOGDescriptor.compute(imgMat, descriptors, winStride, padding, locations); its output is descriptors, whose type is MatOfFloat. Once I have the HOG descriptors for 2 images, I want to compute the Euclidean distance between them, and this is exactly where I run into problems.
I tried this code but it does not work:
for (int i = 0; i < imgMat.rows(); i++) {
    for (int j = 0; j < imgMat.cols(); j++) {
        distance1 = (int) (distance1 + (mDescriptors1.get(i, j) - mDescriptors2.get(i, j)));
    }
}
I see two problems with your code:
It is not the correct formula for the Euclidean distance.
You convert to int at each iteration. That is not a good idea, because the values of the descriptor are floats and less than 1 (it is composed of normalized histograms), so you round your distance down to zero.
Try the following code:
double distance = 0;
for (int i = 0; i < mDescriptors1.rows(); i++) {
    for (int j = 0; j < mDescriptors1.cols(); j++) {
        double diff = mDescriptors1.get(i, j)[0] - mDescriptors2.get(i, j)[0];
        distance += diff * diff;
    }
}
distance = Math.sqrt(distance);   // Euclidean distance = square root of the sum of squared differences
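As a side note, if both descriptors have the same size, OpenCV can compute the same value directly with Core.norm; a one-line sketch, which you should double-check against the overloads in your OpenCV version:
double distance = Core.norm(mDescriptors1, mDescriptors2, Core.NORM_L2);   // L2 (Euclidean) distance between the two descriptors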

Finding the minimum-cut value of graph using Kruskal's algorithm

I'm a beginner and I am trying to find the minimum cut of a graph using Kruskal's algorithm in Java.
I have gotten to where I can read the input and create vertexCount^2 MSTs with random weights on the edges. All I have left to do is figure out, from my MST, how many edges must be cut to separate S and V-S. This will allow me to choose the minimum among the vertexCount^2 candidates.
I think I understand correctly that I am supposed to ignore the last edge of the MST to get S and V-S. But I'm lost on how to figure out how many edges are connecting S and V-S.
So my question is: 1) Is vertexCount^2 random MST's enough to be confident that it will contain the minimum-cut? 2) How can I find how many edges are connecting S and V-S?
PS. This is a snippet from my code:
// create weighted edge graph from input.txt
int vertexCount, edgeCount;
Edge edgeTemp;
vertexCount = s.nextInt();
edgeCount = s.nextInt();
EdgeWeightedGraph G = new EdgeWeightedGraph(vertexCount, edgeCount);
for (int j = 0; j < edgeCount; j++) {
    edgeTemp = new Edge(s.nextInt(), s.nextInt(), new Random().nextInt(edgeCount));
    G.addEdge(edgeTemp);
}

// create Kruskal's MST from graph G
for (int j = 0; j < vertexCount * vertexCount; j++) {
    KruskalMST mst = new KruskalMST(G);
    for (Edge e : mst.edges()) {
        System.out.println(e);
    }
    System.out.println(NEWLINE);
    if (j != vertexCount * vertexCount - 2)
        G.randomizeWeight(edgeCount);
}
PPS. In case this is relevant, I looked at the code from http://algs4.cs.princeton.edu/43mst/ when writing mine.
When getting the MST from the graph, I used Kruskal's algorithm, which means I used union and find methods.
Each vertex is its own parent in the beginning. When taking the union of distinct components of the graph, I assign the merging components (including singletons) a single parent. So by the time I'm left with S and V-S, all the vertices of each component share the same root!
Therefore, I go back to my EdgeWeightedGraph and iterate over all the edges in the graph (not the MST!). When I find an edge whose two endpoints have different roots, that edge connects S and V-S, and I count++ every time I see such an edge.
This gives me the number of edges crossing the cut!
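A small sketch of that counting step, assuming the algs4-style classes from the question (EdgeWeightedGraph, Edge) and that the union-find structure uf built during Kruskal's algorithm is still accessible; the variable names here are illustrative:
int crossingEdges = 0;
for (Edge e : G.edges()) {
    int v = e.either();
    int w = e.other(v);
    // endpoints with different roots lie on opposite sides of the (S, V-S) cut
    if (uf.find(v) != uf.find(w)) {
        crossingEdges++;
    }
}
System.out.println("edges crossing the cut: " + crossingEdges);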

Karger's Algorithm

I'm trying to implement Karger's min-cut algorithm in Java. For this, I created a Graph class which stores a SortedMap, with an integer index as key and a Vertex object as value, and an ArrayList of Edge objects. Each Edge stores the indices of its incident vertices. Then I merge the vertices of some random edge until the number of vertices reaches 2. I repeat these steps a safe number of times. Curiously, in my output I get 2x the number of crossing edges. I mean, if the right answer is 10, after executing the algorithm n times (for n sufficiently large), the minimum over these executions is 20, which makes me believe the implementation is almost correct.
This is the relevant part of code:
void mergeVertex(int iV, int iW) {
    for (int i = 0; i < edges.size(); i++) {
        Edge e = edges.get(i);
        if (e.contains(iW)) {
            if (e.contains(iV)) {
                // the edge becomes a self-loop after the contraction: drop it
                edges.remove(i);
                i--;
            } else {
                e.replace(iW, iV);
            }
        }
    }
    vertices.remove(iW);
}

public int kargerContraction() {
    Graph copy = new Graph(this);
    Random r = new Random();
    while (copy.getVertices().size() > 2) {
        int i = r.nextInt(copy.getEdges().size());
        Edge e = copy.getEdges().get(i);
        copy.mergeVertex(e.getVertices()[0], e.getVertices()[1]);
    }
    return copy.getEdges().size() / 2;
}
Actually the problem was much simpler than I thought. While reading the .txt file which contains the graph data, I was counting each edge twice, so logically the min cut returned was 2 times the right min cut.
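A hedged sketch of one way to avoid that double counting when the file stores an adjacency list (so every undirected edge appears on two lines): only add the edge when the neighbor's index is larger than the current vertex's. The file format, the BufferedReader named reader, and the Edge(v, w) constructor are assumptions here, not taken from the original code:
// each line: vertexIndex neighbor1 neighbor2 ...
String line;
while ((line = reader.readLine()) != null) {
    String[] tokens = line.trim().split("\\s+");
    int v = Integer.parseInt(tokens[0]);
    for (int t = 1; t < tokens.length; t++) {
        int w = Integer.parseInt(tokens[t]);
        if (w > v) {   // the pair will show up again on w's line; add it only once
            edges.add(new Edge(v, w));
        }
    }
}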

Steps to perform document clustering using k-means algorithm in java

I need the steps to perform document clustering using the k-means algorithm in Java.
It would be very useful if the steps were laid out clearly.
Thanks in advance.
You need to count the words in each document and build a feature vector, generally called a bag of words. Before that you need to remove stop words (very common words that carry little information, like "the", "a", etc.). You can generally take the top n most common words across your documents, count the frequency of these words in each document, and store them in an n-dimensional vector.
For the distance measure you can use cosine similarity.
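For example, a minimal sketch of cosine similarity between two such term-frequency vectors (the method name and the double[] representation are just for illustration):
// cosine similarity between two term-frequency vectors of the same length
static double cosineSimilarity(double[] a, double[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}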
Here is a simple algorithm for 2-means on 1-dimensional data points; you can extend it to k means and n-dimensional data points easily. Let me know if you want the n-dimensional implementation.
double[] x = {1, 2, 2.5, 3, 3.5, 4, 4.5, 5, 7, 8, 8.5, 9, 9.5, 10};
double[] center = new double[2];
double[] precenter = new double[2];
List<Double>[] cluster = new ArrayList[2];   // unchecked warning, but fine for this sketch
cluster[0] = new ArrayList<Double>();
cluster[1] = new ArrayList<Double>();

// pick 2 distinct random indices in [0, x.length) as the initial centers
int[] start = new int[2];
Random rng = new Random();
start[0] = rng.nextInt(x.length);
start[1] = rng.nextInt(x.length);
while (start[0] == start[1]) {
    start[1] = rng.nextInt(x.length);
}
center[0] = x[start[0]];
center[1] = x[start[1]];
// (there are better ways to generate k random indices without replacement; just search)

do {
    cluster[0].clear();
    cluster[1].clear();
    // assignment step: each point goes to the nearer center
    for (int i = 0; i < x.length; ++i) {
        if (Math.abs(x[i] - center[0]) <= Math.abs(x[i] - center[1])) {
            cluster[0].add(x[i]);
        } else {
            cluster[1].add(x[i]);
        }
    }
    // update step: move each center to the mean of its cluster
    precenter[0] = center[0];
    precenter[1] = center[1];
    center[0] = mean(cluster[0]);
    center[1] = mean(cluster[1]);
} while (precenter[0] != center[0] || precenter[1] != center[1]);

double mean(List<Double> list) {
    double sum = 0;
    for (int index = 0; index < list.size(); ++index) {
        sum += list.get(index);
    }
    return sum / list.size();
}
cluster[0] and cluster[1] contain the points of the two clusters, and center[0], center[1] are the 2 means.
You may need to do some debugging, because I originally wrote the code in R and just converted it into Java for you :)
Does this help you? Also, the wiki article has some links to implementations in other languages that are ready to be ported to Java.
Steps of the algorithm:
Define the number of clusters you want to have.
Distribute the points randomly in your problem space.
Link every observation to the nearest point.
Calculate the center of mass for each cluster and move its point into that middle.
Link the points to the center points again and repeat until the points don't move any more.
What do you want to cluster the documents based on? If it's by similarity, you'll need to do some natural language processing first, and then you'll need a metric (some kind of assignment algorithm) to place the documents into clusters (CRP works and is relatively straightforward).
The hardest part will be the NLP (natural language processing) if you're not clustering them based on something like length. I can provide more info on all of these, but I won't dive down the rabbit hole if you don't need it.
