I was wondering if I could get some advice on increasing the overall efficiency of a program that implements a genetic algorithm. Yes this is an assignment question, but I have already completed the assignment on my own and am simply looking for a way to get it to perform better
Problem Description
My program at the moment reads a given chain made of the types of constituents, h or p. (For example: hphpphhphpphphhpphph) For each H and P it generated a random move (Up, Down, Left, Right) and adds the move to an arrayList contained in the "Chromosome" Object. At the start the program is generating 19 moves for 10,000 Chromosomes
SecureRandom sec = new SecureRandom();
byte[] sbuf = sec.generateSeed(8);
ByteBuffer bb = ByteBuffer.wrap(sbuf);
Random numberGen = new Random(bb.getLong());
int numberMoves = chromosoneData.length();
moveList = new ArrayList(numberMoves);
for (int a = 0; a < numberMoves; a++) {
int randomMove = numberGen.nextInt(4);
char typeChro = chromosoneData.charAt(a);
if (randomMove == 0) {
moveList.add(Move.Down);
} else if (randomMove == 1) {
moveList.add(Move.Up);
} else if (randomMove == 2) {
moveList.add(Move.Left);
} else if (randomMove == 3) {
moveList.add(Move.Right);
}
}
After this comes the selection of chromosomes from the Population to crossover. My crossover function selections the first chromosome at random from the fittest 20% of the population and the other at random from outside of the top 20%. The chosen chromosomes are then crossed and a mutation function is called. I believe the area in which I am taking the biggest hit is calculating the fitness of each Chromosome. Currently my fitness function creates a 2d Array to act as a grid, places the moves in order from the move list generated by the function shown above, and then loops through the array to do the fitness calculation. (I.E. found and H at location [2,1] is Cord [1,1] [3,1] [2,0] or [2,2] also an H and if an H is found it just increments the count of bonds found)
After the calculation is complete the least fit chromosome is removed from my population and the new one is added and then the array list of chromosomes is sorted. Rinse and repeat until target solution is found
If you guys want to see more of my code to prove I actually did the work before asking for help just let me know (dont want to post to much so other students cant just copy pasta my stuff)
As suggested in the comments I have ran the profiler on my application (have never used it before, only a first year CS student) and my initial guess on where i am having issues was somewhat incorrect. It seems from what the profiler is telling me is that the big hotspots are:
When comparing the new chromosome to the others in the population to determine its position. I am doing this by implementing Comparable:
public int compareTo(Chromosome other) {
if(this.fitness >= other.fitness)
return 1;
if(this.fitness ==other.fitness )
return 0;
else
return -1;
}
The other area of issue described is in my actual evolution function, consuming about 40% of the CPU time. A codesample from said method below
double topPercentile = highestValue;
topPercentile = topPercentile * .20;
topPercentile = Math.ceil(topPercentile);
randomOne = numberGen.nextInt((int) topPercentile);
//Lower Bount for random two so it comes from outside of top 20%
int randomTwo = numberGen.nextInt(highestValue - (int) topPercentile);
randomTwo = randomTwo + 25;
//System.out.println("Selecting First: " + randomOne + " Selecting Second: " + randomTwo);
Chromosome firstChrom = (Chromosome) populationList.get(randomOne);
Chromosome secondChrom = (Chromosome) populationList.get(randomTwo);
//System.out.println("Selected 2 Chromosones Crossing Over");
Chromosome resultantChromosome = firstChrom.crossOver(secondChrom);
populationList.add(resultantChromosome);
Collections.sort(populationList);
populationList.remove(highestValue);
Chromosome bestResult = (Chromosome) populationList.get(0);
The other main preformance hit is the inital population seeding which is performed by the first code sample in the post
I believe the area in which I am taking the biggest hit is calculating the fitness of each Chromosome
If you are not sure then I assume you have not run a profiler on the program yet.
If you want to improve the performance, profiling is the first thing you should do.
Instead of repeatedly sorting your population, use a collection that maintains its contents already sorted. (e.g. TreeSet)
If your fitness measure is consistent across generations (i.e. not dependent on other members of the population) then I hope at least that you are storing that in the Chromosome object so you only calculate it once for each member of the population. With that in place you'd only be calculating fitness on the newly generated/assembled chromosome each iteration. Without more information on how fitness if calculated it's difficult to be able to offer any optimisations in that area.
Your random number generator seed doesn't need to be cryptographically strong.
Random numberGen = new Random();
A minor speedup when seeding your population is to remove all the testing and branching:
static Move[] moves = {Move.Down, Move.Up, Move.Left, Move.Right};
...
moveList.add(moves[randomMove]);
Related
I was recently asked this in an interview.
Given below are the the candidates and the time at which they got a vote.
Q. Given a time, print the person winning till that time.
Cand. Time
A 4
B 10
C 15
C 18
C 21
B 35
B 40
B 42
E.g In the Qsn above, if we are asked to find the winner at time 20, answer would be C -> Since C has 2 votes.
Tried Solution
Have a Map<String, List> to store Map<Candidate, [Time, votes]>
We can iterate through the array & fetch only the times which are less than 20 (as per the question).
But I believe there will be a more optimum way to solve this type of problem.
Essentially store the given data in a proper Data Structure which will give us the result in optimum time.
Thanks
What I would do is as follows:
First, implement a naive solution, such as the one you thought of, or the one suggested by Pp88 above.
Then, immediately write a test for it, which shows(1) that it works.
Then, implement a more optimal solution, such as the one which follows.
Finally, re-use the previous test to show(1) that the more optimal solution also works.
A more optimal solution could be as follows:
Build a LinkedHashMap where the key is a time coordinate, and the value is one more map, in which the keys are the names of all candidates, and the values are the accumulated vote count of each candidate at that time coordinate.
Traverse this LinkedHashMap and create a new map, where the key is again a time coordinate, and the value is the name of the winning candidate at that time coordinate. Throw away the previous map.
Build an ArrayList containing all the keys in the map.
Once you have done all of the above, any query of the type "who is the winner at time X" can be answered by performing a binary search in the ArrayList to find the time coordinate which is closest to but does not exceed X, and then a look-up of the time coordinate in the map to find the name of the candidate who was winning at that moment.
Ignoring the overhead of preparing the data structures, the time complexity of each query is equal to the time complexity of binary search, which is O(log2 n). This is sub-linear, so it is better than O(N).
(1) At best, we can say that a test "shows that it works"; it does not prove anything. The most accurate way of putting it is that it "gives sufficient reason to believe that it works", but that's too long, so "it shows that it works" is a decent alternative.
This is an O(N) solution, I'm assuming you have in input 2 arrays one for candidates and one for times. Who will win if there are the same amount of votes is not clear so in this case the first wins.
public Optional<Character> findCandidateWithMaxVotes(Character[] candidates, int[] times, int timeLimit) {
Character cadidateWithMaxVotes = null;
int max = 0, count = 0;
Map<Character, Integer> numberOvVotesForCandidate = new HashMap<>();
for(int i = 0; i < times.length; i++) {
if(times[i] <= timeLimit) {
count = numberOvVotesForCandidate.merge(candidates[i], 1, Integer::sum);
if(max < count) {
max = count;
cadidateWithMaxVotes = candidates[i];
}
}
}
return Optional.ofNullable(cadidateWithMaxVotes);
}
It's a bit unclear should we answer just one question ("who is the winner at time 20") or we are
supposed to preprocess the data provided and then answer several queries ("who is the winner at time 20", "who is winner at time 8" etc.).
The first problem is easy:
// Nobody is leading before elections with 0 votes
String leader = null;
int leaderVotes = 0;
HashMap<string, Integer> ballots = new HashMap<string, Integer>();
for (int i = 0; i < candidates.length; ++i) {
// too late, don't count this vote
if (times[i] > givenTime)
continue;
// number of votes
int current = ballots.containsKey(candidates[i])
? ballots.get(candidates[i]) + 1
: 1;
ballots.set(candidates[i], current);
// do we have a leader change?
//TODO: add tie breaking logic here
if (current > leaderVotes) {
leaderVotes = current;
leader = candidates[i];
}
}
// at givenTime we have leader with leaderVotes
The second problem is trickier:
We sort the votes by time
Scan them as we do in the first problem
On every leader change we add a record into (time, leader) list
Having all these done we have a sorted list which is ready for binary search: for given time we are looking for the latest record which is not later than time.
Please help me,
I try to implement GA in java to resolve minimize of summation of (Xi)^2 function that X value is double between [-100,100] , i = 1,2,3,...,30 and have 4 populations.
I can't get the correct result.
Check Source Code
Thanks you
GeneticAlgorithm.java
....
public Population evolvePopulation(Population population) {
Population newPopulation = new Population(population.size());
for (int i = 0; i < population.size(); ++i) {
Individual firstIndividual = randomSelection(population);
Individual secondIndividual = randomSelection(population);
Individual newIndividual = crossover(firstIndividual, secondIndividual);
newPopulation.saveIndividual(i, newIndividual);
}
for (int i = 0; i < newPopulation.size(); ++i) {
mutate(newPopulation.getIndividual(i));
}
return newPopulation;
}
GeneticAlgorithm.java
public Individual randomSelection(Population population) {
Population newPopulation = new Population(Constants.TOURNAMENT_SIZE);
for (int i = 0; i < Constants.TOURNAMENT_SIZE; ++i) {
int randomIndex = (int)(Math.random()*population.size());
newPopulation.saveIndividual(i, population.getIndividual(randomIndex));
}
Individual fittestIndividual = newPopulation.getFittestIndividual();
return fittestIndividual;
}
Population.java
...
public Population evolvePopulation(Population population) {
Population newPopulation = new Population(population.size());
for (int i = 0; i < population.size(); ++i) {
Individual firstIndividual = randomSelection(population);
Individual secondIndividual = randomSelection(population);
Individual newIndividual = crossover(firstIndividual, secondIndividual);
newPopulation.saveIndividual(i, newIndividual);
}
for (int i = 0; i < newPopulation.size(); ++i) {
mutate(newPopulation.getIndividual(i));
}
return newPopulation;
}
...
Individual.java
public void generateIndividual() {
df = new DecimalFormat(".##");
for (int i = 0; i < Constants.CHROMOSOME_LENGTH; i++) {
String gen = df.format((randomGenerator.nextDouble()*(201))-100);
double gene = Double.parseDouble(gen);
genes[i] = gene;
}
}
Result:
Generation: 1 - fittest is: 187840.0388
Generation: 2 - fittest is: 145642.2474
Generation: 3 - fittest is: 143804.0066
Generation: 4 - fittest is: 164595.30819999994
Generation: 5 - fittest is: 192525.51659999997
Generation: 6 - fittest is: 176011.80959999998
Generation: 7 - fittest is: 165286.99679999996
Generation: 8 - fittest is: 181544.71279999998
Generation: 9 - fittest is: 180144.33559999996
Generation: 10 - fittest is: 178226.74199999994
First of all, genetic search is not guaranteed to find an optimal solution in any finite amount of time, although it might converge very fast in its immediate neighborhood. Also, genetic exploration does not provide any mean to certify that a given solution is actually optimal, so it should be used only when a good enough solution is fine.
Looking at the code snippets you posted here on S.O., I would make the following observations:
In randomSelection(), you randomly select a subset of the initial population, and then return the best individual among that group. My take on this is that you want to give a reproductive advantage to those individuals that, according to the fitness function, carry a better genome. Is that correct?
In evolvePopulation(), you randomly select two parents, and then combine them with a crossover step. You then apply a random mutation to the resulting genome, and add it to the new population.
At this point I see the following issue: if both parents are randomly selected, there is absolutely no guarantee that the best individuals of your population actually get the chance to reproduce. I would instead select only one of the two parents randomly, so to ensure that every individual has at least one child.
It is also unclear to me, because it is not shown, whether mutate() does always change some part of the genome, or whether this is only done with a given, low, probability.
It is important to keep in mind that both crossover() and mutate() can have a destructive effect on your population: it is not written anywhere that the offspring resulting from applying these operators is better than the previous generation.
Thus, what you seem to be missing out is a selection() step. One should generate many more individuals than it intends to keep inside the new generation, and then kill off all the worst ones before letting them to reproduce again.
Moreover, if you don't want your search to temporarily degenerate, you should save the best individual (or the best first N individuals) across two subsequent generations.
I have created a gameboard (5x5) and I now want to decide when a move is legal as fast as possible. For example a piece at (0,0) wants to go to (1,1), is that legal? First I tried to find this out with computations but that seemed bothersome. I would like to hard-code the possible moves based on a position on the board and then iterate through all the possible moves to see if they match the destinations of the piece. I have problems getting this on paper. This is what I would like:
//game piece is at 0,0 now, decide if 1,1 is legal
Point destination = new Point(1,1);
destination.findIn(legalMoves[0][0]);
The first problem I face is that I don't know how to put a list of possible moves in an array at for example index [0][0]. This must be fairly obvious but I am stuck at this for some time. I would like to create an array in which there is a list of Point objects. So in semi-code: legalMoves[0][0] = {Point(1,1),Point(0,1),Point(1,0)}
I am not sure if this is efficient but it makes logically move sense than maybe [[1,1],[0,1],[1,0]] but I am not sold on this.
The second problem I have is that instead of creating the object at every start of the game with an instance variable legalMoves, I would rather have it read from disk. I think that it should be quicker this way? Is the serializable class the way to go?
My 3rd small problem is that for the 25 positions the legal moves are unbalanced. Some have 8 possible legal moves, others have 3. Maybe this is not a problem at all.
You are looking for a structure that will give you the candidate for a given point, i.e. Point -> List<Point>.
Typically, I would go for a Map<Point, List<Point>>.
You can initialise this structure statically at program start or dynamically when needing. For instance, here I use 2 helpers arrays that contains the possible translations from a point, and these will yield the neighbours of the point.
// (-1 1) (0 1) (1 1)
// (-1 0) (----) (1 0)
// (-1 -1) (0 -1) (1 -1)
// from (1 0) anti-clockwise:
static int[] xOffset = {1,1,0,-1,-1,-1,0,1};
static int[] yOffset = {0,1,1,1,0,-1,-1,-1};
The following Map contains the actual neighbours for a Point with a function that compute, store and return these neighbours. You can choose to initialise all neighbours in one pass, but given the small numbers, I would not think this a problem performance wise.
static Map<Point, List<Point>> neighbours = new HashMap<>();
static List<Point> getNeighbours(Point a) {
List<Point> nb = neighbours.get(a);
if (nb == null) {
nb = new ArrayList<>(xOffset.length); // size the list
for (int i=0; i < xOffset.length; i++) {
int x = a.getX() + xOffset[i];
int y = a.getY() + yOffset[i];
if (x>=0 && y>=0 && x < 5 && y < 5) {
nb.add(new Point(x, y));
}
}
neighbours.put(a, nb);
}
return nb;
}
Now checking a legal move is a matter of finding the point in the neighbours:
static boolean isLegalMove(Point from, Point to) {
boolean legal = false;
for (Point p : getNeighbours(from)) {
if (p.equals(to)) {
legal = true;
break;
}
}
return legal;
}
Note: the class Point must define equals() and hashCode() for the map to behave as expected.
The first problem I face is that I don't know how to put a list of possible moves in an array at for example index [0][0]
Since the board is 2D, and the number of legal moves could generally be more than one, you would end up with a 3D data structure:
Point legalMoves[][][] = new legalMoves[5][5][];
legalMoves[0][0] = new Point[] {Point(1,1),Point(0,1),Point(1,0)};
instead of creating the object at every start of the game with an instance variable legalMoves, I would rather have it read from disk. I think that it should be quicker this way? Is the serializable class the way to go?
This cannot be answered without profiling. I cannot imagine that computing legal moves of any kind for a 5x5 board could be so intense computationally as to justify any kind of additional I/O operation.
for the 25 positions the legal moves are unbalanced. Some have 8 possible legal moves, others have 3. Maybe this is not a problem at all.
This can be handled nicely with a 3D "jagged array" described above, so it is not a problem at all.
I am trying to build a 4 x 4 sudoku solver by using the genetic algorithm. I have some issues with values converging to local minima. I am using a ranked approach and removing the bottom two ranked answer possibilities and replacing them with a crossover between the two highest ranked answer possibilities. For additional help avoiding local mininma, I am also using mutation. If an answer is not determined within a specific amount of generation, my population is filled with completely new and random state values. However, my algorithm seems to get stuck in local minima. As a fitness function, I am using:
(Total Amount of Open Squares * 7 (possible violations at each square; row, column, and box)) - total Violations
population is an ArrayList of integer arrays in which each array is a possible end state for sudoku based on the input. Fitness is determined for each array in the population.
Would someone be able to assist me in determining why my algorithm converges on local minima or perhaps recommend a technique to use to avoid local minima. Any help is greatly appreciated.
Fitness Function:
public int[] fitnessFunction(ArrayList<int[]> population)
{
int emptySpaces = this.blankData.size();
int maxError = emptySpaces*7;
int[] fitness = new int[populationSize];
for(int i=0; i<population.size();i++)
{
int[] temp = population.get(i);
int value = evaluationFunc(temp);
fitness[i] = maxError - value;
System.out.println("Fitness(i)" + fitness[i]);
}
return fitness;
}
Crossover Function:
public void crossover(ArrayList<int[]> population, int indexWeakest, int indexStrong, int indexSecStrong, int indexSecWeak)
{
int[] tempWeak = new int[16];
int[] tempStrong = new int[16];
int[] tempSecStrong = new int[16];
int[] tempSecWeak = new int[16];
tempStrong = population.get(indexStrong);
tempSecStrong = population.get(indexSecStrong);
tempWeak = population.get(indexWeakest);
tempSecWeak = population.get(indexSecWeak);
population.remove(indexWeakest);
population.remove(indexSecWeak);
int crossoverSite = random.nextInt(14)+1;
for(int i=0;i<tempWeak.length;i++)
{
if(i<crossoverSite)
{
tempWeak[i] = tempStrong[i];
tempSecWeak[i] = tempSecStrong[i];
}
else
{
tempWeak[i] = tempSecStrong[i];
tempSecWeak[i] = tempStrong[i];
}
}
mutation(tempWeak);
mutation(tempSecWeak);
population.add(tempWeak);
population.add(tempSecWeak);
for(int j=0; j<tempWeak.length;j++)
{
System.out.print(tempWeak[j] + ", ");
}
for(int j=0; j<tempWeak.length;j++)
{
System.out.print(tempSecWeak[j] + ", ");
}
}
Mutation Function:
public void mutation(int[] mutate)
{
if(this.blankData.size() > 2)
{
Blank blank = this.blankData.get(0);
int x = blank.getPosition();
Blank blank2 = this.blankData.get(1);
int y = blank2.getPosition();
Blank blank3 = this.blankData.get(2);
int z = blank3.getPosition();
int rando = random.nextInt(4) + 1;
if(rando == 2)
{
int rando2 = random.nextInt(4) + 1;
mutate[x] = rando2;
}
if(rando == 3)
{
int rando2 = random.nextInt(4) + 1;
mutate[y] = rando2;
}
if(rando==4)
{
int rando3 = random.nextInt(4) + 1;
mutate[z] = rando3;
}
}
The reason you see rapid convergence is that your methodology for "mating" is not very good. You are always producing two offspring from "mating" of the top two scoring individuals. Imagine what happens when one of the new offspring is the same as your top individual (by chance, no crossover and no mutation, or at least none that have an effect on the fitness). Once this occurs, the top two individuals are identical which eliminates the effectiveness of crossover.
A more typical approach is to replace EVERY individual on every generation. There are lots of possible variations here, but you might do a random choice of two parents weighted fitness.
Regarding population size: I don't know how hard of a problem sudoku is given your genetic representation and fitness function, but I suggest that you think about millions of individuals, not dozens.
If you are working on really hard problems, genetic algorithms are much more effective when you place your population on a 2-D grid and choosing "parents" for each point in the grid from the nearby individuals. You will get local convergence, but each locality will have converged on different solutions; you get a huge amount of variation produced from the borders between the locally-converged areas of the grid.
Another technique you might think about is running to convergence from random populations many times and store the top individual from each run. After you build up a bunch of different local minima genomes, build a new random population from those top individuals.
I think the Sudoku is a permutation problem. therefore i suggest you to use random permutation numbers for initializing population and use the crossover method which Compatible to permutation problems.
I'm trying to write a time efficient algorithm that can detect a group of overlapping circles and make a single circle in the "middle" of the group that will represent that group. The practical application of this is representing GPS locations over a map, put the conversion in to Cartesian co-ordinates is already handled so that's not relevant, the desired effect is that at different zoom levels clusters of close together points just appear as a single circle (that will have the number of points printed in the centre in the final version)
In this example the circles just have a radius of 15 so the distance calculation (Pythagoras) is not being square rooted and compared to 225 for the collision detection. I was trying anything to shave off time, but the problem is this really needs to happen very quickly becasue it's a user facing bit of code that needs to be snappy and good looking.
I've given this a go and I it works with small data sets pretty well. 2 big problems, it takes too long and it can run out of memory if all the points are on top of one another.
The route I've taken is to calculate distance between each point in a first pass, and then take the shortest distance first and start to combine from there, anything that's been combined becomes ineligible for combination on that pass, and the whole list is passed back around to the distance calculations again until nothing changes.
To be honest I think it needs a radical shift in approach and I think it's a little beyond me. I've re factored my code in to one class for ease of posting and generated random points to give an example.
package mergepoints;
import java.awt.Point;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Merger {
public static void main(String[] args) {
Merger m = new Merger();
m.subProcess(m.createRandomList());
}
private List<Plottable> createRandomList() {
List<Plottable> points = new ArrayList<>();
for (int i = 0; i < 50000; i++) {
Plottable p = new Plottable();
p.location = new Point((int) Math.floor(Math.random() * 1000),
(int) Math.floor(Math.random() * 1000));
points.add(p);
}
return points;
}
private List<Plottable> subProcess(List<Plottable> visible) {
List<PlottableTuple> tuples = new ArrayList<PlottableTuple>();
// create a tuple to store distance and matching objects together,
for (Plottable p : visible) {
PlottableTuple tuple = new PlottableTuple();
tuple.a = p;
tuples.add(tuple);
}
// work out each Plottable relative distance from
// one another and order them by shortest first.
// We may need to do this multiple times for one set so going in own
// method.
// this is the bit that takes ages
setDistances(tuples);
// Sort so that smallest distances are at the top.
// parse the set and combine any pair less than the smallest distance in
// to a combined pin.
// any plottable thats been combine is no longer eligable for combining
// so ignore on this parse.
List<PlottableTuple> sorted = new ArrayList<>(tuples);
Collections.sort(sorted);
Set<Plottable> done = new HashSet<>();
Set<Plottable> mergedSet = new HashSet<>();
for (PlottableTuple pt : sorted) {
if (!done.contains(pt.a) && pt.distance <= 225) {
Plottable merged = combine(pt, done);
done.add(pt.a);
for (PlottableTuple tup : pt.others) {
done.add(tup.a);
}
mergedSet.add(merged);
}
}
// if we haven't processed anything we are done just return visible
// list.
if (done.size() == 0) {
return visible;
} else {
// change the list to represent the new combined plottables and
// repeat the process.
visible.removeAll(done);
visible.addAll(mergedSet);
return subProcess(visible);
}
}
private Plottable combine(PlottableTuple pt, Set<Plottable> done) {
List<Plottable> plottables = new ArrayList<>();
plottables.addAll(pt.a.containingPlottables);
for (PlottableTuple otherTuple : pt.others) {
if (!done.contains(otherTuple.a)) {
plottables.addAll(otherTuple.a.containingPlottables);
}
}
int x = 0;
int y = 0;
for (Plottable p : plottables) {
Point position = p.location;
x += position.x;
y += position.y;
}
x = x / plottables.size();
y = y / plottables.size();
Plottable merged = new Plottable();
merged.containingPlottables.addAll(plottables);
merged.location = new Point(x, y);
return merged;
}
private void setDistances(List<PlottableTuple> tuples) {
System.out.println("pins: " + tuples.size());
int loops = 0;
// Start from the first item and loop through, then repeat but starting
// with the next item.
for (int startIndex = 0; startIndex < tuples.size() - 1; startIndex++) {
// Get the data for the start Plottable
PlottableTuple startTuple = tuples.get(startIndex);
Point startLocation = startTuple.a.location;
for (int i = startIndex + 1; i < tuples.size(); i++) {
loops++;
PlottableTuple compareTuple = tuples.get(i);
double distance = distance(startLocation, compareTuple.a.location);
setDistance(startTuple, compareTuple, distance);
setDistance(compareTuple, startTuple, distance);
}
}
System.out.println("loops " + loops);
}
private void setDistance(PlottableTuple from, PlottableTuple to,
double distance) {
if (distance < from.distance || from.others == null) {
from.distance = distance;
from.others = new HashSet<>();
from.others.add(to);
} else if (distance == from.distance) {
from.others.add(to);
}
}
private double distance(Point a, Point b) {
if (a.equals(b)) {
return 0.0;
}
double result = (((double) a.x - (double) b.x) * ((double) a.x - (double) b.x))
+ (((double) a.y - (double) b.y) * ((double) a.y - (double) b.y));
return result;
}
class PlottableTuple implements Comparable<PlottableTuple> {
public Plottable a;
public Set<PlottableTuple> others;
public double distance;
#Override
public int compareTo(PlottableTuple other) {
return (new Double(distance)).compareTo(other.distance);
}
}
class Plottable {
public Point location;
private Set<Plottable> containingPlottables;
public Plottable(Set<Plottable> plots) {
this.containingPlottables = plots;
}
public Plottable() {
this.containingPlottables = new HashSet<>();
this.containingPlottables.add(this);
}
public Set<Plottable> getContainingPlottables() {
return containingPlottables;
}
}
}
Map all your circles on a 2D grid first. You then only need to compare the circles in a cell with the other circles in that cell and in it's 9 neighbors (you can reduce that to five by using a brick pattern instead of a regular grid).
If you only need to be really approximate, then you can just group all the circles that fall into a cell together. You will probably also want to merge cells that only have a small number of circles together with there neighbors, but this will be fast.
This problem is going to take a reasonable amount of computation no matter how you do it, the question then is: can you do all the computation up-front so that at run-time it's just doing a look-up? I would build a tree-like structure where each layer is all the points that need to be drawn for a given zoom level. It takes more computation up-front, but at run-time you are simply drawing a list of point, fast.
My idea is to decide what the resolution of each zoom level is (ie at zoom level 1 points closer than 15 get merged; at zoom level 2 points closer than 30 get merged), then go through your points making groups of points that are within the 15 of each other and pick a point to represent group that group at the higher zoom. Now you have a 2 layer tree. Then you pass over the second layer grouping all points that are within 30 of each other, and so on all the way up to your highest zoom level. Now save this tree structure to file, and at run-time you can very quickly change zoom levels by simply drawing all points at the appropriate tree level. If you need to add or remove points, that can be done dynamically by figuring out where to attach them to the tree.
There are two downsides to this method that come to mind: 1) it will take a long time to compute the tree, but you only have to do this once, and 2) you'll have to think really carefully about how you build the tree, based on how you want the groupings to be done at higher levels. For example, in the image below the top level may not be the right grouping that you want. Maybe instead building the tree based off the previous layer, you always want to go back to the original points. That said, some loss of precision always happens when you're trying to trade-off for faster run-time.
EDIT
So you have a problem which requires O(n^2) comparisons, you say it has to be done in real-time, can not be pre-computed, and has to be fast. Good luck with that.
Let's analyze the problem a bit; if you do no pre-computation then in order to decide which points can be merged you have to compare every pair of points, that's O(n^2) comparisons. I suggested building a tree before-hand, O(n^2 log n) once, but then runtime is just a lookup, O(1). You could also do something in between where you do some work before and some at run-time, but that's how these problems always go, you have to do a certain amount of computation, you can play games by doing some of it earlier, but at the end of the day you still have to do the computation.
For example, if you're willing to do some pre-computation, you could try keeping two copies of the list of points, one sorted by x-value and one sorted by y-value, then instead of comparing every pair of points, you can do 4 binary searches to find all the points within, say, a 30 unit box of the current point. More complicated so would be slower for a small number of points (say <100), but would reduce the overall complexity to O(n log n), making it faster for large amounts of data.
EDIT 2
If you're worried about multiple points at the same location, then why don't you do a first pass removing the redundant points, then you'll have a smaller "search list"
list searchList = new list()
for pt1 in points :
boolean clean = true
for pt2 in searchList :
if distance(pt1, pt2) < epsilon :
clean = false
break
if clean :
searchList.add(pt1)
// Now you have a smaller list to act on with only 1 point per cluster
// ... I guess this is actually the same as my first suggestion if you make one of these search lists per zoom level. huh.
EDIT 3: Graph Traversal
A totally new approach would be to build a graph out of the points and do some sort of longest-edge-first graph traversal on them. So pick a point, draw it, and traverse its longest edge, draw that point, etc. Repeat this until you come to a point which doesn't have any untraversed edges longer than your zoom resolution. The number of edges per point gives you an easy way to tradeoff speed for correctness. If the number of edges per point was small and constant, say 4, then with a bit of cleverness you could build the graph in O(n) time and also traverse it to draw points in O(n) time. Fast enough to do it on the fly with no pre-computation.
Just a wild guess and something that occurred to me while reading responses from others.
Do a multi-step comparison. Assume your combining distance at the current zoom level is 20 meters. First, subtract (X1 - X2). If This is bigger than 20 meters then you are done, the points are too far. Next, subtract (Y1 - Y2) and do the same thing to reject combining the points.
You could stop here and be happy if you are good with using only horizontal/vertical distances as your metric for combining. Much less math (no squaring or square roots). Pythagoras wouldn't be happy but your users might.
If you really insist on exact answers, do the two subtraction/comparison steps above. If the points are within horizontal and vertical limits, THEN you do the full Pythagoras check with square roots.
Assuming all your points are not highly clustered very close to the combining limit, this should save some CPU cycles.
This is still approximately an O(n^2) technique, but the math should be simpler. If you have the memory, you could store distances between each set of points and then you never have to compute it again. This could take up more memory than you have and also grows at a rate of approximately O(n^2), so be careful.
Also, you could make a linked list or sorted array of all your points, sorted in order of increasing X or increasing Y. (I don't think you need both, just one). Then walk through the list in sorted order. For each point, check the neighbors out until (X1 - X2) is bigger than your combining distance. and then stop. You don't have to compare each set of points for O(N^2), you only have to compare neighbors that are close in one dimension to quickly prune your large list to a small one. As you move through the list, you only have to compare points that have a bigger X than your current candidate, because you already compared and combined with all previous X values. This gets you closer to the O(n) complexity you want. Of course, you would need to check the Y dimension and fully qualify the points to be combined before you actually do it. Don't just use the X distance to make your combining decision.