I want to find the minimum distance between two polygons, each with millions of vertices (not the minimum distance between their vertices). I have to find the minimum of the shortest distances between each vertex of the first shape and all of the vertices of the other one. Something like the Hausdorff distance, but I need the minimum instead of the maximum.
Perhaps you should check out (PDF warning! Also note that, for some reason, the order of the pages is reversed) "Optimal Algorithms for Computing the Minimum Distance Between Two Finite Planar Sets" by Toussaint and Bhattacharya:
It is shown in this paper that the minimum distance between two finite planar sets if [sic] n points can be computed in O(n log n) worst-case running time and that this is optimal to within a constant factor. Furthermore, when the sets form a convex polygon this complexity can be reduced to O(n).
If the two polygons are crossing convex ones, perhaps you should also check out (PDF warning! Again, the order of the pages is reversed) "An Optimal Algorithm for Computing the Minimum Vertex Distance Between Two Crossing Convex Polygons" by Toussaint:
Let P = {p1, p2,..., pm} and Q = {q1, q2,..., qn} be two intersecting polygons whose vertices are specified by their cartesian coordinates in order. An optimal O(m + n) algorithm is presented for computing the minimum euclidean distance between a vertex pi in P and a vertex qj in Q.
There is a simple algorithm based on Minkowski addition that computes the minimum distance between two convex polygonal shapes in O(n + m) time.
Links:
algoWiki, boost.org, neerc.ifmo.ru (in Russian).
If the Minkowski difference of two convex polygons contains the origin (0, 0), then the polygons intersect.
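A brute-force sketch of that test, assuming convex input polygons given as vertex lists (the class and helper names are mine, and this is the O(nm log nm) hull-of-differences construction, not the optimal O(n + m) edge merge):

import java.util.*;

public class MinkowskiIntersect {

    // cross product (a - o) x (b - o); > 0 means a left turn
    static double cross(double[] o, double[] a, double[] b) {
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0]);
    }

    // Andrew's monotone chain; returns the hull in counter-clockwise order
    static double[][] convexHull(double[][] pts) {
        Arrays.sort(pts, (p, q) -> p[0] != q[0] ? Double.compare(p[0], q[0])
                                                : Double.compare(p[1], q[1]));
        int n = pts.length, k = 0;
        double[][] h = new double[2 * n][];
        for (int i = 0; i < n; i++) {                      // lower hull
            while (k >= 2 && cross(h[k - 2], h[k - 1], pts[i]) <= 0) k--;
            h[k++] = pts[i];
        }
        for (int i = n - 2, lo = k + 1; i >= 0; i--) {     // upper hull
            while (k >= lo && cross(h[k - 2], h[k - 1], pts[i]) <= 0) k--;
            h[k++] = pts[i];
        }
        return Arrays.copyOf(h, k - 1);
    }

    // true if the convex polygons a and b (non-degenerate) intersect
    static boolean intersect(double[][] a, double[][] b) {
        // Minkowski difference = convex hull of all pairwise vertex differences
        double[][] diffs = new double[a.length * b.length][];
        int k = 0;
        for (double[] p : a)
            for (double[] q : b)
                diffs[k++] = new double[]{p[0] - q[0], p[1] - q[1]};
        double[][] hull = convexHull(diffs);
        // the origin lies inside iff it is left of (or on) every CCW edge
        double[] origin = {0, 0};
        for (int i = 0; i < hull.length; i++) {
            if (cross(hull[i], hull[(i + 1) % hull.length], origin) < 0) return false;
        }
        return true;
    }
}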
I have n geometric shapes defined in GeoJSON, and I would like to calculate the intersection that involves the maximum number of shapes.
I have the following constraints:
none of the shapes may intersect (no intersection between any of the shapes; a 0-participant intersection)
all shapes may intersect (there is an intersection contained in all shapes; an n-participant intersection)
there might be more than one intersection with k participants (shapes A, B, C intersect and shapes D, E, F intersect, so there are two 3-participant intersections; it is not necessary to find both, just return when the first 3-participant intersection is found)
So, as a starting point, I thought I could do this by brute force (trying to intersect combinations of n, n-1, n-2, ... of the given shapes), but its time complexity will be O(n!). I'm wondering whether the algorithm can be optimized.
EDIT:
Well, I forgot to mention the data types. I'm using the Esri/geometry library for the shapes; specifically, Polygon class instances.
This problem feels like one where you can construct hard cases that cannot be solved efficiently, especially if the shapes are not convex. Here are two ideas that you could try:
1. Iterative Intersection
Keep a list L of (disjoint) polygons with counts, empty in the beginning. Now iterate through your given polygons P. For each polygon p from P, intersect it with all polygons l from L. If there is an intersection between p and l, then remove l from L and
add set_intersection(l, p) with the previous count of l, plus 1
add set_minus(l, p) with the previous count of l
remember set_minus(p, l) and proceed to the next entry of L
When you are through all elements of L, add the remaining part of p to L with count 1.
Eventually you will have a list of disjoint polygons whose counts equal the number of participating polygons. A hedged sketch of this procedure is given below.
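In this sketch, Geometry is a stand-in for whatever polygon type you use, and intersection(), difference() and isEmpty() are assumed operations (the Esri library offers similar operators, but this is not its exact API):

import java.util.*;

interface Geometry {
    Geometry intersection(Geometry other);
    Geometry difference(Geometry other);
    boolean isEmpty();
}

class CountedRegion {
    final Geometry region;
    final int count;   // number of input polygons covering this region
    CountedRegion(Geometry region, int count) { this.region = region; this.count = count; }
}

class IterativeIntersection {
    static List<CountedRegion> decompose(List<Geometry> polygons) {
        List<CountedRegion> L = new ArrayList<>();
        for (Geometry p : polygons) {
            List<CountedRegion> next = new ArrayList<>();
            Geometry rest = p;                               // part of p not yet placed
            for (CountedRegion l : L) {
                Geometry common = l.region.intersection(rest);
                if (common.isEmpty()) {
                    next.add(l);                             // l is untouched by p
                } else {
                    next.add(new CountedRegion(common, l.count + 1));
                    Geometry lOnly = l.region.difference(rest);
                    if (!lOnly.isEmpty()) next.add(new CountedRegion(lOnly, l.count));
                    rest = rest.difference(l.region);        // remember set_minus(p, l)
                }
            }
            if (!rest.isEmpty()) next.add(new CountedRegion(rest, 1));
            L = next;                                        // still a list of disjoint regions
        }
        return L;   // the region with the highest count is the max-participant intersection
    }
}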
2. Space Decomposition
Build a bounding box around all polygons. Then iteratively split that space (similar to a KD-tree). For each half (rectangle), compute the number of polygons from P intersecting that rectangle. Proceed best-first (always evaluate the rectangle that has the highest count). When you reach a certain level of the KD-tree, stop and evaluate by brute force or Iterative Intersection.
Both methods will benefit from a prefilter using minimum bounding rectangles around the polygons, as in the sketch below.
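A minimal sketch of that prefilter: only run the expensive polygon-polygon intersection when the axis-aligned bounding rectangles overlap.

class Rect {
    final double minX, minY, maxX, maxY;
    Rect(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }
    // axis-aligned rectangles overlap iff they overlap on both axes
    boolean overlaps(Rect o) {
        return minX <= o.maxX && o.minX <= maxX
            && minY <= o.maxY && o.minY <= maxY;
    }
}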
A list of unprocessed (non-empty) intersections is maintained, together with the indices of the polygons from which each intersection was created. Initially it is filled with the single polygons. The intersection formed from the most polygons (ties broken by the least maximal index of the involved polygons) is always taken out and intersected with all polygons of higher index (to avoid looking at the same subset of polygons again and again). This allows efficient pruning:
if the number of polygons involved in the currently processed intersection plus the number of remaining higher indices cannot surpass the highest number of polygons found so far to have a non-empty intersection, we do not need to pursue this intersection any longer.
Here is the algorithm in Python-like notation:
def findIntersectionFromMostMembers(polygons):
    n = len(polygons)
    # each entry is (intersection geometry, set of indices of the involved polygons);
    # .intersect() and .isEmpty() are assumed operations of the geometry type
    unprocessedIntersections = [(polygon, {i}) for i, polygon in enumerate(polygons)]
    unprocessedIntersections.reverse()  # make polygon_0 the last element
    intersectionFromMostMembers, indicesMost = polygons[0], {0}
    while len(unprocessedIntersections) > 0:
        # last element has most involved polygons and least maximal polygon index
        # -> highest chance of being extended to the best intersection
        intersection, indices = unprocessedIntersections.pop()
        if len(indices) + n - max(indices) - 1 <= len(indicesMost):
            continue  # pruning 1: this intersection cannot beat the best solution so far
        for i in range(max(indices) + 1, n):  # pruning 2: only polygons with higher indices
            intersection1 = intersection.intersect(polygons[i])
            if not intersection1.isEmpty():  # pruning 3: drop empty intersections
                unprocessedIntersections.append((intersection1, indices | {i}))
                # keep sorted by (number of polygons, -maximal index); a sorted
                # container or bisect.insort would avoid re-sorting every time
                unprocessedIntersections.sort(key=lambda t: (len(t[1]), -max(t[1])))
                if len(indices) + 1 > len(indicesMost):
                    intersectionFromMostMembers, indicesMost = intersection1, indices | {i}
    return intersectionFromMostMembers, indicesMost
The performance depends strongly on how many polygons, on average, have an area in common. The fewer polygons (<< n) have areas in common, the more effective pruning 3 is; the more polygons have areas in common, the more effective pruning 1 is. Pruning 2 makes sure that no subset of polygons is considered twice. The worst scenario seems to be when a constant fraction of n (e.g. n/2) polygons have some area in common. Up to n = 40 this algorithm terminates in reasonable time (in a few seconds, or at most a few minutes). If the non-empty intersection from the most polygons involves only a few (any constant << n) polygons, much bigger sets of polygons can be processed in reasonable time.
I'm working on k-means clustering in Java. I don't see a problem in my code and it looks fine. However, there is something I don't understand.
Step 1:
Choose N centers (let there be N clusters).
Step 2:
Put each vector into the cluster with the nearest center, using the Euclidean distance ||v1 - v2||.
Step 3:
Find the new mean (= center) of each cluster.
Step 4:
If the centers have moved significantly, go to step 2.
However, when I plot the total of the point-to-respective-center distances after each iteration, I can see that the total is not decreasing all the time (although it is decreasing in general and converging well).
The total distance of the 2nd iteration is always shorter than that of the first, and is the shortest overall; the total distance then increases slightly at the 3rd iteration and converges at the 4th or 5th.
I believe I was told that it should always be decreasing. What's wrong: my algorithm (implementation) or my assumption about the total distance?
It must always be decreasing for the same seed.
Maybe your error is that you use Euclidean distance.
K-means does not minimize Euclidean distances.
This is a common misconception that even half of the professors get wrong. K-means minimizes the sum-of-squares, i.e., the sum of squared Euclidean distances. And no, this does not find the solution with the smallest Euclidean distances.
So make sure you are plotting SSQ everywhere. Remove all square roots from your code. They do not belong in k-means.
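For instance, a minimal sketch of the objective to plot after each iteration (the names ssq and assignment are mine):

// sum of squared Euclidean distances of each point to its assigned center
static double ssq(double[][] points, double[][] centers, int[] assignment) {
    double total = 0.0;
    for (int i = 0; i < points.length; i++) {
        double[] p = points[i], c = centers[assignment[i]];
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - c[d];
            total += diff * diff;   // squared distance, no Math.sqrt anywhere
        }
    }
    return total;
}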
Additional comments:
Don't minimize variances (or equivalently, standard deviations), as tempting as it might be:
Minimizing sum of squared distances is not equivalent to minimizing variances, but that hasn't stopped people from suggesting it as the proper objective for k-means.
It is easy to imagine why this could be a bad idea:
Imagine a single point that is almost midway (in Euclidean distance) between two cluster centroids, both clusters having the same variance before the new point is included. Now imagine one of the clusters has a much larger membership of points than the other. Let's say the new point is slightly closer to the centroid of the cluster with the much larger membership. Adding the new point to the larger cluster, though correct because it is closer to that centroid, won't decrease its variance nearly as much as adding the new point to the other cluster with the much smaller membership would.
If you are minimizing the proper objective function, but it still isn't decreasing monotonically, check that you aren't quantizing your centroid means:
This would happen, for example, if you are performing image segmentation with integer values that range in [0, 255] rather than float values in [0, 1], and you are forcing the centroid means to be uint8 datatypes.
Whenever the centroid means are found, they should then be used in the objective function as-is. If your algorithm is finding one value for centroid means (floats), but is then minimizing the objective with other values (byte ints), this could lead to unacceptable variations from the supposed monotonically decreasing objective.
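A hedged illustration of that pitfall (the names are mine): compute the centroid mean as a double and use it as-is; rounding it to an integer type before evaluating the objective is exactly what breaks the monotone decrease.

// mean of a cluster of 8-bit pixel values, kept as a double
static double centroidMean(int[] clusterValues) {
    double sum = 0;
    for (int v : clusterValues) sum += v;
    return sum / clusterValues.length;
}

// double m = centroidMean(values);     // fine: evaluate the objective with m
// int m8 = (int) Math.round(m);        // quantized: feeding m8 into the SSQ
//                                      // instead of m can make it fluctuate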
I would like to use simulated annealing to find the local minimum of a single-variable polynomial function within some predefined interval. I would also like to try to find the global minimum of a quadratic function.
A derivative-free algorithm such as this is not the best way to tackle the problem, so this is only for study purposes.
While the algorithm itself is pretty straightforward, I am not sure how to efficiently select a neighbor in single- or n-dimensional space.
Let's say that I am looking for the local minimum of the function 2*x^3 + x + 1 over the interval [-0.5, 30], and assume that the interval is reduced to tenths of each number, e.g. {1.1, 1.2, 1.3, ..., 29.9, 30}.
What I would like to achieve is a balance between random walk and speed of convergence from the starting point to points with lower energy.
If I simply select a random number from the given interval every time, then there is no random walk and the algorithm might circle around. If, on the contrary, the next point is selected by simply adding or subtracting 0.1 with equal probability, then the algorithm might turn into an exhaustive search, depending on the starting point.
How should I efficiently balance the simulated annealing neighbor search in one-dimensional and n-dimensional space?
So you are trying to find an n-dimensional point P' that is "randomly" near another n-dimensional point P; for example, at distance T. (Since this is simulated annealing, I assume that you will be decrementing T once in a while).
This could work:
// Returns an n-dimensional Gaussian step: each coordinate is drawn from
// N(0, t^2), so the direction is unbiased and the step size scales with t.
// (Random is java.util.Random.)
double[] displacement(double t, int dimension, Random r) {
    double[] d = new double[dimension];
    for (int i = 0; i < dimension; i++) d[i] = r.nextGaussian() * t;
    return d;
}
The output is randomly distributed in all directions and centred on the origin (notice that r.nextDouble() would favour 45º angles and be centred at 0.5). You can vary the displacement by increasing t as needed; 95% of results will be within 2*t of the origin.
EDIT:
To generate a displaced point near a given one, you could modify it as follows:
// Returns a new point near p: each coordinate of p is shifted by a
// Gaussian displacement of scale t.
double[] displaced(double t, double[] p, Random r) {
    double[] d = new double[p.length];
    for (int i = 0; i < p.length; i++) d[i] = p[i] + r.nextGaussian() * t;
    return d;
}
You should use the same r for all calls (because if you create a new Random() for each you will keep getting the same displacements over and over).
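For example, a minimal usage sketch (mine, not from the answer) for the 1-D function from the question, 2*x^3 + x + 1 on [-0.5, 30]; the geometric cooling schedule and Metropolis acceptance rule are standard choices, and t doubles here as both the step scale and the temperature, which is a simplification:

import java.util.Random;

public class AnnealDemo {
    static double f(double x) { return 2 * x * x * x + x + 1; }

    public static void main(String[] args) {
        Random r = new Random();                   // one shared Random for all steps
        double x = 15.0, best = x;
        for (double t = 5.0; t > 1e-3; t *= 0.99) {           // geometric cooling
            double candidate = x + r.nextGaussian() * t;      // 1-D displaced() from above
            candidate = Math.max(-0.5, Math.min(30.0, candidate)); // clamp to interval
            double delta = f(candidate) - f(x);
            // Metropolis rule: always accept downhill, sometimes accept uphill
            if (delta < 0 || r.nextDouble() < Math.exp(-delta / t)) x = candidate;
            if (f(x) < f(best)) best = x;
        }
        System.out.println("approximate minimum at x = " + best);
    }
}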
In "Numerical Recepies in C++" there is a chapter titled "Continuous Minimization by Simulated Annealing". In it we have
A generator of random changes is inefficient if, when local downhill moves exist, it nevertheless almost always proposes an uphill move. A good generator, we think, should not become inefficient in narrow valleys; nor should it become more and more inefficient as convergence to a minimum is approached.
They then proceed to discuss a "downhill simplex method".
I'm trying to write a 2D game in Java that uses the Separating Axis Theorem for collision detection. In order to resolve collisions between two polygons, I need to know the Minimum Translation Vector of the collision, and I need to know which direction it points relative to the polygons (so that I can give one polygon a penalty force along that direction and the other a penalty force in the opposite direction). For reference, I'm trying to implement the algorithm here.
I'd like to guarantee that if I call my collision detection function collide(Polygon polygon1, Polygon polygon2) and it detects a collision, the returned MTV will always point away from polygon1, toward polygon2. In order to do this, I need to guarantee that the separating axes that I generate, which are the normals of the polygon edges, always point away from the polygon that generated them. (That way, I know to negate any axis from polygon2 before using it as the MTV).
Unfortunately, it seems that whether or not the normal I generate for a polygon edge points towards the interior of the polygon or the exterior depends on whether the polygon's points are declared in clockwise or counterclockwise order. I'm using the algorithm described here to generate normals, and assuming that I pick (x, y) => (y, -x) for the "perpendicular" method, the resulting normals will only point away from the polygon if I iterate over the vertices in clockwise order.
Given that I can't force the client to declare the points of the polygon in clockwise order (I'm using java.awt.Polygon, which just exposes two arrays for x and y coordinates), is there a mathematical way to guarantee that the direction of the normal vectors I generate is toward the exterior of the polygon? I'm not very good at vector math, so there may be an obvious solution to this that I'm missing. Most Internet resources about the SAT just assume that you can always iterate over the vertices of a polygon in clockwise order.
You can just calculate which direction each polygon is ordered, using, for example, the answer to this question, and then multiply your normal by -1 if the two polygons have different orders.
You could also check each polygon passed to your algorithm to see if it is ordered incorrectly, again using the algorithm above, and reverse the vertex order if necessary.
Note that when calculating the vertex order, some algorithms will work for all polygons and some just for convex polygons.
I finally figured it out, but the one answer posted was not the complete solution so I'm not going to accept it. I was able to determine the ordering of the polygon using the basic algorithm described in this SO answer (also described less clearly in David Norman's link), which is:
for each edge in polygon:
    sum += (x2 - x1) * (y2 + y1)
However, there's an important caveat which none of these answers mention. Normally, you can decide that the polygon's vertices are clockwise if this sum is positive, and counterclockwise if the sum is negative. But the comparison is inverted in Java's 2D graphics system, and in fact in many graphics systems, because the positive y axis points downward. So in a normal, mathematical coordinate system, you can say
if sum > 0 then polygon is clockwise
but in a graphics coordinate system with an inverted y-axis, it's actually
if sum < 0 then polygon is clockwise
My actual code, using Java's Polygon, looked something like this:
// First, compute the orientation sum as if the polygon were clockwise
int sum = 0;
for (int i = 0; i < polygon.npoints; i++) {
    int nextI = (i + 1 == polygon.npoints ? 0 : i + 1);
    sum += (polygon.xpoints[nextI] - polygon.xpoints[i]) *
           (polygon.ypoints[nextI] + polygon.ypoints[i]);
}
if (sum > 0) {
    // counterclockwise in Java's y-down coordinate system:
    // reverse all the normals (multiply them by -1)
}
I have a small contest problem in which I'm given a set of points in 2D that form a triangle. This triangle may be subject to an arbitrary rotation, an arbitrary translation (both in the 2D plane), and a reflection in a mirror, but its dimensions are kept unchanged.
Then, they give me a set of points in the plane, and I have to find 3 points that form my triangle after one or more of those geometric operations.
Example:
5 15
8 5
20 10
6
5 17
5 20
20 5
10 5
15 20
15 10
Output:
5 17
10 5
15 20
I bet it's supposed to apply some known algorithm, but I don't know which. The most common are: convex hull, sweep plane, triangulation, etc.
Can someone give a tip? I don't need the code, only a push, please!
A triangle is uniquely defined (ignoring rotations, flips, and translations) by the lengths of its three sides. Label the vertices of your original triangle A, B, C. You're looking for points D, E, F such that |AB| = |DE|, |AC| = |DF|, and |BC| = |EF|. The length is given by Pythagoras' formula (but you can save a square-root operation at each test by comparing the squares of the line segment lengths...)
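A minimal sketch of that test (the names are mine): comparing sorted squared side lengths needs no square roots and handles any labelling of D, E, F.

import java.util.Arrays;

public class TriangleMatch {
    static long sq(long dx, long dy) { return dx * dx + dy * dy; }

    // squared side lengths of the triangle a-b-c, sorted for comparison
    static long[] sides(long[] a, long[] b, long[] c) {
        long[] s = { sq(b[0] - a[0], b[1] - a[1]),
                     sq(c[0] - a[0], c[1] - a[1]),
                     sq(c[0] - b[0], c[1] - b[1]) };
        Arrays.sort(s);
        return s;
    }

    // true iff the two triangles are congruent up to rotation, flip, translation
    static boolean congruent(long[] t1, long[] t2) {
        return Arrays.equals(t1, t2);
    }
}

Checking every triple of candidate points with this predicate is O(V^3); the next answer reduces the work by only pairing points whose squared distance matches one of the three given lengths.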
The given triangle is defined by three lengths. You want to find three points in the list separated by exactly those lengths.
Square the given lengths to avoid bothering with sqrt.
Find the square of the distance between every pair of points in the list and note only those that coincide with the given lengths: O(V^2), but with a low constant because most lengths will not match.
Now you have a sparse graph with O(V) edges. Find every cycle of size 3 in O(V) time and prune the matches. (Not sure of the best way, but here is one way with the proper big-O.)
Total complexity: O(V^2), but finding the cycles in O(V) may be the limiting factor, depending on the number of points. Otherwise, spatially sorting the list of points to avoid looking at all pairs should improve the asymptotic behavior.
This is generally done with matrix math. This Wikipedia article covers the rotation, translation, and reflection matrices. Here's another site (with pictures).
Since the transformations are just rotation, translation and mirroring, you can find the points that form the transformed triangle by checking the dot products of two sides of the triangle:
1. For the original triangle A, B, C, calculate the dot products AB·AC, BA·BC and CA·CB.
2. For each set of three points D, E, F, calculate the dot product DE·DF and compare it against the three dot products found in 1.
This works since AB·AC = |AB| x |AC| x cos(a), and two lengths and the angle between them define a triangle.
EDIT: Yes, Jim is right, just one dot product isn't enough; you'll need to do all three, including ED·EF and FD·FE. So in the end this involves the same number of calculations as the squared-distances method.
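For completeness, a hedged sketch of the three-dot-product test (the names are mine). Comparing the sorted triples is sound because the dot products determine the squared side lengths: AB·AC + BA·BC = |AB|^2, and likewise for the other sides.

import java.util.Arrays;

public class DotProductMatch {
    // dot product of the sides AB and AC meeting at vertex a: (b - a) · (c - a)
    static long dotAt(long[] a, long[] b, long[] c) {
        return (b[0] - a[0]) * (c[0] - a[0]) + (b[1] - a[1]) * (c[1] - a[1]);
    }

    // the three vertex dot products AB·AC, BA·BC, CA·CB, sorted for comparison
    static long[] signature(long[] a, long[] b, long[] c) {
        long[] s = { dotAt(a, b, c), dotAt(b, a, c), dotAt(c, a, b) };
        Arrays.sort(s);
        return s;
    }

    static boolean matches(long[] sig1, long[] sig2) {
        return Arrays.equals(sig1, sig2);
    }
}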