I have two ArrayLists of Doubles:
1. latitudes
2. longitudes
Each has over 200 elements.
Say I am given a random test coordinate, for example (1.33, 103.4), in the format [latitude, longitude].
Is there an algorithm to easily find the closest point,
or do I have to brute-force it: calculate the hypotenuse to every point and then compare over 200 hypotenuses to return the closest one? Thanks.
Sort the array of points along one axis. Then, locate the point in the array closest to the required point along this axis and calculate the distance (using whatever metric is appropriate to the problem topology and scale).
Then, search along the array in both directions until the distance to these points is greater than the best result so far. The shortest distance point is the answer.
This can result in having to search the entire array, and is a form of Branch and bound constrained by the geometry of the problem. If the points are reasonably evenly distributed around the point you are searching for, then the scan will not require many trials.
Alternate spatial indices (like quad-trees) will give better results, but your small number of points would make the setup cost in preparing the index much larger than a simple sort. You will need to track the position changes caused by the sort as your other array will not be sorted the same way. If you change the data into a single array of points, then the sort will reorder entire points at the same time.
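A minimal sketch of the sort-and-scan idea described above, assuming plain squared differences in degrees are an acceptable metric for a small area (swap in whatever metric fits your scale). The class and method names are made up for illustration. Instead of reordering the two lists, it sorts a list of indices by latitude, which keeps each latitude paired with its longitude.

```java
import java.util.ArrayList;
import java.util.List;

final class AxisScanNearest {
    /** Returns the index (into the original lists) of the point closest to (lat, lon), or -1 if empty. */
    static int nearest(List<Double> lats, List<Double> lons, double lat, double lon) {
        int n = lats.size();
        // Sort indices by latitude so the latitude/longitude pairing is preserved.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        order.sort((a, b) -> Double.compare(lats.get(a), lats.get(b)));

        // Binary search: first sorted position whose latitude is >= lat.
        int lo = 0, hi = n;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lats.get(order.get(mid)) < lat) lo = mid + 1; else hi = mid;
        }

        int best = -1;
        double bestSq = Double.POSITIVE_INFINITY; // squared distance, assumed planar metric
        int left = lo - 1, right = lo;
        while (left >= 0 || right < n) {
            if (left >= 0) {
                int idx = order.get(left);
                double dLat = lat - lats.get(idx);
                if (dLat * dLat >= bestSq) {
                    left = -1; // everything further left is at least this far away
                } else {
                    double dLon = lon - lons.get(idx);
                    double sq = dLat * dLat + dLon * dLon;
                    if (sq < bestSq) { bestSq = sq; best = idx; }
                    left--;
                }
            }
            if (right < n) {
                int idx = order.get(right);
                double dLat = lats.get(idx) - lat;
                if (dLat * dLat >= bestSq) {
                    right = n; // everything further right is at least this far away
                } else {
                    double dLon = lon - lons.get(idx);
                    double sq = dLat * dLat + dLon * dLon;
                    if (sq < bestSq) { bestSq = sq; best = idx; }
                    right++;
                }
            }
        }
        return best;
    }
}
```

If you query many times, sort the indices once and reuse them; here the sort is repeated on every call only to keep the sketch short.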
If your arrays are sorted, you can use binary search to find the position of a requested point in the array. After you find the index, you should check the four nearby points to find the closest.
1) Suppose you have two sorted arrays, one longitude-wise and one latitude-wise
2) You search the first one and find two nearby points
3) Then you search the second one and find two more points
4) Now you have from two to four points (the results might overlap)
5) These points will form a square around the destination point
6) Find the closest point
It's not true that the closest lat (or long) value should be chosen to search along the long (or lat) axis: a point can lie on almost the same lat (or long) line and still be far away along the long (or lat) axis.
So the best way is to calculate all the distances and sort them.
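For about 200 points the brute-force scan is cheap. A sketch, assuming the haversine great-circle formula with a mean Earth radius of 6371 km is accurate enough; the class and method names are illustrative only. It keeps a running minimum rather than sorting, since only the closest point is needed.

```java
import java.util.List;

final class BruteForceNearest {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, an approximation

    /** Great-circle distance in kilometers between two lat/lon points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns the index of the closest point, or -1 if the lists are empty. */
    static int nearestIndex(List<Double> lats, List<Double> lons, double lat, double lon) {
        int best = -1;
        double bestKm = Double.POSITIVE_INFINITY;
        for (int i = 0; i < lats.size(); i++) {
            double km = haversineKm(lat, lon, lats.get(i), lons.get(i));
            if (km < bestKm) { bestKm = km; best = i; }
        }
        return best;
    }
}
```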
I've done a fair bit of reading around this, and know that discussions regarding this algorithm in Java have been semi-frequent. My issue with implementing Dijkstra's algorithm in Java is simply that I'm not sure how to prepare my data.
I have a set of coordinates within an array, and a set of 1s and 0s in a matrix that represent whether there is a path between the points that the coordinates represent. My question is, how do I present this information so that I can search for the best path with Dijkstra? I have seen many people create a "Node" class, but they never seem to store the coordinates within that Node. Is there some standardized way of creating this kind of structure (I suppose it's a graph?) that I am simply missing?
Any help would be appreciated.
There are two main options:
1. You can use an adjacency matrix in which rows and columns represent your nodes. The value matrix[x, y] must be the weight (e.g. distance, cost, etc.) to travel from x to y. You could use the Euclidean distance to calculate these values from your coordinate array;
2. You can implement a couple of classes (Node and Edge, or just Node with an internal Map from neighbouring node to weight). It is a graph indeed.
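A sketch of the first option, assuming the coordinates are planar {x, y} pairs and the 0/1 matrix is called connected (both names are made up here). It turns the 0/1 matrix into a weight matrix of Euclidean distances and runs a standard priority-queue Dijkstra over it.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class CoordinateGraph {
    /** Weight is the Euclidean distance where connected[i][j] == 1, otherwise +infinity (no edge). */
    static double[][] weights(double[][] coords, int[][] connected) {
        int n = coords.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                w[i][j] = connected[i][j] == 1
                        ? Math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
                        : Double.POSITIVE_INFINITY;
            }
        }
        return w;
    }

    /** Dijkstra from source; unreachable nodes keep a distance of +infinity. */
    static double[] shortestDistances(double[][] w, int source) {
        int n = w.length;
        double[] dist = new double[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0;
        // Queue entries are {distance, node}, ordered by distance.
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        queue.add(new double[] {0, source});
        while (!queue.isEmpty()) {
            double[] top = queue.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) continue; // stale entry, a shorter path was already found
            for (int v = 0; v < n; v++) {
                if (Double.isInfinite(w[u][v])) continue; // no edge
                double candidate = dist[u] + w[u][v];
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    queue.add(new double[] {candidate, v});
                }
            }
        }
        return dist;
    }
}
```

To recover the path itself rather than only its length, you would also record each node's predecessor whenever dist[v] is improved.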
I have a curve (say JTS edge):
How can I find all points where the curve's direction changes by more than a given angle, using JTS (Java) or NTS (C#)?
I did some research and made some tests on JTS, and the best way I found is:
Create polygons and use the function union
Then iterate over the Coordinates, and create a sub-array on each "hard angle" (negative scalar product) and when the sum of angle reaches 180 (don't take the last angle to avoid non-function issues)
Then I change the base to an orthonormal base with x(firstElemOfSubArray, lastElemOfSubArray) by computing the base-changing matrix, and I then recompute the sub-array in a new coordinate system
I then create a function using org.apache.commons.math3.analysis.interpolation.SplineInterpolator to interpolate the function of the curve, and then I get the derivative and search for the extrema (don't take elements with an ordinate that is too low). From the abscissa of each extremum you can find which point is an inflection point.
So the points you are searching for are the first element of each sub-array, plus its inflection points (if there are any).
I had an interview question as below:
Suppose we have a line and M points on this line. If we define the distance of a subset of N points (N <= M) to be the minimum of the distances between each pair of its points, write an algorithm to find the maximum distance over all subsets of N points...
By this I mean, if we have an array {1,2,10}, and N=2, then the subset with the maximum distance should be {1,10}. My first thought was to get all the combinations of subset and calculate the distance of each one, but the interviewer didn't like it because it would take too much time. Does anyone have a time efficient idea?
Sort the array; the first element is the smallest and the last (M-th) element is the largest. For N = 2 the subset is therefore always {smallest, largest}.
Having things arranged along a line is often a tip-off that dynamic programming will work.
Work from left to right. At each point on the line work out, for k = 1..N, the set of k points amongst those seen so far that has the largest minimum distance. You can work out the answers for one point from the answers you have already worked out for points to its left. To find the answer for k points, consider each point to its left and compute min(minimum distance for k-1 points at that point, distance from current point to that point). Then take the maximum of these values.
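One way to read that recurrence, as a sketch: assume the k-point subset stored for a point always has that point as its rightmost member, and treat a single point as having an infinite minimum distance. The names are illustrative. It runs in O(N * M^2) time.

```java
import java.util.Arrays;

final class MaxMinSpacing {
    /** Largest possible minimum pairwise distance among n of the given points on a line. */
    static double maxMinDistance(double[] points, int n) {
        if (n < 2 || n > points.length) {
            throw new IllegalArgumentException("need 2 <= n <= number of points");
        }
        double[] x = points.clone();
        Arrays.sort(x);
        int m = x.length;

        // prev[i]: best minimum distance of a (k-1)-point subset whose rightmost point is x[i].
        double[] prev = new double[m];
        Arrays.fill(prev, Double.POSITIVE_INFINITY); // k = 1: a single point has no pair

        for (int k = 2; k <= n; k++) {
            double[] cur = new double[m];
            Arrays.fill(cur, Double.NEGATIVE_INFINITY); // positions with fewer than k points stay unused
            for (int i = k - 1; i < m; i++) {
                for (int j = k - 2; j < i; j++) {
                    // Extend the best (k-1)-subset ending at x[j] with x[i].
                    cur[i] = Math.max(cur[i], Math.min(prev[j], x[i] - x[j]));
                }
            }
            prev = cur;
        }

        double best = Double.NEGATIVE_INFINITY;
        for (double v : prev) best = Math.max(best, v);
        return best;
    }
}
```

For the array {1, 2, 10} with N = 2 this returns 9, matching the subset {1, 10} from the question.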
I am trying to compute the best latitude longitude pairs for several locations.
I have a database with locations and for each location I may have multiple coordinates. Most of these coordinates seem relevant for the location as they are located within 5 meters from each other.
So I can derive a new (final) latitude longitude pair by averaging them.
Sometimes, however, I have a point (sometimes more than one) that is located several hundred meters away.
Given a set of few (maximum 10) latitude longitude points, I would like to find and keep only those points that make sense and discard those who are too far away from others.
What approach / algorithm should I use ?
Note I work with Java.
Simple approach:
Compute the distance of all points to some arbitrary point.
Find the median distance of all points.
Discard all points whose abs (dist - median) > value.
A bit better than the centroid approach which could get screwed by few far away points that are clustered together.
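A sketch of those three steps, assuming points are {lat, lon} arrays in degrees and that they are close enough together for a flat (equirectangular) approximation of distance. The names and the tolerance parameter are hypothetical. Note that the choice of reference matters: points at a similar range from a faraway reference look alike, so a reference near the cluster (for example one of the points) separates outliers better.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MedianDistanceFilter {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius, approximate

    /** Approximate distance in meters between two nearby {lat, lon} points (degrees). */
    static double distanceM(double[] a, double[] b) {
        double meanLat = Math.toRadians((a[0] + b[0]) / 2);
        double dy = Math.toRadians(b[0] - a[0]) * EARTH_RADIUS_M;
        double dx = Math.toRadians(b[1] - a[1]) * Math.cos(meanLat) * EARTH_RADIUS_M;
        return Math.hypot(dx, dy);
    }

    /** Keeps points whose distance to the reference is within toleranceM of the median distance. */
    static List<double[]> filter(List<double[]> points, double[] reference, double toleranceM) {
        int n = points.size();
        double[] dist = new double[n];
        for (int i = 0; i < n; i++) dist[i] = distanceM(points.get(i), reference);

        double[] sorted = dist.clone();
        Arrays.sort(sorted);
        double median = n == 0 ? 0
                : (n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2);

        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (Math.abs(dist[i] - median) <= toleranceM) kept.add(points.get(i));
        }
        return kept;
    }
}
```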
The simplest approach is likely to be:
Find the centroid (average long/lat) point for a given set of points
Compute the distance from each point in the set to the centroid. Discard all points with a distance over a certain constant value (calling these points noise)
Recompute the centroid from the remaining non-noise points, call that the location.
This should be pretty simple to implement in java and certainly can be O(N), N being the number of points in your set.
Your problem is a specific case of K-means clustering, in that you know which real-world data correspond to which samples whereas in the general case you don't have that knowledge. So look into that problem and assorted approaches if you want more research.
There are a couple of questions you need to ask yourself:
Which point should be treated as "not making sense" if you have only two points being 100 meters away.
Which point should be treated as "not making sense" if you have two separate clusters of points?
What should you do if you have a continuous row of points that still fit within the margin of error counting to the closest neighbour, but in total span over the limit?
The question you've asked is hard to answer without clear criteria, although I'd try looking through clustering algorithms.
If we would skip problems I've mentioned, I'd say that it's computationally heavy, but you can go by
calculating the distances between all points in given set
sorting them by the sum of distances
filtering out the one with highest sum
Iterating until there are no points for which the sum of distances is greater than errorMargin * (N - 1), where N is the current number of points.
You still need to take the border cases into consideration, because, for instance, the problem mentioned in 1) would leave you with a single random point. I doubt you're OK with that, so you need to carefully analyse your domain.
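A sketch of that loop, assuming the points have already been projected to planar {x, y} coordinates in meters (the projection is not shown) and that errorMargin is in the same unit; the names are illustrative. It removes one point per pass, always the one with the largest sum of distances to the others.

```java
import java.util.ArrayList;
import java.util.List;

final class SumOfDistancesFilter {
    /** Repeatedly drops the point with the largest distance sum while it exceeds errorMargin * (N - 1). */
    static List<double[]> filter(List<double[]> points, double errorMargin) {
        List<double[]> kept = new ArrayList<>(points);
        while (kept.size() > 1) {
            int worst = -1;
            double worstSum = -1;
            for (int i = 0; i < kept.size(); i++) {
                double sum = 0;
                for (double[] q : kept) {
                    sum += Math.hypot(kept.get(i)[0] - q[0], kept.get(i)[1] - q[1]);
                }
                if (sum > worstSum) { worstSum = sum; worst = i; }
            }
            if (worstSum <= errorMargin * (kept.size() - 1)) break; // everything is within the margin
            kept.remove(worst); // removes by index
        }
        return kept;
    }
}
```

With only two points 100 meters apart and a margin of 10, this drops one of them and keeps the other, which is exactly the arbitrary outcome the border case above warns about.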
If you are using Java8 then the following code provides an elegant solution.
Collector<Location, ?, Location> centreCollector = new CentreCollector();
Location centre = locations.stream().collect(centreCollector);
centre = locations.stream().filter(centre.furtherThan(NOISE_DISTANCE).negate()).collect(centreCollector);
You have 2 things to create. The CentreCollector class which implements Collector and averages Location objects as they are streamed to it; and the furtherThan method which returns a Predicate that compares the distance between this and a given location to a given distance.
A slightly more elegant method would be to calculate the standard deviation of the distances to the centre and then discard any locations that are more than a certain number of standard deviations from the average distance. This would have the advantage of taking account of sets of locations in which all or most of the samples are more than the NOISE_DISTANCE from the centre. In that case the CentreCollector will have to return a more complex object that holds the location and statistical information and have furtherThan as a member of that class rather than of Location. Let me know in the comments if you want me to post the equivalent code for using standard deviations.
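A sketch of the two pieces for the simple fixed-distance version, not the standard-deviation variant. The Location fields, the flat-earth distance approximation and the centreCollector() factory (used here in place of a separate CentreCollector class) are all assumptions made for illustration. Because filter keeps the elements that match, the furtherThan predicate is negated to keep the locations that are not further than the noise distance.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collector;

final class Location {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius, approximate
    final double lat;
    final double lon;

    Location(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    /** Approximate distance in meters, adequate for points a few hundred meters apart. */
    double distanceMeters(Location other) {
        double meanLat = Math.toRadians((lat + other.lat) / 2);
        double dy = Math.toRadians(other.lat - lat) * EARTH_RADIUS_M;
        double dx = Math.toRadians(other.lon - lon) * Math.cos(meanLat) * EARTH_RADIUS_M;
        return Math.hypot(dx, dy);
    }

    /** Predicate that is true for locations further than the given distance from this one. */
    Predicate<Location> furtherThan(double meters) {
        return other -> distanceMeters(other) > meters;
    }

    /** Collector that averages latitude and longitude; an empty stream yields NaN coordinates. */
    static Collector<Location, double[], Location> centreCollector() {
        return Collector.<Location, double[], Location>of(
                () -> new double[3], // {sumLat, sumLon, count}
                (acc, l) -> { acc[0] += l.lat; acc[1] += l.lon; acc[2]++; },
                (a, b) -> { a[0] += b[0]; a[1] += b[1]; a[2] += b[2]; return a; },
                acc -> new Location(acc[0] / acc[2], acc[1] / acc[2]));
    }

    /** Averages all locations, drops those further than noiseMeters from that average, averages again. */
    static Location robustCentre(List<Location> locations, double noiseMeters) {
        Location first = locations.stream().collect(centreCollector());
        return locations.stream()
                .filter(first.furtherThan(noiseMeters).negate())
                .collect(centreCollector());
    }
}
```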
I have one 2d line (it can be a curved line, with loops and so on), and multiple similar paths. I want to compare the first path with the rest, and determine which one is the most similar (in percentage if possible).
I was thinking maybe transforming the paths into bitmaps and then using a library to compare the bitmaps, but that seems like overkill. In my case, I have only an uninterrupted path, made of points, and no different colors or anything.
Can anyone help me?
Edit:
So the first line is the black one. I compare all other lines to it. I want a library or algorithm that can say: the red line is 90% accurate (because it has almost the same shape, and is close to the black one); the blue line is 5% accurate - this percentage is made up for this example... - because it has a similar shape, but it's smaller and not close to the black path.
So the criterion of similarity would be:
how close the lines are one to another
what shape do they have
how big they are
(color doesn't matter)
I know it's impossible to find a library that considers all this. But the most important comparisons should be: are they the same shape and size? The distance I can calculate on my own.
I can think of two measures to express similarity between two lines N (defined as straight line segments between points p0, p1, ... pr) and M (with straight line segments between q0, q1, ... qs). I assume that p0 and q0 are always closer than p0 and qs.
1) Area
Use the sum of the areas enclosed between N and M, where N and M are more different as the area gets larger.
To get N and M to form a closed shape you should connect p0 and q0 and pr and qs with straight line segments.
To be able to calculate the surface of the enclosed areas, introduce new points at the intersections between segments of N and M, so that you get one or more simple polygons without holes or self-intersections. The area of such a polygon is relatively straightforward to compute (search for "polygon area calculation" around on the web), sum the areas and you have your measure of (dis)similarity.
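A sketch of the area calculation using the shoelace formula, covering only the simplest case: it assumes N and M neither cross each other nor themselves, so walking N forwards and M backwards already gives one simple polygon. Splitting at intersection points, as described above, is not shown. The names are illustrative.

```java
final class PolygonArea {
    /** Absolute area of a simple polygon whose vertices {x, y} are given in order (shoelace formula). */
    static double area(double[][] vertices) {
        double twiceArea = 0;
        for (int i = 0; i < vertices.length; i++) {
            double[] a = vertices[i];
            double[] b = vertices[(i + 1) % vertices.length]; // wrap around to close the ring
            twiceArea += a[0] * b[1] - b[0] * a[1];
        }
        return Math.abs(twiceArea) / 2;
    }

    /** Area enclosed between two non-crossing polylines: p0..pr followed by qs..q0. */
    static double areaBetween(double[][] n, double[][] m) {
        double[][] ring = new double[n.length + m.length][];
        for (int i = 0; i < n.length; i++) ring[i] = n[i];
        for (int i = 0; i < m.length; i++) ring[n.length + i] = m[m.length - 1 - i];
        return area(ring);
    }
}
```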
2) Sampling
Take a predefined number (say, 1000) of sample points O that lie on N (either evenly spaced with respect to the entire line, or evenly spaced over each line segment of N). For each sample point o in O, we'll calculate the distance to the closest corresponding point on M: the result is the sum of these distances.
Next, reverse the roles: take the sample points from M and calculate each closest corresponding point on N, and sum their distances.
Whichever of these two produces the smallest sum (they're likely not the same!) is the measure of (dis)similarity.
Note: To locate the closest corresponding point on M, locate the closest point for every straight line segment in M (which is simple algebra, google for "shortest distance between a point and a straight line segment"). Use the result from the segment that has the smallest distance to o.
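A sketch of the sampling measure, assuming each line is an array of {x, y} points with at least two entries, and sampling evenly over each segment (the second spacing option). The names and the samplesPerSegment parameter are made up for illustration. The point-to-segment distance projects the point onto the segment's line and clamps the projection to the segment's end points.

```java
final class PolylineDistance {
    /** Shortest distance from point (px, py) to the segment from (ax, ay) to (bx, by). */
    static double pointToSegment(double px, double py, double ax, double ay, double bx, double by) {
        double dx = bx - ax, dy = by - ay;
        double lengthSq = dx * dx + dy * dy;
        // t is the position of the projection along the segment, clamped to [0, 1].
        double t = lengthSq == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / lengthSq;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
    }

    /** Distance from a point to the closest segment of a polyline. */
    static double pointToPolyline(double px, double py, double[][] line) {
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < line.length; i++) {
            best = Math.min(best, pointToSegment(px, py,
                    line[i][0], line[i][1], line[i + 1][0], line[i + 1][1]));
        }
        return best;
    }

    /** Sum of distances from evenly spaced samples on each segment of 'from' to the polyline 'to'. */
    static double directedSum(double[][] from, double[][] to, int samplesPerSegment) {
        double sum = 0;
        for (int i = 0; i + 1 < from.length; i++) {
            for (int s = 0; s < samplesPerSegment; s++) {
                double t = (s + 0.5) / samplesPerSegment; // sample at sub-interval midpoints
                double x = from[i][0] + t * (from[i + 1][0] - from[i][0]);
                double y = from[i][1] + t * (from[i + 1][1] - from[i][1]);
                sum += pointToPolyline(x, y, to);
            }
        }
        return sum;
    }

    /** The smaller of the two directed sums, used as the measure of (dis)similarity. */
    static double dissimilarity(double[][] n, double[][] m, int samplesPerSegment) {
        return Math.min(directedSum(n, m, samplesPerSegment), directedSum(m, n, samplesPerSegment));
    }
}
```

Because the sum grows with the number of samples, dividing it by the sample count gives a value that is easier to compare across runs with different sampling densities.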
Comparison
Method 1 requires several geometric primitives (point, line segment, polygon) and operations on them (such as calculating intersection points and polygon areas) in order to implement. This is more work, but produces a more robust result and is easier to optimize for lines consisting of lots of line segments.
Method 2 requires picking a "correct" number of sample points, which can be hard if the lines have alternating parts with little detail and parts with lots of detail (i.e. a lot of line segments close together), and its implementation is likely to quickly get (very) slow with a large number of sample points (matching every sample point against every line segment is a quadratic operation). On the upside, it doesn't require a lot of geometric operations and is relatively easy to implement.