What is a good data structure for a multi resolution graph? - java

I have a data set consisting of hundreds of millions of data points. I'd like to render such a set efficiently depending on the zoom level (i.e. axis scale). At the full view, I'd like a sampled subset to be rendered. As you zoom in, you should see progressively more detail until you reach maximum zoom, at which point you see individual data points. What would be a good data structure to store such a data set and allow multi-resolution access?

You need to keep your points spatially indexed, because "outlier" and "density" are spatial properties: an outlier is a point that happens to be in a low-density area. "Zooming out" means replacing sets of close-together points with 'sampled' points, and when "zooming in" you really, really want to ignore all the points that fall outside the current window. Your operations could be something like:
void addPoint(Point2D p);
void removePoint(Point2D p);
Iterator<Point2D> getPointsToPaint(Rectangle2D viewArea, int maxDensity, double densityArea);
where viewArea represents the window you want to find points for, and the maxDensity parameter controls point abstraction: when more than maxDensity points fall within a densityArea square, you return maxDensity random points within that area instead. getPointsToPaint would then cover your viewArea with densityArea sampling boxes and return the points within each box: the real points if there are maxDensity or fewer, and the "sampled" ones if there are more (nobody will notice whether 10 points within a 1 mm² area are random or not).
Typical spatial structures are quad-trees (for 2D) and kd-trees (for any number of dimensions). However, in their default implementations, neither of them is well suited to quickly changing dynamic data. Another option is spatial hashing, but you really seem to need a multi-level approach, and for multi-level access, trees are usually the way to go. From a quick review of search results for "dynamic spatial indexing", it seems that a variant of the R-tree may be what you are looking for. Beware that these data structures are not easy to implement from scratch. The best approach may be to rely on an external GIS library to do the bookkeeping for you. Several Java GIS libraries are available.
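As a minimal sketch of the density cap described above (not a spatial index), the following assumes the candidate points have already been fetched from whatever index you choose. The names DensitySampler and pointsToPaint are hypothetical, and for brevity it keeps the first maxDensity points seen in each cell rather than drawing a random sample:

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: caps the number of points returned per densityArea cell. */
class DensitySampler {
    static List<Point2D> pointsToPaint(Iterable<? extends Point2D> candidates,
                                       Rectangle2D viewArea,
                                       int maxDensity,
                                       double densityArea) {
        // Number of points already accepted in each grid cell of the view.
        Map<Long, Integer> acceptedPerCell = new HashMap<>();
        List<Point2D> result = new ArrayList<>();
        for (Point2D p : candidates) {
            if (!viewArea.contains(p)) {
                continue; // outside the current window
            }
            long cellX = (long) Math.floor((p.getX() - viewArea.getX()) / densityArea);
            long cellY = (long) Math.floor((p.getY() - viewArea.getY()) / densityArea);
            // Pack both cell indices into one key (assumes fewer than 2^32 cells per axis).
            long key = (cellX << 32) | (cellY & 0xffffffffL);
            int accepted = acceptedPerCell.getOrDefault(key, 0);
            if (accepted < maxDensity) {
                acceptedPerCell.put(key, accepted + 1);
                result.add(p);
            }
        }
        return result;
    }
}
```

In a real multi-resolution structure the candidates would come from a tree query restricted to viewArea, so that points outside the window are never visited at all.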

Not 100% sure what kind of data you are rendering, but I guess you could do sampling and calculate an approximation, and as you zoom in you make the approximation more and more accurate?

Related

Convert lat/long to US State

I have access to a list of lat/long coordinates, and I want to know (roughly) the US State these coordinates are located in. I can do with loss of precision, but I can't rely on external libraries or API. I can also add a database of locations in my code.
What is a reasonable way to do this?
I thought about 3 possibilities:
Represent each state by a single point at its center, then do a nearest-neighbour search
Represent each state by points located at cities in the state, then do a nearest-neighbour search (with many more points)
Represent each state by a simple bounding box, then use some algorithm to query which bounding box my point belongs to
What do you think is best? I would tend to think about solution 3, but I can't find a list of coarse "bounding boxes" for US states
I did a little searching and found a proper solution for what you are looking for, with a dataset of bounding boxes.
Answer on StackOverflow: LINK
Dataset: LINK
Algorithm to use(implement): LINK
So yes, the proper way to implement it is to use solution 3 with the given dataset.
Hope it helps :)
Option 1 will not work; consider [figure not preserved]
Option 2 has a high likelihood of not working for at least some states. Consider states with towns/cities clustered toward the middle versus states with towns/cities clustered toward the edge.
Option 3 will not work [figure not preserved] (these were supposed to be 90 degree angles, perfect squares, but drawing with a mouse is hard :) )
If you want to do this even vaguely accurately you will need some shape data which defines the boundaries between states. You will then need an algorithm which can determine whether a point is within an irregular polygon
See List of the United States (US) state boundaries / borders as latitude/longitude pairs for geofence?
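The point-in-polygon test mentioned above is commonly done by ray casting (the even-odd rule). The following is a sketch only: the class and method names are hypothetical, the boundary vertices are assumed to come from a dataset such as the one linked above, and longitude/latitude are treated as planar x/y, which should be adequate for a rough state lookup:

```java
/** Hypothetical helper: even-odd ray casting for a simple polygon. */
class PointInPolygon {
    /**
     * xs[i], ys[i] are the polygon vertices in order (e.g. longitude, latitude);
     * the last vertex is implicitly joined back to the first.
     */
    static boolean contains(double[] xs, double[] ys, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Does the edge (j -> i) straddle the horizontal line through y?
            boolean straddles = (ys[i] > y) != (ys[j] > y);
            if (straddles) {
                // x coordinate where the edge crosses that horizontal line.
                double xAtY = xs[j] + (y - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (x < xAtY) {
                    inside = !inside; // one more crossing to the right of the point
                }
            }
        }
        return inside;
    }
}
```

A cheap bounding-box check per state (option 3 from the question) can be run first, so that the polygon test is only needed for the few states whose boxes contain the point.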

How can I correctly convert geographical coordinates to pixels on screen?

I'm trying to make a Java project that pinpoints the place on a image of a map, when given coordinates (taken from Google Maps).
I've tried using the top-left corner of the image (the place with the highest latitude and the lowest longitude) as a kind of reference point, which would be the (0,0) point on the map image, and then I've tried to calculate every place on the map based on that reference point. However, this method proved inaccurate, probably because of the curvature of the Earth (mind that the map I'm working with (Serbia) covers an area of 4° latitude and 4° longitude).
I've seen a couple of answers talking about converting to the Mercator projection, but they are not very clear to me, because they don't cover a case similar to mine and are not written in Java.
What can I do to pinpoint those points more accurately (±3km would be accurate enough)?
As comments have pointed out correctly, in order to precisely convert between geographic coordinates and map position, you have to know the method of projection used for the map, and a sufficient number of parameters so that tuning the remaining parameters using a suitable set of reference points becomes feasible.
So if you assume a conic projection, then read the document David pointed out, and this referenced follow-up as well. As you can see, within the family of conic projections, there are a few alternatives to choose from. Each of them is described by a few parameters (i.e. standard parallels, cone constant, aspect ratio, …). You'd make guesses for these and then use some numerical optimization to obtain a best fit. Then you take the best parameter fit for each kind of projection and see which of them has the best overall fit. Quite a bit of work. If you don't want to implement the code for all these projections you can use proj.4 either on the command line or as a native library. To do the numeric optimization, you could possibly try to adapt one of the COIN-OR projects to your application.
In any case, the first step would be creating a suitable set of reference points which you can use to evaluate the fit. So pick a few prominent points on your map and find Google Earth coordinates for these. I'd say you should have at least a dozen points, to account for the fact that you know so little about your map. Otherwise there is a great risk that you will tune the large number of parameters to exactly fit your points while the rest of the map is still completely off. Even with this number of reference points, since the area of Serbia is not that big (compared to maps spanning whole continents), the errors of a wrong guess or a bad fit might be very small. So it might be hard to actually decide which projection has been used.
With all that I said above, and even with external libraries taking care of the projection and the numerical optimization, it might easily take you half a year just to set up the tools to work out the projection. So decide whether that's worth the effort. If not, there are several alternatives. One would be to take a different map, one where you know the projection. Or contact the author of your map and obtain the projection. Or ask someone working in geodesy in Serbia, because they might have enough experience to recognize the projection at a glance, I don't know.
One other option combines the fact that you need reference points with the fact that you might not be able to work out the exact projection in any case: choose a suitably dense set of reference points, evenly distributed over the map. Then interpolate between them, piecewise linearly, with a higher degree, or using some weighted interpolation scheme. You know there is a projection behind all this, but you give up on working out the projection and simply mitigate the symptom: with enough reference points, each data item is close enough to a reference point to keep the error smaller than your threshold.
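For the piecewise linear case, one possible sketch is barycentric interpolation inside a triangle of three reference points. The class name is hypothetical, and choosing which triangle a query point falls into (i.e. triangulating the reference points) is left out:

```java
/** Hypothetical helper: linear interpolation inside one triangle of reference points. */
class TriangleInterpolator {
    /**
     * geo[i] = {lon, lat} and pix[i] = {x, y} for the three reference points.
     * Returns the interpolated pixel position {x, y} for (lon, lat).
     */
    static double[] toPixel(double[][] geo, double[][] pix, double lon, double lat) {
        double x1 = geo[0][0], y1 = geo[0][1];
        double x2 = geo[1][0], y2 = geo[1][1];
        double x3 = geo[2][0], y3 = geo[2][1];
        // Barycentric weights of (lon, lat) with respect to the triangle.
        double det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3);
        if (det == 0) {
            throw new IllegalArgumentException("reference points are collinear");
        }
        double w1 = ((y2 - y3) * (lon - x3) + (x3 - x2) * (lat - y3)) / det;
        double w2 = ((y3 - y1) * (lon - x3) + (x1 - x3) * (lat - y3)) / det;
        double w3 = 1.0 - w1 - w2;
        // The same weights applied to the pixel positions give the interpolated pixel.
        return new double[] {
            w1 * pix[0][0] + w2 * pix[1][0] + w3 * pix[2][0],
            w1 * pix[0][1] + w2 * pix[1][1] + w3 * pix[2][1]
        };
    }
}
```

The smaller the triangles, the less the curvature of the unknown projection matters within each one, which is why a dense, evenly distributed set of reference points helps.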
I found an answer I was looking for in this thread: Java, convert lat/lon to UTM
I found out that the actual projection of my map was UTM. From there, it was simply a matter of finding a class that would convert my lat/lon coordinates into UTM eastings and northings (very useful code in this answer), and then doing simple math to find out where the point is relative to the boundaries of the map, and it's actually working.
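The "simple math" step could look roughly like the sketch below. It assumes the map image is north-up, that its edges are aligned with the UTM grid, that the whole map lies in a single UTM zone, and that the lat/lon to UTM conversion is done elsewhere. The class and field names are hypothetical:

```java
/** Hypothetical helper: maps UTM coordinates linearly onto image pixels. */
class UtmPixelMapper {
    private final double westEasting, eastEasting, southNorthing, northNorthing;
    private final int widthPx, heightPx;

    UtmPixelMapper(double westEasting, double eastEasting,
                   double southNorthing, double northNorthing,
                   int widthPx, int heightPx) {
        this.westEasting = westEasting;
        this.eastEasting = eastEasting;
        this.southNorthing = southNorthing;
        this.northNorthing = northNorthing;
        this.widthPx = widthPx;
        this.heightPx = heightPx;
    }

    /** Returns {x, y} in pixels, with (0, 0) at the top-left (north-west) corner. */
    int[] toPixel(double easting, double northing) {
        double fractionX = (easting - westEasting) / (eastEasting - westEasting);
        // Pixel y grows downward while northing grows upward, hence the flip.
        double fractionY = (northNorthing - northing) / (northNorthing - southNorthing);
        return new int[] {
            (int) Math.round(fractionX * (widthPx - 1)),
            (int) Math.round(fractionY * (heightPx - 1))
        };
    }
}
```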

How to calculate a timely list of GPS points in a given radius in memory (Java)?

I know this would be quite easy to do with PostGIS in a database, but I only expect my points to live for a couple of minutes max.
I would like to have a list of GPS points that live for 1-5min. When I add a new point, I would like to calculate a list of points from the aged list that are in a 1-10km radius of the new point.
Is it still recommended to perform these operations in a database like PostGIS or Mongo?
If not, how would one go about calculating this in memory?
You can compute a quadkey; it's similar to a quadtree, and Bing Maps uses it for its tile server. A quadkey can be computed with a Morton curve. You can download my PHP class hilbert-curve at phpclasses.org.
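As an illustration of the Morton interleaving behind a quadkey, here is a small Java sketch. It assumes the tile X/Y indices at the chosen level of detail are already known (converting lat/lon to tile indices is not shown), and the class name is hypothetical:

```java
/** Hypothetical helper: builds a quadkey by interleaving tile X and Y bits. */
class QuadKeys {
    static String toQuadKey(int tileX, int tileY, int levelOfDetail) {
        StringBuilder key = new StringBuilder();
        // Walk from the most significant bit down; each level contributes one base-4 digit.
        for (int i = levelOfDetail; i > 0; i--) {
            int mask = 1 << (i - 1);
            int digit = 0;
            if ((tileX & mask) != 0) {
                digit += 1;
            }
            if ((tileY & mask) != 0) {
                digit += 2;
            }
            key.append(digit);
        }
        return key.toString();
    }
}
```

Two points whose quadkeys share a prefix lie in the same tile at that level, so prefix matching gives a coarse "nearby" filter. Points close to a tile edge can still be near points with a different prefix, so neighbouring tiles have to be checked as well.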
Databases are comparatively slow for this, because they typically go through disk storage, which is much slower than main memory.
For your solution, you need an indexing technique that geospatial databases (PostGIS or Oracle Spatial) use too. In your case the index stays in memory:
It is a (geo-)spatial index.
Quadtrees and R-trees are the state of the art (sometimes a kd-tree is also used).
Quadtrees are easier to implement; R-trees can be twice as fast as quadtrees.
I would use the quadtree.
The most demanding part of your work is the delete operation for a point.
It is tricky and much slower than insert.
One solution, recommended by the original author of the quadtree, and which I would try: mark the points as deleted, but keep them in the quadtree.
Every x minutes, rebuild the whole quadtree by creating a new one from the data of the old one, keeping only the points that still exist.
Once the points are in the quadtree, a range search only has to look at points "nearby".
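The approach above can be sketched as follows. This is an illustration under stated assumptions, not a tuned implementation: the class name, the leaf capacity, and the depth limit are made up, an expiry timestamp stands in for the "deleted" mark, and the degree-based search box is only approximate (it ignores the poles and the antimeridian):

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an in-memory lat/lon quadtree whose entries expire lazily. */
class ExpiringQuadTree {
    static final class Entry {
        final double lat, lon;
        final long expiresAtMillis;
        Entry(double lat, double lon, long expiresAtMillis) {
            this.lat = lat; this.lon = lon; this.expiresAtMillis = expiresAtMillis;
        }
    }

    private static final int CAPACITY = 8;    // entries per leaf before it splits
    private static final int MAX_DEPTH = 16;  // stops endless splitting on duplicates
    private static final double EARTH_RADIUS_KM = 6371.0;

    private final double minLat, minLon, maxLat, maxLon;
    private final int depth;
    private final List<Entry> entries = new ArrayList<>();
    private ExpiringQuadTree[] children;      // null while this node is a leaf

    ExpiringQuadTree() { this(-90, -180, 90, 180, 0); }

    private ExpiringQuadTree(double minLat, double minLon, double maxLat, double maxLon, int depth) {
        this.minLat = minLat; this.minLon = minLon;
        this.maxLat = maxLat; this.maxLon = maxLon;
        this.depth = depth;
    }

    void insert(Entry e) {
        if (children != null) { childFor(e).insert(e); return; }
        entries.add(e);
        if (entries.size() > CAPACITY && depth < MAX_DEPTH) {
            double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
            children = new ExpiringQuadTree[] {
                new ExpiringQuadTree(minLat, minLon, midLat, midLon, depth + 1),
                new ExpiringQuadTree(minLat, midLon, midLat, maxLon, depth + 1),
                new ExpiringQuadTree(midLat, minLon, maxLat, midLon, depth + 1),
                new ExpiringQuadTree(midLat, midLon, maxLat, maxLon, depth + 1)
            };
            for (Entry old : entries) childFor(old).insert(old);
            entries.clear();
        }
    }

    private ExpiringQuadTree childFor(Entry e) {
        double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
        return children[(e.lat >= midLat ? 2 : 0) + (e.lon >= midLon ? 1 : 0)];
    }

    /** Live entries within radiusKm of (lat, lon); expired ones are skipped, not removed. */
    List<Entry> withinRadius(double lat, double lon, double radiusKm, long nowMillis) {
        // Approximate search box in degrees, padded by 10% because it is not exact.
        double dLat = 1.1 * Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        double dLon = dLat / Math.max(Math.cos(Math.toRadians(lat)), 1e-6);
        List<Entry> out = new ArrayList<>();
        collect(lat - dLat, lon - dLon, lat + dLat, lon + dLon, lat, lon, radiusKm, nowMillis, out);
        return out;
    }

    private void collect(double loLat, double loLon, double hiLat, double hiLon,
                         double lat, double lon, double radiusKm, long now, List<Entry> out) {
        if (hiLat < minLat || loLat > maxLat || hiLon < minLon || loLon > maxLon) return;
        if (children != null) {
            for (ExpiringQuadTree c : children) {
                c.collect(loLat, loLon, hiLat, hiLon, lat, lon, radiusKm, now, out);
            }
            return;
        }
        for (Entry e : entries) {
            if (e.expiresAtMillis > now && haversineKm(lat, lon, e.lat, e.lon) <= radiusKm) out.add(e);
        }
    }

    /** Periodic rebuild: copies only the entries that are still alive into a fresh tree. */
    ExpiringQuadTree rebuilt(long nowMillis) {
        ExpiringQuadTree fresh = new ExpiringQuadTree();
        copyLive(fresh, nowMillis);
        return fresh;
    }

    private void copyLive(ExpiringQuadTree target, long now) {
        if (children != null) {
            for (ExpiringQuadTree c : children) c.copyLive(target, now);
            return;
        }
        for (Entry e : entries) if (e.expiresAtMillis > now) target.insert(e);
    }

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1), dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

A caller would insert each new point with an expiry a few minutes ahead, call withinRadius for every incoming point, and swap in the result of rebuilt(...) every few minutes so expired entries do not accumulate.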

cost / mapping function for determining center of object based on detected features

I wrote an object tracker that will try to detect and follow a moving object in a recorded video. In order to maximize the detection rate, my algorithm is using a bunch of detection & tracking algorithms (cascade, foreground & particle tracker). Each tracking algorithm will return me some point of interest that might be part of the object that I'm trying to track. Let's assume (for the simplicity of this example) that my object is a rectangle and that the three tracking algorithms returned the points 1, 2 and 3:
Based on the relation / distance of these three points it is possible to calculate the center of gravity (blue X in above image) of the tracked object. So for each frame I might be able to come up with some good estimate of the center of gravity. However, the object might move from one frame to the next:
In this example I merely rotated the original object. My algorithm will give me three new points of interest: 1',2' and 3'. I could again calculate the center of gravity based on these three new points, but I would throw away important information that I've acquired from the previous frame: based on points 1, 2 and 3 I already do know something about the relationship of these points and thus by combining the information from 1, 2 and 3 and 1',2' and 3' I should be able to come up with a better estimate of the center of gravity.
Furthermore, the next frame might yield a fourth data point:
This is what I would like to do (but I don't know how):
based on the individual points (and their relationship to each other) that are returned from the different tracking algorithms, I want to build up a localization map of the tracked object. Intuitively I feel like I need to come up with A) an identification function that will identify individual points across frames and B) some cost function that will determine how similar tracked points (and the relationship / distance between them) are from frame to frame, but I can't get my head around on how to implement this. Alternatively, maybe some kind of map buildup based on the points will work. But again, I don't know how to approach this.
Any advice (and example code) is highly appreciated!
EDIT1
a simple particle filter might probably work too, but I again don't know how to define the cost function. A particle filter for tracking a certain color is easy to program: for each pixel you calculate the difference between target color and pixel color. But how would I do the same for estimating the relationship between tracked points?
EDIT2 intuitively I feel like Kalman filters could also help with the prediction step. See slides 24 - 32 of this pdf. Or am I misled?
What I think you're trying to do is essentially build up a state space of features, which can be applied to a filtering process, such as an Extended Kalman Filter. This is a useful framework when you have multiple observations in every frame, and you're trying to estimate or measure something indicated by these observations.
To determine the similarity of the tracked points, you can perform simple template matching from frame to frame for small regions around the points. One way of doing this is to extract an NxN (say, 7x7) region around point a in frame n and point a' in frame n+1, followed by normalised cross correlation between the extracted regions. This will give you a reasonable measure of how similar the patches are. If the patches are not similar, then you've probably lost track of that point.
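A minimal sketch of that patch comparison follows. It uses the zero-mean variant of normalised cross correlation, which is one common form, and assumes both patches are already extracted as equally sized grayscale arrays. The class name is hypothetical:

```java
/** Hypothetical helper: zero-mean normalised cross correlation of two patches. */
class PatchSimilarity {
    /** Returns a score in [-1, 1]; values near 1 mean the patches look alike. */
    static double ncc(double[][] a, double[][] b) {
        int rows = a.length, cols = a[0].length;
        int n = rows * cols;
        double meanA = 0, meanB = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                meanA += a[r][c];
                meanB += b[r][c];
            }
        }
        meanA /= n;
        meanB /= n;
        double cross = 0, energyA = 0, energyB = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double da = a[r][c] - meanA;
                double db = b[r][c] - meanB;
                cross += da * db;
                energyA += da * da;
                energyB += db * db;
            }
        }
        double denominator = Math.sqrt(energyA * energyB);
        // A completely flat patch has no texture to correlate; report 0 in that case.
        return denominator == 0 ? 0 : cross / denominator;
    }
}
```

A tracker could declare a point lost when the score drops below some threshold; the right threshold depends on the footage and has to be tuned.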
There is an enormous literature on this and related problems, starting in the 1980s. Try searching for "optical flow" algorithms. The input for such algorithms is two successive frames of the same scene. The output is a vector field, one vector per pixel in the second image, which shows the direction and speed of movement of the feature at that pixel. This presentation is a pretty nice summary.
A nice thing about optical flow is that many algorithms for it parallelize nicely and map onto your favorite video card GPU, so they can run in real time. Think ESPN overlays.
In my opinion, in order to identify which point is which in each frame, you will have to work in a higher dimension. For example, if you want to know which point went where between two frames (assuming the extracted points are the same), you will have to build vectors or simplices between them and then deduce an arrangement of your points (e.g. from angle values).
The main problem is that the number of combinations grows with the number of points. If your camera is fixed, you could use the background as a reference to deduce object rotations and translations; I mean, build vectors between background interest points and object points in order to clearly identify them.
Hope that helps you move forward.
I would recommend looking into the divided difference filter (DDF), which is similar to the extended Kalman filter (EKF), but does not require an approximate model of the dynamics of your system (which you may not have). Basically the DDF approximates the derivatives used in the EKF using a difference equation. There are plenty of papers online about this, but I do not know whether you have access to them so I have not linked them here. If you are working from a university or a company that has access to online journals (like IEEE Xplore), then just Google "divided difference filter" and check out some of the papers.

Convert a list java.awt.geom.Point2D to a java.awt.geom.Area

I have a set of points that I want to turn into a closed polygon in Java. I'm currently trying to use java.awt.geom.Point2D and java.awt.geom.Area but can't figure out how to turn a group of the points into an Area.
I think I can define a set of Line2Ds based on the points and then add those to the Area, but that's a lot of work and I'm lazy. So is there an easier way to go?
The problem is I have a list of lat/lon coordinates and want to build up an area that I can use for hit testing.
Non-core Java libraries are a possibility as well.
Update, I looked at using java.awt.Polygon but it only supports ints and I'm operating with doubles for the coordinates.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4077518
Hear that, "customer"? You should be using GeneralPath, even though the absence of Polygon2D since the late 1990s is an obvious monster-truck-sized hole in the API.
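A minimal sketch of that route, using Path2D.Double (the double-precision counterpart of GeneralPath) and treating the lat/lon values as planar coordinates, which is an assumption that only holds for small areas away from the poles and the antimeridian. The class and method names are hypothetical:

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;
import java.awt.geom.Point2D;
import java.util.List;

/** Hypothetical helper: builds a closed polygon Area from an ordered list of points. */
class PolygonAreas {
    static Area toArea(List<? extends Point2D> points) {
        Path2D.Double path = new Path2D.Double();
        boolean first = true;
        for (Point2D p : points) {
            if (first) {
                path.moveTo(p.getX(), p.getY()); // start the outline at the first vertex
                first = false;
            } else {
                path.lineTo(p.getX(), p.getY());
            }
        }
        if (!first) {
            path.closePath(); // join the last vertex back to the first
        }
        return new Area(path);
    }
}
```

Hit testing is then a call such as area.contains(x, y). If only hit testing is needed, the Path2D itself also offers contains, so the Area wrapper is optional.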
If you are actually working with Geodetic lat/lon values, you can actually use OpenMap to do some of this work. I just spent some time using the Geo class in that API to bounce an object around an area defined by a polygon of lat/lon points. There are intersection calls and everything and all of the math is done spherically so that the points are more correct as far as projections go.
The simplest (and laziest) thing to do is to create a bounding box for the points from the maximum and minimum of the X, Y ordinate values.
If you need a closer fit then rather than devise your own algorithm, this might be a good place to start:
