Real-world parameter optimization - Java

I need to do parameter optimization for my latest research project. My algorithm currently has 5 parameters (four doubles in [0,1] and one nominal parameter with 3 values). The algorithm uses those parameters to calculate some results, and afterwards I calculate Precision, Recall and F-measure. A single run takes about 1.8 s. Currently I'm going through each parameter with a step size of 0.1, which shows me approximately where the global maximum is, but I want to find the precise global maximum. I've looked into gradient descent, but I don't really know how to apply it to my algorithm (if it's even possible). Could anybody guide me a little on how to implement such an algorithm? I'm very new to this kind of work.
Cheers,
Daniel

You can certainly do better than a grid search.
Before applying an algorithm like gradient descent, you have to be sure that your parameter space does not contain local maxima, or at least that your starting point is close to the global maximum and your step size is appropriate for reaching it.
In your case, I would recommend starting by drawing as many random samples as you can. This is a much better way of exploring the parameter space than a grid search. Once you have collected enough data this way, you can use a mode-finding algorithm, such as mean shift or one of its faster derivatives, or go straight to optimization. Since you don't have the Jacobian of your objective, you could use Broyden's method, which approximates it iteratively, or another quasi-Newton (secant) method such as BFGS.
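A minimal sketch of the random-sampling phase in plain Java, assuming one run of your algorithm can be wrapped in a callback that returns its F-measure. The `RandomSearch` class, `Objective` interface and `search` method are hypothetical names, not part of any library. At about 1.8 s per run, 2,000 samples take roughly an hour, so pick the sample count to fit your time budget.

```java
import java.util.Random;

class RandomSearch {

    /** Hypothetical callback: one run of your algorithm, returning e.g. its F-measure. */
    interface Objective {
        double evaluate(double[] continuous, int category);
    }

    /** Best parameter set seen so far. */
    static final class Best {
        final double[] params;
        final int category;
        final double score;

        Best(double[] params, int category, double score) {
            this.params = params;
            this.category = category;
            this.score = score;
        }
    }

    /**
     * Draws `samples` uniform random parameter sets (`dims` doubles in [0, 1) plus one of
     * `categories` nominal values) and returns the highest-scoring one (null if samples == 0).
     */
    static Best search(Objective f, int dims, int categories, int samples, long seed) {
        Random rng = new Random(seed);          // fixed seed => reproducible sample sequence
        Best best = null;
        for (int s = 0; s < samples; s++) {
            double[] x = new double[dims];
            for (int i = 0; i < dims; i++) {
                x[i] = rng.nextDouble();        // uniform in [0, 1)
            }
            int c = rng.nextInt(categories);    // uniform over nominal values 0..categories-1
            double score = f.evaluate(x, c);
            if (best == null || score > best.score) {
                best = new Best(x, c, score);
            }
        }
        return best;
    }
}
```

The best few samples found this way are natural starting points for the local optimization step (Broyden, BFGS) mentioned above.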
Also, see this related question: How can I adjust parameters for image processing algorithm in an efficient way?

Related

Sampling a smaller set of line graph points without losing trends

I have a set of X/Y coordinates [(x, y)], where X increases (it represents a timestamp) and Y is the value measured at that timestamp.
This set can be huge, and I would like to avoid returning every single point for display; instead I want a smaller subset that represents the overall trend of the measurement (some loss of accuracy in the line graph is acceptable).
So far, I have tried simple uniform sampling, skipping points at a fixed interval and then adding the max/min measurement values to the subset. While this is simple, it doesn't account well for local peaks or valleys if the measurement fluctuates often.
I'm wondering whether there are standard algorithms for solving this type of problem on the server side.
I'd appreciate it if anyone has solved this or knows of utility/common libraries for it. I'm using Java, but if there is a reference to a standard algorithm, I might try to implement it in Java myself.
It's hard to give a general answer to this question. It all depends on how your datapoints are stored, what properties your chart has, how it is rendered etc.
But as @dmuir suggested, you should check out the Douglas-Peucker algorithm. Another approach I just thought of is to split the input data into chunks of some size (perhaps corresponding to a single horizontal pixel) and then use some statistic (min, max, or average) to render each chunk. If you maintain running statistics while adding data points to a chunk, this is O(n), so it's no more expensive than reading your data points in the first place.
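Both ideas can be sketched in plain Java. This is an illustration under stated assumptions, not a library API: `Downsample`, `Point`, `minMaxPerPixel` and `douglasPeucker` are hypothetical names, points are assumed sorted by x, and `minMaxPerPixel` keeps both the min and the max of each pixel column (rather than a single statistic) so spikes in either direction survive. The Douglas-Peucker version uses perpendicular distance, so x and y should be in comparable units (for example, both already scaled to screen pixels).

```java
import java.util.ArrayList;
import java.util.List;

class Downsample {

    /** One sample: x is the timestamp (increasing), y the measured value. */
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    /** Buckets points into `pixels` equal-width x columns; keeps each column's min and max in x order. O(n). */
    static List<Point> minMaxPerPixel(List<Point> pts, int pixels) {
        List<Point> out = new ArrayList<>();
        if (pts.isEmpty() || pixels <= 0) return out;
        double x0 = pts.get(0).x;
        double span = Math.max(pts.get(pts.size() - 1).x - x0, Double.MIN_VALUE);
        int current = -1;
        Point min = null, max = null;
        for (Point p : pts) {
            int col = Math.min(pixels - 1, (int) ((p.x - x0) / span * pixels));
            if (col != current) {               // entering a new column: flush the previous one
                flush(out, min, max);
                current = col;
                min = p;
                max = p;
            } else {                            // running statistics for the current column
                if (p.y < min.y) min = p;
                if (p.y > max.y) max = p;
            }
        }
        flush(out, min, max);
        return out;
    }

    private static void flush(List<Point> out, Point min, Point max) {
        if (min == null) return;
        if (min == max) {
            out.add(min);
        } else if (min.x <= max.x) {
            out.add(min);
            out.add(max);
        } else {
            out.add(max);
            out.add(min);
        }
    }

    /** Ramer-Douglas-Peucker: every dropped point lies within `epsilon` of the simplified polyline. */
    static List<Point> douglasPeucker(List<Point> pts, double epsilon) {
        int n = pts.size();
        if (n < 3) return new ArrayList<>(pts);
        boolean[] keep = new boolean[n];
        keep[0] = true;
        keep[n - 1] = true;
        simplify(pts, 0, n - 1, epsilon, keep);
        List<Point> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (keep[i]) out.add(pts.get(i));
        }
        return out;
    }

    private static void simplify(List<Point> pts, int first, int last, double eps, boolean[] keep) {
        if (last - first < 2) return;
        Point a = pts.get(first), b = pts.get(last);
        double dx = b.x - a.x, dy = b.y - a.y, len = Math.hypot(dx, dy);
        int index = -1;
        double maxDist = -1;
        for (int i = first + 1; i < last; i++) {
            Point p = pts.get(i);
            // Perpendicular distance from p to the line through a and b.
            double d = len == 0 ? Math.hypot(p.x - a.x, p.y - a.y)
                                : Math.abs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
            if (d > maxDist) {
                maxDist = d;
                index = i;
            }
        }
        if (maxDist > eps) {                    // farthest point is significant: keep it and recurse
            keep[index] = true;
            simplify(pts, first, index, eps, keep);
            simplify(pts, index, last, eps, keep);
        }
    }
}
```

The recursive `simplify` can hit Java's default stack limit on very long, very noisy series; an explicit stack avoids that.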

How can I correctly convert geographical coordinates to pixels on screen?

I'm trying to make a Java project that pinpoints a place on an image of a map, given its coordinates (taken from Google Maps).
I've tried using the top-left corner of the image (the place with the highest latitude and the lowest longitude) as a reference point, i.e. the (0,0) point of the map image, and then calculating every place on the map relative to that reference point. However, this method proved inaccurate, probably because of the curvature of the Earth (note that the map I'm working with, of Serbia, covers an area of 4° of latitude and 4° of longitude).
I've seen a couple of answers about converting to the Mercator projection, but they are not very clear to me, because they don't cover a case similar to mine and aren't written in Java.
What can I do to pinpoint those points more accurately (±3km would be accurate enough)?
As comments have correctly pointed out, in order to precisely convert between geographic coordinates and map positions, you have to know the projection used for the map, and enough of its parameters that tuning the remaining parameters using a suitable set of reference points becomes feasible.
So if you assume a conic projection, then read the document David pointed out, and this referenced follow-up as well. As you can see, within the family of conic projections, there are a few alternatives to choose from. Each of them is described by a few parameters (i.e. standard parallels, cone constant, aspect ratio, …). You'd make guesses for these and then use some numerical optimization to obtain a best fit. Then you take the best parameter fit for each kind of projection and see which of them has the best overall fit. Quite a bit of work. If you don't want to implement the code for all these projections you can use proj.4 either on the command line or as a native library. To do the numeric optimization, you could possibly try to adapt one of the COIN-OR projects to your application.
In any case, the first step would be creating a suitable set of reference points which you can use to evaluate the fit. So pick a few prominent points on your map and find Google Earth coordinates for these. I'd say you should have at least a dozen points, to account for the fact that you know so little about your map. Otherwise there is a great risk that you will tune the large number of parameters to exactly fit your points while the rest of the map is still completely off. Even with this number of reference points, since the area of Serbia is not that big (compared to maps spanning whole continents), the errors of a wrong guess or a bad fit might be very small. So it might be hard to actually decide which projection has been used.
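To evaluate a candidate fit against such reference points, a root-mean-square pixel error is a simple yardstick. A minimal sketch; `FitError`, `RefPoint` and `rmsError` are hypothetical names, and the candidate projection is passed in as a function from (lat, lon) to pixel {x, y}.

```java
import java.util.List;
import java.util.function.BiFunction;

class FitError {

    /** A reference point: where it is on Earth and where it actually appears on the map image. */
    static final class RefPoint {
        final double lat, lon, px, py;

        RefPoint(double lat, double lon, double px, double py) {
            this.lat = lat;
            this.lon = lon;
            this.px = px;
            this.py = py;
        }
    }

    /**
     * Root-mean-square distance in pixels between where `mapping` places each reference point
     * and where it really is. `mapping` takes (lat, lon) and returns {x, y}.
     */
    static double rmsError(List<RefPoint> refs, BiFunction<Double, Double, double[]> mapping) {
        double sum = 0;
        for (RefPoint r : refs) {
            double[] p = mapping.apply(r.lat, r.lon);
            double dx = p[0] - r.px, dy = p[1] - r.py;
            sum += dx * dx + dy * dy;
        }
        return Math.sqrt(sum / refs.size());
    }
}
```

Comparing this error across candidate projections, and keeping a few reference points out of the fit to check against, is one way to notice the overfitting described above.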
With all that I said above, and even with external libraries taking care of the projection and the numerical optimization, it might easily take you half a year just to set up the tools to work out the projection. So decide whether that's worth the effort. If not, there are several alternatives. One would be to take a different map, one where you know the projection. Or contact the author of your map and obtain the projection. Or ask someone working in geodesy in Serbia, because they might have enough experience to recognize the projection at a glance, I don't know.
Another option is to combine the fact that you need reference points with the fact that you might not be able to work out the exact projection in any case. Simply combine these as follows: choose a suitably dense set of reference points, evenly distributed over the map. Then interpolate between them, piecewise linearly, with a higher-degree scheme, using some weighted interpolation scheme, or whatever. You know there is a projection behind all this, but you give up on working out the projection and simply mitigate the symptom: with enough reference points, each data item is close enough to a reference point to keep the error below your threshold.
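One way to implement the piecewise-linear idea is bilinear interpolation over a regular latitude/longitude grid of reference points. A sketch, assuming you have measured the pixel position of every grid node (at least 2 x 2 nodes); `GridInterpolator` and its fields are hypothetical names. Points outside the grid are extrapolated linearly from the nearest cell, which is much less reliable.

```java
class GridInterpolator {
    private final double lat0, lon0, dLat, dLon;   // south-west grid node and node spacing (degrees)
    private final double[][] px, py;               // measured pixel x/y of node [latIndex][lonIndex]

    GridInterpolator(double lat0, double lon0, double dLat, double dLon, double[][] px, double[][] py) {
        if (px.length < 2 || px[0].length < 2) throw new IllegalArgumentException("need at least 2 x 2 nodes");
        this.lat0 = lat0;
        this.lon0 = lon0;
        this.dLat = dLat;
        this.dLon = dLon;
        this.px = px;
        this.py = py;
    }

    /** Pixel {x, y} for (lat, lon), interpolated bilinearly inside the grid cell that contains it. */
    double[] toPixel(double lat, double lon) {
        double fi = (lat - lat0) / dLat, fj = (lon - lon0) / dLon;   // fractional node indices
        // Clamp to a valid cell so points on the north/east edge (or outside) still use a real cell.
        int i = Math.max(0, Math.min(px.length - 2, (int) Math.floor(fi)));
        int j = Math.max(0, Math.min(px[0].length - 2, (int) Math.floor(fj)));
        double u = fi - i, v = fj - j;   // position within the cell: 0..1 inside the grid
        return new double[] { bilerp(px, i, j, u, v), bilerp(py, i, j, u, v) };
    }

    private static double bilerp(double[][] g, int i, int j, double u, double v) {
        return (1 - u) * (1 - v) * g[i][j] + (1 - u) * v * g[i][j + 1]
                + u * (1 - v) * g[i + 1][j] + u * v * g[i + 1][j + 1];
    }
}
```

Bilinear interpolation reproduces any locally affine mapping exactly, so the remaining error comes from the curvature of the real projection within a cell; a denser grid reduces it.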
I found an answer I was looking for in this thread: Java, convert lat/lon to UTM
I found out that the actual projection of my map was UTM. From there, it was simply a matter of finding a class that converts my lat/lon coordinates into UTM eastings and northings (very useful code in this answer), and then doing simple math to find where the point lies relative to the boundaries of the map, and it actually works.
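The conversion plus the "simple math" can be sketched in self-contained Java. This is not the code from the linked answer: it uses the standard transverse Mercator series on the WGS84 ellipsoid (the formulas given in Snyder's USGS manual "Map Projections: A Working Manual"), and `UtmMapper`, `zone`, `toUtm` and `toPixel` are hypothetical names. The pixel step assumes the image is a north-up rendering aligned with the UTM grid of a single zone (Serbia lies within zone 34, 18°E to 24°E) and that you know the eastings/northings of its edges.

```java
class UtmMapper {
    static final double A = 6378137.0;            // WGS84 semi-major axis (m)
    static final double F = 1 / 298.257223563;    // WGS84 flattening
    static final double K0 = 0.9996;              // UTM scale on the central meridian
    static final double E2 = F * (2 - F);         // first eccentricity squared
    static final double EP2 = E2 / (1 - E2);      // second eccentricity squared

    /** UTM zone (1..60) for a longitude in degrees; ignores the Norway/Svalbard exceptions. */
    static int zone(double lonDeg) {
        return (int) Math.floor((lonDeg + 180) / 6) % 60 + 1;
    }

    /** {easting, northing} in metres for (lat, lon) in degrees, projected in the given zone. */
    static double[] toUtm(double latDeg, double lonDeg, int zone) {
        double phi = Math.toRadians(latDeg);
        double lam0 = Math.toRadians((zone - 1) * 6 - 180 + 3);   // central meridian of the zone
        double sin = Math.sin(phi), cos = Math.cos(phi), tan = Math.tan(phi);
        double n = A / Math.sqrt(1 - E2 * sin * sin);             // prime-vertical radius of curvature
        double t = tan * tan;
        double c = EP2 * cos * cos;
        double a = cos * (Math.toRadians(lonDeg) - lam0);
        double e4 = E2 * E2, e6 = e4 * E2;
        // Meridian arc length from the equator to latitude phi.
        double m = A * ((1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi
                - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * Math.sin(2 * phi)
                + (15 * e4 / 256 + 45 * e6 / 1024) * Math.sin(4 * phi)
                - (35 * e6 / 3072) * Math.sin(6 * phi));
        double easting = K0 * n * (a + (1 - t + c) * Math.pow(a, 3) / 6
                + (5 - 18 * t + t * t + 72 * c - 58 * EP2) * Math.pow(a, 5) / 120) + 500000.0;
        double northing = K0 * (m + n * tan * (a * a / 2
                + (5 - t + 9 * c + 4 * c * c) * Math.pow(a, 4) / 24
                + (61 - 58 * t + t * t + 600 * c - 330 * EP2) * Math.pow(a, 6) / 720));
        if (latDeg < 0) northing += 10000000.0;                    // false northing in the south
        return new double[] { easting, northing };
    }

    /** Linear UTM-to-pixel mapping for a north-up image with the given edges; image y grows downwards. */
    static double[] toPixel(double easting, double northing,
                            double minE, double minN, double maxE, double maxN,
                            int width, int height) {
        double x = (easting - minE) / (maxE - minE) * width;
        double y = (maxN - northing) / (maxN - minN) * height;
        return new double[] { x, y };
    }
}
```

Usage would be `toPixel` applied to the result of `toUtm(lat, lon, 34)`, with the image's edge coordinates read off the map's grid.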

BOBYQA Algorithm

I am using the BOBYQA algorithm for optimisation problems, and I have a question about its efficiency and how to set the right parameters.
BOBYQA is a trust-region method, and for this purpose one has to set the number of interpolation points used to approximate the function with a quadratic model. It is strongly recommended to choose this number between n+2 and 2n+1, but I have some difficulty seeing how to choose the right value. I have a problem with 9 unknowns, and I selected the best number empirically. I have also noticed that this parameter can dramatically increase the computation time.
If anyone can share their own experience with this algorithm, it would be helpful.
Thanks
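The admissible range and an empirical sweep like the one described can be written down directly. According to Powell's BOBYQA documentation, the number of interpolation points (NPT) must lie in [n+2, (n+1)(n+2)/2], and values above 2n+1 are not recommended; for n = 9 that means 11 to 19 recommended and 55 at most. In Apache Commons Math, for example, this is the `numberOfInterpolationPoints` constructor argument of `BOBYQAOptimizer`. `NptChoice` and `cheapestNpt` are hypothetical helpers, and `costOfRun` stands for whatever you measure per run (wall-clock time, or evaluations needed to reach a target value).

```java
import java.util.function.IntToDoubleFunction;

class NptChoice {

    /** Smallest number of interpolation points BOBYQA accepts for n variables. */
    static int minNpt(int n) { return n + 2; }

    /** Largest value Powell recommends; larger values are allowed but not advised. */
    static int recommendedMaxNpt(int n) { return 2 * n + 1; }

    /** Hard upper limit: the number of coefficients of a full quadratic in n variables. */
    static int hardMaxNpt(int n) { return (n + 1) * (n + 2) / 2; }

    /** Runs the hypothetical `costOfRun` once per NPT in the recommended range; returns the cheapest NPT. */
    static int cheapestNpt(int n, IntToDoubleFunction costOfRun) {
        int best = minNpt(n);
        double bestCost = Double.POSITIVE_INFINITY;
        for (int npt = minNpt(n); npt <= recommendedMaxNpt(n); npt++) {
            double cost = costOfRun.applyAsDouble(npt);
            if (cost < bestCost) {
                bestCost = cost;
                best = npt;
            }
        }
        return best;
    }
}
```

Because a single run depends on the starting point, averaging the cost over a few starting points per NPT value gives a fairer comparison.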

Programmatically finding periodicity of a given function

I am working on an Android project for my Signal Processing course. My aim is to find signal properties, such as periodicity, even/odd symmetry, causality, etc., of a user-input function. Right now, I am stuck trying to figure out how to programmatically determine the periodicity of a given function. I know the basics behind periodicity: f(t+T) = f(t).
The only idea I have right now is to calculate a large number of function values and then check for repetition. I know this method is quite naive, since I don't know how many values I need to calculate to determine whether the function is periodic.
I know this can be done easily in Matlab, but it is very difficult to port Matlab code to Java. Is there something I am missing? I have been searching a lot, but haven't found anything useful.
Thanks for any help, in advance!
If the function f is given as a symbolic expression, then the equation you stated can in some cases be solved symbolically. The amount of work required for this will depend on how your functions are described, what kinds of functions you allow, what libraries you use and so on.
If your only interaction with the function is evaluating it, i.e. if you treat the function description as a black box or obtain its values from some sensor, then your best bet is a Fourier transform of the data, converting it from the time domain into the frequency domain. In particular, you probably want to choose the number of samples to analyze as a power of two, and then use the FFT to quickly obtain intensities for the various frequencies.
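A minimal sketch of that approach in plain Java (no Matlab port needed): a textbook iterative radix-2 FFT, followed by picking the strongest non-zero frequency bin k and reporting its period n*dt/k. `PeriodEstimator` and `dominantPeriod` are hypothetical names. This only estimates the dominant period, with resolution limited by the window length n*dt; a clear peak suggests, but does not prove, that the function is periodic, and periods that don't fit a whole number of times into the window spread over neighbouring bins.

```java
import java.util.function.DoubleUnaryOperator;

class PeriodEstimator {

    /** In-place iterative radix-2 Cooley-Tukey FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        for (int i = 1, j = 0; i < n; i++) {          // bit-reversal permutation
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) {      // butterflies of size 2, 4, ..., n
            double ang = -2 * Math.PI / len;
            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = i + k, b = i + k + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /** Samples f at t = 0, dt, ..., (n-1)dt and returns the period of the strongest non-DC component. */
    static double dominantPeriod(DoubleUnaryOperator f, int n, double dt) {
        if (Integer.bitCount(n) != 1) throw new IllegalArgumentException("n must be a power of two");
        double[] re = new double[n], im = new double[n];
        for (int i = 0; i < n; i++) re[i] = f.applyAsDouble(i * dt);
        fft(re, im);
        int bestK = 1;
        double bestPower = -1;
        for (int k = 1; k <= n / 2; k++) {            // skip k = 0 (the mean); real input is symmetric
            double p = re[k] * re[k] + im[k] * im[k];
            if (p > bestPower) {
                bestPower = p;
                bestK = k;
            }
        }
        return n * dt / bestK;
    }
}
```

For example, sampling sin(2πt/0.25) + 0.3 at dt = 1/256 with n = 256 covers exactly four periods, so the peak lands in bin 4 and the estimated period is 0.25.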

OPTICS Clustering algorithm. How to get the best epsilon

I am implementing a project that needs to cluster geographical points. The OPTICS algorithm seems to be a very nice solution. It needs just 2 input parameters (MinPts and Epsilon), which are, respectively, the minimum number of points needed to form a cluster, and the distance used to decide whether two points can be placed in the same cluster.
My problem is that, due to the extreme variety of the points, I can't set a fixed epsilon.
Just look at the image below.
The same point structure at a different scale gives very different results. Suppose we set MinPts=2 and epsilon = 1 km.
On the left, the algorithm would create 2 clusters (red and blue), but on the right it would create one single cluster containing all of the points (red), whereas I would like to obtain 2 clusters on the right as well.
So my question is: is there any way to calculate the epsilon value dynamically to get this result?
EDIT 05 June 2012 3.15pm:
I thought I was using the OPTICS algorithm implementation from the javaml library, but it seems it is actually a DBSCAN algorithm implementation.
So the question now is: does anybody know a Java-based implementation of the OPTICS algorithm?
Thank you very much, and please excuse my poor English.
Marco
The epsilon value in OPTICS is solely to limit the runtime complexity when using index structures. If you do not have an index for acceleration, you can set it to infinity.
To quote Wikipedia on OPTICS:
The parameter ε is strictly speaking not necessary. It can be set to a maximum value. When a spatial index is available, it does however play a practical role when it comes to complexity.
What you seem to have looks much more like DBSCAN than OPTICS. In OPTICS, you should not need to choose epsilon (it should have been called max-epsilon by the authors!), but your cluster extraction method will take care of that. Are you using the Xi extraction proposed in the OPTICS paper?
minPts is much more important. You should try a value of at least 5 or 10, not 2. With 2, you are essentially performing single-linkage clustering!
The example you gave above should work fine once you increase minPts!
Re: edit: As you can even see in the Wikipedia article, ELKI has a proper OPTICS implementation and it's in Java.
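To illustrate the point that epsilon can be dropped, here is a brute-force OPTICS sketch with epsilon effectively set to infinity. It is a hypothetical `OpticsSketch` class, not ELKI's API: it uses an O(n²) neighbour search and a linear scan instead of a priority queue, assumes 2-D points with n ≥ minPts, and counts the point itself when computing the core distance. It stops at the reachability plot (cluster ordering plus reachability distances); it does not implement the Xi extraction, which is the step that turns the plot's valleys into clusters.

```java
import java.util.Arrays;

class OpticsSketch {

    /**
     * OPTICS on 2-D points with epsilon = infinity. Returns the cluster ordering and fills
     * reach[p] with p's reachability distance (POSITIVE_INFINITY = undefined, e.g. the first point).
     */
    static int[] order(double[][] pts, int minPts, double[] reach) {
        int n = pts.length;
        if (minPts < 1 || minPts > n) throw new IllegalArgumentException("need 1 <= minPts <= n");
        // Core distance: distance to the minPts-th closest point, counting the point itself.
        double[] core = new double[n];
        for (int i = 0; i < n; i++) {
            double[] d = new double[n];
            for (int j = 0; j < n; j++) d[j] = dist(pts[i], pts[j]);
            Arrays.sort(d);
            core[i] = d[minPts - 1];
        }
        Arrays.fill(reach, Double.POSITIVE_INFINITY);
        boolean[] done = new boolean[n];
        int[] ordering = new int[n];
        int out = 0;
        for (int start = 0; start < n; start++) {
            if (done[start]) continue;
            int p = start;
            while (p >= 0) {
                done[p] = true;
                ordering[out++] = p;
                // Every unprocessed point is within epsilon = infinity, so update all of them.
                for (int o = 0; o < n; o++) {
                    if (!done[o]) reach[o] = Math.min(reach[o], Math.max(core[p], dist(pts[p], pts[o])));
                }
                // Next: the unprocessed seed with the smallest reachability (linear scan, no heap).
                p = -1;
                for (int o = 0; o < n; o++) {
                    if (!done[o] && reach[o] < Double.POSITIVE_INFINITY && (p < 0 || reach[o] < reach[p])) p = o;
                }
            }
        }
        return ordering;
    }

    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }
}
```

Two separated groups show up in the plot as two valleys with one tall spike between them, and the spike is tall relative to the valleys whatever the absolute scale, which is why the cluster extraction, not epsilon, should decide where clusters split.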
You could try to scale epsilon by the total size of the enclosing rectangle. For example, your left data is about 4 km x 6 km (using my Mark I eyeball to measure) and the right is about 2 km x 2 km. So epsilon on the right should be about 2.5 times smaller.
Of course, this doesn't work reliably. If, on your right hand data, there were an additional single point 4km to the right and 2km down, that would make the enclosing rectangle for the right the same as on the left, and you'd get similar (wrong) results.
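The bounding-box heuristic is easy to compute. This sketch scales a reference epsilon by the ratio of bounding-box diagonals, which is one possible reading of "total size" (the square root of the area would be another); for the 4 km x 6 km vs 2 km x 2 km example the ratio is √52/√8 ≈ 2.55. `EpsilonScaling` and its methods are hypothetical names, and the outlier caveat above applies unchanged.

```java
class EpsilonScaling {

    /** Diagonal of the axis-aligned bounding box of the 2-D points. */
    static double bboxDiagonal(double[][] pts) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : pts) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        return Math.hypot(maxX - minX, maxY - minY);
    }

    /** Rescales an epsilon that worked for `refPts` to the spatial extent of `pts`. */
    static double scaledEpsilon(double refEpsilon, double[][] refPts, double[][] pts) {
        return refEpsilon * bboxDiagonal(pts) / bboxDiagonal(refPts);
    }
}
```

Adding the single outlier 4 km to the right and 2 km down turns the 2 x 2 box into a 6 x 4 box with the same diagonal as the left data, so epsilon comes back unscaled, exactly the failure described above.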
You can try building a minimum spanning tree and then removing its longest edge. The centers of the two remaining subtrees are good centers for OPTICS, and you can count the number of points around each of them.
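The MST split can be sketched with Prim's algorithm on the complete Euclidean graph (O(n²), fine for a few thousand points). `MstSplit`, `splitAtLongestEdge` and `centroid` are hypothetical names; cutting the single longest edge always yields exactly two groups, which is what single-linkage clustering into two clusters would give.

```java
import java.util.Arrays;

class MstSplit {

    /** Prim's MST on 2-D points (n >= 2), then cut the longest edge; returns a 0/1 subtree label per point. */
    static int[] splitAtLongestEdge(double[][] pts) {
        int n = pts.length;
        int[] parent = new int[n];
        double[] best = new double[n];          // after insertion: length of the edge to parent
        boolean[] inTree = new boolean[n];
        Arrays.fill(parent, -1);
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0;
        for (int it = 0; it < n; it++) {
            int u = -1;
            for (int v = 0; v < n; v++) if (!inTree[v] && (u < 0 || best[v] < best[u])) u = v;
            inTree[u] = true;
            for (int v = 0; v < n; v++) {
                double d = Math.hypot(pts[u][0] - pts[v][0], pts[u][1] - pts[v][1]);
                if (!inTree[v] && d < best[v]) {
                    best[v] = d;
                    parent[v] = u;
                }
            }
        }
        // Tree edges are (v, parent[v]) for v != 0; find the longest one.
        int cut = 1;
        for (int v = 2; v < n; v++) if (best[v] > best[cut]) cut = v;
        // A point belongs to subtree 1 iff its path to the root passes through `cut`.
        int[] label = new int[n];
        for (int v = 0; v < n; v++) {
            for (int w = v; w != -1; w = parent[w]) {
                if (w == cut) {
                    label[v] = 1;
                    break;
                }
            }
        }
        return label;
    }

    /** Mean position of the points carrying `which` as their label. */
    static double[] centroid(double[][] pts, int[] label, int which) {
        double sx = 0, sy = 0;
        int count = 0;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] == which) {
                sx += pts[i][0];
                sy += pts[i][1];
                count++;
            }
        }
        return new double[] { sx / count, sy / count };
    }
}
```

Removing the k-1 longest edges instead gives k groups.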
In your explanation above, it is the change in scale which creates the uncertainty. When your scale gets bigger, your epsilon should change accordingly. Because they are at two very different scales, the two images you've presented are NOT the same set of points. They will not respond identically to your OPTICS algorithm without changing the parameters.
In short, no: there is no way to dynamically calculate epsilon to get this result. Clustering like this is already NP-hard, and these clustering algorithms (OPTICS, k-means, Voronoi) can only approximate the optimal solution.
