I am stuck on this problem and was wondering if anyone could help me out:
There are n houses on the x-axis {x_1, x_2,...x_n}, I need to find the location on the x-axis that gives me the smallest sum of distances between the houses and the location.
This is trivial of course, but I also need to be able to do it in O(n) time, and I am stuck on the dynamic algorithm.
Edit: Apparently it did not need to be a DP algorithm, which as I said makes it trivial, sorry for the confusion, and thanks for the responses.
Solving the problem amounts to finding the median of {xi}.
There are well-known linear-time algorithms for finding the median. See, for example, Wikipedia.
I know median finding reasonably well, and I know dynamic programming reasonably well, but I don't know of any median finding algorithms that I could reasonably construe as DP.
If your x's were sorted and you didn't know the median was the answer, I could see computing partial sums from the right and left of a given index as DP-ish sub problems. The optimal solution then minimizes the sum of the right and left partial sums.
But of course, I strongly dislike problems that say, "Solve X with Y", especially when Y doesn't really fit. "Solve X, you might want to consider using Y", is a much better form of problem.
Related
I'm trying to solve a problem about AI. I have a "robot" that should go from a point A to B much quickly and cheap as possible. This Rover can't climb heights higher than 10 units and the cost of his route is influenced by the kind of terrain. I need your help becouse I need to find a admissible heuristic for solving my problem. I already try the Euclidian distance but it's not enough. Can you help me?
I would recommend taking a look at this page for some ideas of heuristics. They will not all be applicable in your situation, since I don't believe you have a grid map? But you can at least have a look.
Furthermore, I'd recommend trying to take into account the various costs that are possible in different kinds of terrain. If, for example, you know that every single possible type of terrain has a minimum cost of 2, you can safely multiply your Euclidean distance by 2. If the most common type of terrain has a cost of 2, but there also are some types of terrain with a lower cost, you lose the guarantee of finding an optimal solution if you start multiplying by 2, but you may still find solutions more quickly in practice.
To me, the question sounds like a homework assignment (correct me if I'm wrong), which makes it difficult to give a single answer. I guess that the point of the homework assignment even is to research a bit and try some different things and see how they work (or don't work), which can improve your understanding of how the algorithm works. So, really, just try some things.
I am working with BOBYQA Algorithm for optimisation issues. I have a question about the efficiency of this algorithm and how to set the right parameters.
Bobyqa is based on trust region and for this purpose one needs to set a number of interpolated value to approximate the function with quadratic formula. It is strongly recommended to choose this number between n+2 and 2n+1. I have some difficulties to see how to choose the right one. I have a problem with 9 unknown and I select empirically the best number. I have also noticed that this parameter could dramatically increase calculation time.
If someone can share his own experience with this algorithm, it would be helpful.
Thanks
So I've been looking for an explanation of a 2-opt improvement for the traveling salesman problem, and I get the jist of it, but I don't understand one thing.
I understand that IF two edges of a generated path cross each other, I can just switch two points and they will no longer cross. HOWEVER - I do not understand how I can determine whether or not two edges cross.
To make my question clear, this is what I've done so far: (I did it in java)
I have an object called Point that represents a city, with an x and y coordinate.
I have a PointSet which has a set of Points contained in a List.
I have a method for PointSet called computeByNN() which arranges the PointSet in a fairly short manner through a Nearest Neighbor algorithm.
So now I have a sorted PointSet (not optimal, but still short) and I want to do 2-opt on it. However, I don't know where to start. Should I check each and every line segment to see if they cross, and if they do, switch two points? I feel like that defeats the purpose of heuristics and it becomes a sort of brute force solution. Is there an efficient way to find if two segments of the tour cross?
I applogize if my question is not clear. I'll try to edit it to make it clearer if anyone needs me to.
If you like you can create a look-up table to detect crossed edges. For n = 1000, order 10^12 entries is obviously too extravagant. However you probably are most worried about the shorter edges? Suppose you aimed to include edges to about √n of the nearest neighbors for each node. Then you are only in the realm of megabytes of space and in any case O(n^2) preprocessing. From there it's a heuristic, so good luck!
Also will mention this can be done on the fly.
I am implementing a project which needs to cluster geographical points. OPTICS algorithm seems to be a very nice solution. It needs just 2 parameters as input(MinPts and Epsilon), which are, respectively, the minimum number of points needed to consider them as a cluster, and the distance value used to compare if two points are in can be placed in same cluster.
My problem is that, due to the extreme variety of the points, I can't set a fixed epsilon.
Just look at the image below.
The same points structure but in a different scale would result very different. Suppose to set MinPts=2 and epsilon = 1Km.
On the left, the algorithm would create 2 clusters(red and blue), but on the right it would create one single cluster containing all of the points(red), but I would like to obtain 2 clusters even on the right.
So my question is: is there any kind of way to calculate dynamically the epsilon value to get this result?
EDIT 05 June 2012 3.15pm:
I thought I was using the OPTICS algorithm implementation from the javaml library, but it seems it is actually a DBSCAN algorithm implementation.
So the question now is: does anybody know a java based implementation of OPTICS algorithm?
Thank you very much and excuse my for my poor english.
Marco
The epsilon value in OPTICS is solely to limit the runtime complexity when using index structures. If you do not have an index for acceleration, you can set it to infinity.
To quote Wikipedia on OPTICS
The parameter \varepsilon is strictly speaking not necessary. It can be set to a maximum value. When a spatial index is available, it does however play a practical role when it comes to complexity.
What you seem to have looks much more like DBSCAN than OPTICS. In OPTICS, you should not need to choose epsilon (it should have been called max-epsilon by the authors!), but your cluster extraction method will take care of that. Are you using the Xi extraction proposed in the OPTICS paper?
minPts is much more important. You should try a value of at least 5 or 10, not 2. With 2, you are essentially performing single-linkage clustering!
The example you gave above should work fine once you increase minPts!
Re: edit: As you can even see in the Wikipedia article, ELKI has a proper OPTICS implementation and it's in Java.
You'd can try to scale epsilon by the total size of the enclosing rectangle. For example, your left data is about 4km x 6km (using my Mark I eyeball to measure) and the right is about 2km x 2km. So, epsilon on the right should be about 2.5 times smaller.
Of course, this doesn't work reliably. If, on your right hand data, there were an additional single point 4km to the right and 2km down, that would make the enclosing rectangle for the right the same as on the left, and you'd get similar (wrong) results.
You can try a minimum spanning tree and then remove the longest edge. The remaining spanning tree and the center of them is the best center for OPTICS and you can count the numbers of points around it.
In your explanation above, it is the change in scale which creates the uncertainty. When your scale gets bigger, your epsilon should change accordingly. Because they are at two very different scales, the two images you've presented are NOT the same set of points. They will not respond identically to your OPTICS algorithm without changing the parameters.
In short, no. there is no way to dynamically calculate epsilon to get this result. Clustering like this is already NP-Hard, and these clustering algorithims (optics, k-means, veroni) can only approximate the optimal solution.
I have a set of subgraphs and I need to match them on the graph they were extracted from. I also need to count how many times each subgraph shows up in such graph (I need to store all possible matches). There must be a perfect match considering the edges' labels of both subgraph and graph, the vertices' labels, however, don´t need to match each other. I built my system using JUNG API, so I would like a solution (api, algorithm etc) that could deal with the Graph structure provided by JUNG. Any thoughts?
JUNG is very full-featured, so if there isn't a graph analysis algorithm in JUNG for what you need, there's usually a strong, theoretical reason for it. To me, your problem sounds like an instance of the Subgraph Isomorphism problem, which is NP-Complete:
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
NP-Completeness may or may not be familiar to you (it took me 7 years of college and Master's Degree in Computer Science to understand it!), so I'll give a high-level description here. Certain problems, like sorting, can be solved in Polynomial time with respect to their input size. For example, if I have a list of N elements, I can sort it in O(N log(N)) time. More specifically, if I can solve a problem in Polynomial time, this means I can solve the problem without exhausting every possible solution. In the sorting case, I could traverse every possible permutation of the list and, if I found a permutation of the list that was sorted, return it. This is obviously not the fastest way to solve the problem though! Some very clever mathematicians were able to get it down to its theoretical minimum of O(N log(N)), thus we can sort really big lists of things quite quickly using computers today.
On the flip-side, NP-Complete problems are thought to have no Polynomial time solution (I say thought because no one has ever proven it, although evidence strongly suggests this is the case). Anyway, what this means is that you cannot definitively solve an NP-Complete problem without first exhausting every possible solution. The time complexity of NP-Complete problems are always O(c ^ N) or worse, where c is some constant greater than 1. This means that the time required to solve the problem grows exponentially with every incremental increase in problem size.
So what does this have to do with my problem???
What I'm getting at here is that, if the Subgraph Isomorphism problem is NP-Complete, then the only way you can determine if one graph is a subgraph of another graph is by exhausting every possible solution. So you can solve this, but probably only up to graphs of a few nodes or so (since the problem's time complexity grows exponentially with every incremental increase in graph size). This means that it is computationally infeasible to compute a solution for your problem because as soon as you reach a certain graph size, it will quite literally take forever to find a solution.
More practically, if your boss asks you to do something that is provably NP-Complete, you can simply say it's impossible and he will have to listen to you. If your professor asks you to do something that is provably NP-Complete, show him that it's NP-Complete and you'll probably get an A for the course. If YOU are trying to do something NP-Complete of your own accord, it's better to just move on to the next project... ;)
Well, I had to solve the problem by implementing it from scratch. I followed the strategy suggested in the topic Any working example of VF2 algorithm?. So, if someone is in doubt about this problem too, I suggest to take a look at Rich Apodaca's answer in the aforementioned topic.