So here's an odd question. I'm working on a kNN problem and need to find the nearest neighbor. I'm looking into distance, but once again I don't care about the actual distance, just which point is closest. However, since the per-dimension differences can be negative, I need to either square them or take their absolute value to get a non-negative measure.
So here are two options for how to accomplish this:
//note: it's been abstracted for multiple dimensions (not just x and y)
for(int i = 0; i < (numAttributes - 1); i++)
{
    distance += Math.pow((a.value(i) - b.value(i)), 2);
}
and
//note: it's been abstracted for multiple dimensions (not just x and y)
for(int i = 0; i < (numAttributes - 1); i++)
{
    distance += Math.abs(a.value(i) - b.value(i));
}
My question is: which is faster? Since this is a data mining application, I want it to be able to process the information as quickly as possible. And while I understand that, under the hood, a power of two can be implemented with a shift, I'm not sure that still holds in a high-level language like Java, where the code is compiled for the JVM. Is there a reason why one is better than the other?
First, consider the vectors A=[0,0,0], B=[1,1,1], C=[0,0,2]. Which one is closer to A, B or C? Under Euclidean distance B is closer (sqrt(3) ≈ 1.73 versus 2), but under Manhattan distance C is closer (2 versus 3). So caring about the distance measure is absolutely crucial in kNN, and here we are only talking about Manhattan and Euclidean distances. You could, for example, use cosine similarity as well, and you should select the distance measure carefully, taking into account the knowledge you have about your data.
Second, instead of such a low-level optimization, consider something smarter, such as breaking out of your for(int i = 0; i < (numAttributes - 1); i++) loop as soon as the accumulated distance grows too large to matter.
Third, using Math.pow(a, 2) to compute a*a is definitely very inefficient; compute the difference once and multiply it by itself.
Fourth, i < (numAttributes - 1)? Didn't you mean i < numAttributes??
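A minimal sketch combining these fixes, assuming a and b expose value(int) as in the question; bestDistanceSoFar is a hypothetical variable holding the smallest distance found so far:

double distance = 0;
for (int i = 0; i < numAttributes; i++) {
    double d = a.value(i) - b.value(i);
    distance += d * d;              // squared difference without Math.pow
    if (distance > bestDistanceSoFar) {
        break;                      // this candidate cannot be the nearest neighbor
    }
}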
So I have a fairly large array that contains xyz coordinates, where array[0] = x0, array[1] = y0, array[2] = z0, array[3] = x1, array[4] = y1... and so on.
I'm running an algorithm on this array that is taking longer than I would like it to, and I want to split the work amongst threads. I have my threads set up, but I am not sure how to divide this array properly so I can distribute this work across 3 threads.
Even though I have an array length that is divisible by 3, this won't work, because splitting the array into 3 equal parts can split an xyz coordinate (for instance, if my array was size 15, dividing it by 3 gives me arrays of size 5, which means I'm splitting an XYZ coordinate).
How can I split this array (the pieces don't necessarily have to be equal in size) so that I can distribute the work? (For instance, in the previous example, I would like to have two arrays of size 6 and one of size 3.)
Note: The size of the array is variable, but is always divisible by 3.
EDIT: Sorry, should have mentioned that I'm working in Java. My algorithm iterates through a collection of coordinates and determines which coordinates lie inside of a particular 3d shape (such as an ellipsoid). It saves these coordinates and I perform other tasks with these coordinates (I'm working on a computer graphics app).
EDIT2: I'm going to elaborate on the algorithm a bit more.
Basically, I am working in Android OpenGL-ES-3.0. I have a complex 3D object with somewhere around 230000 vertices and close to a million triangles.
In the app, the user moves either an ellipsoid or a box (they choose which one) to a location close to or on the object. After moving it, they click a button, which runs my algorithm.
The purpose of the algorithm is to determine which points from my object lie inside the ellipsoid or box. These points are subsequently changed to a different color. Adding to the complexity, however, is the fact that I have transformation matrices applied to both the points of the object and the points of the ellipsoid/box.
My current algorithm begins by iterating through all the points of the object. For those of you unclear on my iteration, this is my loop.
for(int i = 0; i < numberOfVertices*3;)
{
    pointX = vertices[i];
    i++;
    pointY = vertices[i];
    i++;
    pointZ = vertices[i];
    i++;
    //consider transformations, then run algorithm
}
I perform the necessary steps to consider all my transformations, and after that is finished, I have a point from my object and the location of my ellipsoid/box centroid.
Then, depending on the shape, one of the following algorithms is used:
Ellipsoid: I use the centroid of the ellipsoid and apply the formula
(x - c)^T R^T A R (x - c) <= 1
where x is a column vector describing the xyz point from my object that I am on in my iteration, c is a column vector describing the xyz point of my centroid, ^T means transpose, R is my rotation matrix, and A is a diagonal matrix with entries (1/a^2, 1/b^2, 1/c^2), where I have values for a, b and c. If the left-hand side is > 1, then x lies outside of my ellipsoid and is not a valid point. If it is <= 1, then I save x.
Box: I simply check if the point falls within a range. If the point of the object lies within a certain distance of the centroid in the X-direction, Y-direction, and Z-direction, I save it.
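As a hedged sketch of that ellipsoid test (names are illustrative, and the rotation matrix is assumed to be stored row-major in a float[9]):

// d = R * (x - c); then (x-c)^T R^T A R (x-c) = (dx/a)^2 + (dy/b)^2 + (dz/c)^2
boolean insideEllipsoid(float px, float py, float pz,   // point x
                        float cx, float cy, float cz,   // centroid c
                        float[] R,                      // 3x3 rotation, row-major
                        float a, float b, float c) {    // semi-axes
    float vx = px - cx, vy = py - cy, vz = pz - cz;
    float dx = R[0]*vx + R[1]*vy + R[2]*vz;
    float dy = R[3]*vx + R[4]*vy + R[5]*vz;
    float dz = R[6]*vx + R[7]*vy + R[8]*vz;
    return (dx*dx)/(a*a) + (dy*dy)/(b*b) + (dz*dz)/(c*c) <= 1.0f;
}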
These algorithms are accurate and work as intended. The issue, obviously, is efficiency. I don't seem to have a good understanding of what makes my app strain and what doesn't. I thought multi-threading would work, and I tried some of the techniques described, but they didn't yield a significant improvement in performance. If anyone has ideas on filtering my search so I'm not iterating through all these points, that would help.
May I suggest a slightly different way to handle it. I know this isn't a direct answer to your question, but please consider it.
This could be easier to see if you implemented it with coordinate objects, each holding x, y and z values. Your "array" would now be 1/3 as long. You might think this would be less efficient, and you might be right, but you'd be surprised at how well Java can optimize things. Java often optimizes for the cases people use the most, and manually manipulating this array as you suggest may even be slower than using objects. Until you've proven the most readable design too slow, you shouldn't optimize it.
Now you have a collection of coordinate objects. Java has queues that multiple threads can pull from efficiently. Dump all your objects into a queue and have each of your threads pull one, process it, and put it into a "completed" queue. Note that this gives you the ability to add or remove threads easily, without affecting your code except for one number. How would you take the array-based solution to 4 or 6 threads?
Good luck
Here is a demo of the work explained below.
Observations
Each coordinate is 3 indexes.
You have 3 threads.
Let's say you have 17 coordinates, that's 51 indexes. You want to split the 17 coordinates among your 3 threads.
var arraySize = 51;
var numberOfThreads = 3;
var numberOfIndexesPerCoordinate = 3;
var numberOfCoordinates = arraySize / numberOfIndexesPerCoordinate; //17 coordinates
Now split those 17 coordinates among your threads.
var coordinatesPerThread = numberOfCoordinates / numberOfThreads; //5.6667
This isn't an even number, so you need to distribute unevenly. We can use Math.floor and modulo to distribute.
var floored = Math.floor(coordinatesPerThread); //5 - every thread gets at least 5.
var modulod = numberOfCoordinates % numberOfThreads; // 2 - there will be 2 left that need to be placed sequentially into your thread pool
This should give you all the information you need. Without knowing what language you are using, I don't want to give any real code samples.
I see you edited your question to specify Java as your language. I'm not going to do the threading work for you, but I'll give a rough idea.
float[] coordinates = new float[17 * 3]; //17 coordinates with 3 indexes each.
int numberOfThreads = 3;
int numberOfIndexesPerCoordinate = 3;
int numberOfCoordinates = coordinates.length / numberOfIndexesPerCoordinate; //51 indexes / 3 = 17 coordinates

//Every thread gets at least this many coordinates (integer division already floors).
int coordinatesPerThread = numberOfCoordinates / numberOfThreads;

//This is the number of coordinates remaining that couldn't be split evenly.
int remainingCoordinates = numberOfCoordinates % numberOfThreads;

//To make things easier, I'm just going to track the offset in the original array. It could probably be computed instead, but it's just an int.
int offset = 0;

for (int i = 0; i < numberOfThreads; i++) {
    int numberOfIndexes = coordinatesPerThread * numberOfIndexesPerCoordinate;

    //If this thread is one of the remainders, then increase by 1 coordinate (3 indexes).
    if (i < remainingCoordinates)
        numberOfIndexes += numberOfIndexesPerCoordinate;

    float[] dest = new float[numberOfIndexes];
    System.arraycopy(coordinates, offset, dest, 0, numberOfIndexes);
    offset += numberOfIndexes;
    //Put the dest array of indexes into your threads.
}
Another, potentially better option would be to use a Concurrent Deque that has all of your coordinates, and have each thread pull from it as they need a new coordinate to work with. For this solution, you'd need to create Coordinate objects.
Declare a Coordinate object
public static class Coordinate {
    protected float x;
    protected float y;
    protected float z;

    public Coordinate(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}
Declare a task to do your work, and pass it your concurrent deque.
public static class CoordinateTask implements Runnable {
    private final Deque<Coordinate> deque;

    public CoordinateTask(Deque<Coordinate> deque) {
        this.deque = deque;
    }

    public void run() {
        Coordinate coordinate;
        while ((coordinate = this.deque.poll()) != null) {
            //Do your processing here.
            System.out.println(String.format("Processing coordinate <%f, %f, %f>.",
                    coordinate.x,
                    coordinate.y,
                    coordinate.z));
        }
    }
}
Here's the main method showing the example in action
//Needed imports for the example:
import java.util.Arrays;
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

public static void main(String[] args) {
    Coordinate[] coordinates = new Coordinate[17];
    for (int i = 0; i < coordinates.length; i++)
        coordinates[i] = new Coordinate(i, i + 1, i + 2);

    final Deque<Coordinate> deque = new ConcurrentLinkedDeque<Coordinate>(Arrays.asList(coordinates));

    Thread t1 = new Thread(new CoordinateTask(deque));
    Thread t2 = new Thread(new CoordinateTask(deque));
    Thread t3 = new Thread(new CoordinateTask(deque));

    t1.start();
    t2.start();
    t3.start();
}
See this demo.
Before trying to optimize with concurrency, try to minimize the amount of points you need to test, and minimize the cost of those tests, by using the most efficient collision detection methods at your disposal.
Some general suggestions:
Consider normalizing everything to a common frame of reference before running through your calculations. For example, instead of applying transformations to each point, transform the selection box/ellipsoid into the shape's coordinate system so you can perform your collision detection without the transformations within each iteration.
You may also be able to combine some or all of your transformations (rotation, translation, etc.) into a single matrix calculation, but that won't gain you much unless you're performing a lot of transformations, which you should try to avoid.
Generally speaking it's beneficial to keep the transformation pipeline as streamlined as possible, and keep all coordinate calculations in the same space to avoid transformations as much as possible.
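As a sketch of that normalization, assuming Android's android.opengl.Matrix utilities and a modelMatrix holding the object's transform (names are illustrative):

// Invert the object's model matrix once, then bring the selection centroid
// into the object's coordinate system, so the vertices can be tested raw.
float[] invModel = new float[16];
Matrix.invertM(invModel, 0, modelMatrix, 0);
float[] centroidWorld = {cx, cy, cz, 1f};   // selection centroid, world space
float[] centroidObj = new float[4];
Matrix.multiplyMV(centroidObj, 0, invModel, 0, centroidWorld, 0);
// An oriented shape (e.g. the rotated ellipsoid) would need its rotation
// carried into object space the same way.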
Try to minimize the number of points you need to perform your slowest calculations on. The most accurate collision test should only be necessary for points that you can't rule out as being inside the shape by faster means, using an approximation of the shape, such as a collection of spheres, or the shape's convex hull. Simplifying the shape allows you to limit the slowest calculations to only those points that lie very close to your shape's actual bounds.
In my own 2D work in the past I found that even calculating the convex hulls for hundreds of complex animated shapes in real time was faster than doing collision detection directly without using their convex hulls, because they enable much faster collision calculations.
Consider calculating/storing additional information about the shape, such as an inner and outer collision sphere (one sphere inside all points, and one outside all points) which you can use as a fast initial filter. Anything inside the smaller sphere is guaranteed to be inside your shape, anything outside the outer sphere is known to be outside your shape. You might even want to store a simplified version of your shape, (or its convex hull), which you could calculate in advance and use to aid collision detection.
Similarly, consider using one or more spheres to approximate your ellipsoid in initial calculations, to minimize which points you need to test for collision.
Instead of calculating actual distances, calculate the squared distances and use those for comparison. However, prefer using faster tests for collision if possible. For example, for convex polygons you can use the Separating Axis Theorem, which projects vertices onto a common axis/plane to permit very quick overlap calculations.
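As a rough illustration of the two-sphere filter combined with squared-distance comparisons (all names here are illustrative, not from the question):

// innerR2 / outerR2 are the squared radii of the inner and outer
// collision spheres around the shape's centre (cx, cy, cz).
float dx = px - cx, dy = py - cy, dz = pz - cz;
float dist2 = dx * dx + dy * dy + dz * dz;  // squared distance, no sqrt needed
if (dist2 <= innerR2) {
    save(point);                  // certainly inside the shape
} else if (dist2 < outerR2) {
    preciseCollisionTest(point);  // only borderline points pay the full cost
}                                 // otherwise certainly outside: skip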
I would like to use Simulated Annealing to find a local minimum of a single-variable polynomial function within some predefined interval. I would also like to try to find the global minimum of a quadratic function.
A derivative-free algorithm such as this is not the best way to tackle the problem, so this is only for study purposes.
While the algorithm itself is pretty straightforward, I am not sure how to efficiently select the neighbor in single- or n-dimensional space.
Let's say that I am looking for a local minimum of the function 2*x^3+x+1 over the interval [-0.5, 30], and assume that the interval is discretized to tenths, e.g. {-0.5, -0.4, -0.3, ..., 29.9, 30}.
What I would like to achieve is a balance between random walk and speed of convergence from the starting point to points with lower energy.
If I simply select a random number from the given interval every time, then there is no random walk and the algorithm might circle around. If, on the contrary, the next point is selected by simply adding or subtracting 0.1 with equal probability, then the algorithm might turn into an exhaustive search, depending on the starting point.
How should I efficiently balance Simulated Annealing neighbor search in single-dimensional and n-dimensional space?
So you are trying to find an n-dimensional point P' that is "randomly" near another n-dimensional point P; for example, at distance T. (Since this is simulated annealing, I assume that you will be decrementing T once in a while).
This could work:
double[] displacement(double t, int dimension, Random r) {
    double[] d = new double[dimension];
    for (int i = 0; i < dimension; i++)
        d[i] = r.nextGaussian() * t;
    return d;
}
The output is randomly distributed in all directions and centred on the origin (notice that r.nextDouble() would favour 45º angles and be centred at 0.5). You can vary the displacement by increasing t as needed; about 95% of the results will lie within 2*t of the origin in each coordinate.
EDIT:
To generate a displaced point near a given one, you could modify it as
double[] displaced(double t, double[] p, Random r) {
    double[] d = new double[p.length];
    for (int i = 0; i < p.length; i++)
        d[i] = p[i] + r.nextGaussian() * t;
    return d;
}
You should use the same r for all calls (because if you create a new Random() for each you will keep getting the same displacements over and over).
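For illustration only, here is a rough sketch of how displaced might sit inside the annealing loop for the question's polynomial; the cooling schedule, acceptance rule and constants are ordinary simulated-annealing choices, not prescriptions (a real implementation would also clamp candidates to the interval):

// The question's polynomial, used as the energy to minimise.
static double energy(double[] x) { return 2 * x[0] * x[0] * x[0] + x[0] + 1; }

static double[] anneal(Random r) {
    double[] p = {1.0};   // arbitrary starting point inside [-0.5, 30]
    double t = 1.0;       // temperature, doubling as the step scale
    while (t > 1e-4) {
        double[] q = displaced(t, p, r);
        double delta = energy(q) - energy(p);
        // accept all downhill moves, and uphill moves with probability exp(-delta/t)
        if (delta < 0 || r.nextDouble() < Math.exp(-delta / t)) {
            p = q;
        }
        t *= 0.999;       // geometric cooling
    }
    return p;
}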
In "Numerical Recepies in C++" there is a chapter titled "Continuous Minimization by Simulated Annealing". In it we have
A generator of random changes is inefficient if, when local downhill moves exist, it nevertheless almost always proposes an uphill move. A good generator, we think, should not become inefficient in narrow valleys; nor should it become more and more inefficient as convergence to a minimum is approached.
They then proceed to discuss a "downhill simplex method".
What I'm asking about is not a duplicate of this very popular question. For a randomly chosen input, some quick tests can be done, and if they fail to say "not a square", some computation of the square root has to be done (I tried a solution myself, too).
When the numbers to be tested come from a simple sequence, the situation differs, as it's possible to use the previous (approximate) square root. For a trivial sequence it's trivial as well, e.g.,
long sqrt = 1;
for (long i = 1; i < limit; ++i) {
    if (sqrt * sqrt == i) {
        handleSquare(i);
        ++sqrt;
    }
}
My question is what can be done for more complicated sequences like
x[i] = start + i*i;
or
x[i] = start - i*i*i;
I'm thinking about Newton's method, but I can't see how to make it fast (as division is a pretty expensive operation).
To what kind of sequence would you like to apply your algorithm? Below is a solution that should work well when x[i] diverges, but not too fast.
For instance if
x[i] = a*i^p + o(i^p)
and i is large enough, then you will have
x[i+1]-x[i] ~ p * a * i^{p-1}.
If y[i] denotes the largest integer such that
y[i]^2 <= x[i]
then you have
y[i] ~ sqrt(a) i^{p/2}
and
y[i+1]-y[i] ~ 1/(2 y[i]) * (x[i+1]-x[i]) ~ p/2 * sqrt(a) i^{p/2-1}
So you can take this as a guess for y[i+1] and then update to the correct value, which should save you some iterations.
In general you can always use the formula
y[i+1]-y[i] ~ 1/(2 y[i]) * (x[i+1]-x[i])
as a guess, but this will be useful only if x[i+1]-x[i] is small with respect to y[i]^2, i.e., with respect to x[i]. It may also be worth refining the formula a bit using the (exact) second-order expansion of
y[i+1]^2 = y[i]^2 + 2y[i](y[i+1]-y[i]) + (y[i+1]-y[i])^2
in order to improve the guess of y[i+1].
Note that this won't work well if x[i] remains bounded when i is large or if x diverges exponentially fast.
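Here is a hedged sketch of this guess-and-correct scheme for the sequence x[i] = start + i*i, assuming start >= 1 so the division is safe; handleSquare is the question's callback:

long x = start;
long y = (long) Math.sqrt((double) start); // roughly floor(sqrt(x)), corrected below
while (y * y > x) y--;                     // guard against floating-point rounding
if (y * y == x) handleSquare(x);           // x[0] itself may be a square
for (long i = 1; i < limit; i++) {
    long next = start + i * i;
    y += (next - x) / (2 * y);  // guess: y[i+1] ~ y[i] + (x[i+1]-x[i]) / (2*y[i])
    x = next;
    while (y * y > x) y--;                 // correct the guess downward...
    while ((y + 1) * (y + 1) <= x) y++;    // ...or upward, typically O(1) steps
    if (y * y == x) handleSquare(x);
}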
I want to solve a 3-dimensional knapsack problem.
I have a number of boxes, each with a different width, height, length and value. I have a specified space, and I want to put the boxes into that space such that I get the optimal profit. I would like to do it using brute force.
I'm programming in Java.
I tried to do it with recursion, so:
public void solveBruteforce(double freeX, double freeY, double freeZ) {
    for (int i = 0; i < numOfBoxes; i++) {
        for (int j = 0; j < BoxObject.numOfVariations; j++) {
            if (possible to place box) {
                place(box);
                add(value);
                solveBruteforce(newX, newY, newZ);
            }
        }
    }
    remove(box);
    remove(value);
}
But I run into the problem that each recursive call has its own free x, y and z.
Could someone help me find another way to do it?
First thing is, use an octree to keep track of where things are in the space. An octree divides 3D space with out-degree 8 (each node has eight child octants) and can carry occupancy flags at every node, turning your space into a structure that is efficient to search over. This would be useful if you want to use some kind of heuristic search to place the boxes, and even if you are trying all possibilities, since it can short-circuit forbidden (crowded) placements.
Brute force will take a long time. But if that's what you want you need to define an ordering for trying out permutations of placements.
Since you will need many iterations, recursion is not so great, since you risk a stack overflow.
A first-draft alternative would involve a greedy algorithm. Take the box that maximizes your profit (say, the largest), place that, then take the next largest box, and find the best fit for that, and so on.
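A hedged sketch of that greedy first draft; Box, Space and tryPlace are illustrative stand-ins for your own representations:

static class Box {
    double w, h, l, value;
}

static double greedy(List<Box> boxes, Space space) {
    // most valuable first; sorting by volume is another reasonable choice
    boxes.sort((u, v) -> Double.compare(v.value, u.value));
    double profit = 0;
    for (Box b : boxes) {
        if (space.tryPlace(b)) {  // hypothetical: finds and occupies a best fit
            profit += b.value;
        }
    }
    return profit;
}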
But, say you wanted to try all possible combinations:
def maximize_profit(boxes, space):
    max_profit = 0
    best_fits = list()
    while Arranger.hasNext():
        a_fit, a_profit = Arranger.next(boxes, space)
        if a_profit == max_profit:
            best_fits.append(a_fit)
        elif a_profit > max_profit:
            max_profit = a_profit
            best_fits = [a_fit]
    return best_fits, max_profit
For ideas on how to define the Arranger, think about choosing #{box} slots from #{space} possibilities, respecting arrangements that are identical w.r.t. symmetry. Alternately maybe a "flood fill" method will give you ideas.
Is there a distance calculation implementation using Hadoop map/reduce? I am trying to calculate the distances between a given set of points.
Looking for any resources.
Edit
This is a very intelligent solution. I have tried something like the first algorithm, and I get almost what I was looking for. I am not concerned about optimizing the program at the moment, but my problem was that the dist(X,Y) function was not working. When I got all the points on the reducer, I was unable to go through all the points with an Iterator and calculate the distance. Someone on stackoverflow.com told me that the Iterator on Hadoop is different from the normal Java Iterator; I am not sure about that. But if I can find a simple way to go through the Iterator in my dist() function, I can use your second algorithm to optimize.
//This is your code and I am referring to it too, just to make my point clear.
map(x,y) {
    for i in 1:N #number of points
        emit(i, (x,y)) //i did exactly like this
}

reduce (i, X)
    p1 = X[i]
    for j in i:N
        // here is my problem, I can't get the values from the Iterator.
        emit(dist(X[i], X[j]))
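A note on the Iterator issue from the edit: in Hadoop, the reducer's values can be traversed only once, and the framework reuses the Writable instance it hands out, so the points must be deep-copied into a list before the pairwise loop. A rough sketch of the reduce method inside a Reducer<IntWritable, PointWritable, IntWritable, DoubleWritable>, where PointWritable is a hypothetical custom value type and dist is the question's function:

protected void reduce(IntWritable key, Iterable<PointWritable> values, Context context)
        throws IOException, InterruptedException {
    List<PointWritable> points = new ArrayList<>();
    for (PointWritable p : values) {
        // deep copy: Hadoop reuses the same object on every iteration
        points.add(new PointWritable(p.getX(), p.getY()));
    }
    for (int i = 0; i < points.size(); i++) {
        for (int j = i + 1; j < points.size(); j++) {
            context.write(key, new DoubleWritable(dist(points.get(i), points.get(j))));
        }
    }
}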
You need to do a self-join on that data set. In Hive that would look, more or less, like:
select dist(P1.x,P1.y,P2.x, P2.y) from points P1 join points P2 on (True) where P1.x < P2.x or (P1.x = P2.x and P1.y < P2.y)
The function dist would need to be implemented using other Hive functions or written in Java and added as a UDF. Also, I am not sure about the True constant, but you can write 0=0 to the same effect. The where clause is there to avoid computing the same distance twice, and zero distances. The question is: would Hive optimize this the way you can by programming carefully in Hadoop? I am not sure. Here is a sketch in Hadoop:
map(x,y) {
    for i in 1:N #number of points
        emit(i, (x,y))
}

reduce (i, X)
    p1 = X[i]
    for j in i:N
        emit(dist(X[i], X[j]))
For this to work, you need X to get to the reducer sorted in some order, for instance by x and then by y, using secondary sort keys (which do not affect the grouping). This way every reducer gets a copy of all the points and works on a column of the distance matrix you are trying to generate. The memory requirements are minimal.
You could trade some communication for memory by re-organizing the computation so that every reducer computes a square submatrix of the final matrix, knowing only two subsets of the points and calculating the distances among all of them. To achieve this, you need to make the order of your points explicit, say you are storing i, x, y:
map(i,x,y) {
    for j in 1:N/k #k is the size of a submatrix
        emit((i/k, j), ("row", (x,y)))
        emit((j, i/k), ("col", (x,y)))
}

reduce ((a,b), Z)
    split Z in rows X and cols Y
    for x in X
        for y in Y
            emit(dist(x,y))
In this case you can see that the map phase emits only 2*N*N/k points, whereas the previous algorithm emitted N^2. Here we have (N/k)^2 reducers vs N for the other one. Each reducer has to hold k values in memory (using the secondary key technique to have all the rows get to the reducer before all the columns), vs only 2 before. So you see there are tradeoffs and for the second algorithm you can use the parameter k for perf tuning.
This problem does not sound like a good fit for map-reduce, since you're not really able to break it into pieces and calculate each piece independently. If you could have a separate program generate the complete graph of your points as a list (x1,y1,x2,y2), then you could do a straightforward map to get the distance.
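For instance, if each line of that pre-generated pair list held "x1,y1,x2,y2", the map could be as simple as this sketch (the text input format is an assumption):

// Parse one pre-generated pair and emit its Euclidean distance; no reduce needed.
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] t = value.toString().split(",");
    double dx = Double.parseDouble(t[0]) - Double.parseDouble(t[2]);
    double dy = Double.parseDouble(t[1]) - Double.parseDouble(t[3]);
    context.write(value, new DoubleWritable(Math.sqrt(dx * dx + dy * dy)));
}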