I have two Weka instances which, when printed, look as follows:
0.44,0.34,0.48,0.5,0.3,0.33,0.43,cp
0.51,0.37,0.48,0.5,0.35,0.36,0.45,cp
I am trying to obtain their distance using the in-built Euclidean Distance function. My code:
EuclideanDistance e = new EuclideanDistance(neighbours);
double x = e.distance(neighbours.instance(0), neighbours.instance(1));
Where neighbours is an object of type Instances and the objects at indexes 0 and 1 are the two instances I referred to.
I am slightly confused because x is returned with value 1.5760032627255223 although, by doing the calculation separately, I was expecting 0.09798. cp is the class label, but earlier in my code I did specify data.setClassIndex(data.numAttributes() - 1);
Any advice?
By default, Weka's EuclideanDistance metric normalizes the attribute ranges when computing the distance. If you don't want that, call e.setDontNormalize(true).
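For example, a minimal sketch of that, reusing the neighbours object from the question:
import weka.core.EuclideanDistance;

EuclideanDistance e = new EuclideanDistance(neighbours);
e.setDontNormalize(true); // use raw attribute values instead of range-normalized ones
double x = e.distance(neighbours.instance(0), neighbours.instance(1));
// x should now be close to the hand-computed 0.09798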
So I have a fairly large array that contains xyz coordinates, where array[0] = x0, array[1] = y0, array[2] = z0, array[3] = x1, array[4] = y1... and so on.
I'm running an algorithm on this array that is taking longer than I would like it to, and I want to split the work amongst threads. I have my threads set up, but I am not sure how to divide this array properly so I can distribute this work across 3 threads.
Even though I have an array length that is divisible by 3, simply splitting it into 3 equal chunks won't work, because that can split an xyz coordinate (for instance, if my array were size 15, dividing it by 3 would give me arrays of size 5, which means an XYZ coordinate gets split).
How can I split this array (the parts don't necessarily have to be equal in size) so that I can distribute the work? (For instance, in the previous example, I would like to have two arrays of size 6 and one of size 3.)
Note: The size of the array is variable, but is always divisible by 3.
EDIT: Sorry, should have mentioned that I'm working in Java. My algorithm iterates through a collection of coordinates and determines which coordinates lie inside of a particular 3d shape (such as an ellipsoid). It saves these coordinates and I perform other tasks with these coordinates (I'm working on a computer graphics app).
EDIT2: I'm going to elaborate on the algorithm a bit more.
Basically, I am working in Android OpenGL-ES-3.0. I have complex 3D-object with somewhere around 230000 vertices and close to a million triangles.
In the app, the user moves either an ellipsoid or a box (they choose which one) to a location close to or on the object. After moving it, they click a button, which runs my algorithm.
The purpose of the algorithm is to determine which points from my object lie inside of the ellipsoid or box. These points are subsequently changed to a different color. To add to the complexity, however, is the fact that I have transformation matrices applied to both the points of the object and the points of the ellipsoid/box.
My current algorithm begins by iterating through all the points of the object. For those of you unclear on my iteration, this is my loop.
for(int i = 0; i < numberOfVertices*3;)
{
pointX = vertices[i];
i++;
pointY = vertices[i];
i++;
pointZ = vertices[i];
i++;
//consider transformations, then run algorithm
}
I perform the necessary steps to consider all my transformations, and after that is finished, I have a point from my object and the location of my ellipsoid/box centroid.
Then, depending on the shape, one of the following algorithms is used:
Ellipsoid: I use the centroid of the ellipse and apply the formula
(x − c)^T R^T A R (x − c). Here x is a column vector for the xyz point from my object that I am on in my iteration, c is a column vector for the xyz point of my centroid, ^T means transpose, R is my rotation matrix, and A is a diagonal matrix with entries (1/a^2, 1/b^2, 1/c^2), where I have values for a, b and c. If this expression is > 1, then x lies outside of my ellipsoid and is not a valid point. If it is <= 1, then I save x.
Box: I simply check if the point falls within a range. If the point of the object lies a certain distance in the X-direction, Y-direction, and Z-direction from the centroid, I save it.
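For concreteness, here is a rough sketch of the ellipsoid test described above using plain 3x3 arrays; the names point, centroid, rotation, a, b and c are placeholders for values the question assumes are already available:
// A sketch of the ellipsoid containment test: (x - c)^T R^T A R (x - c) <= 1.
// point and centroid are xyz triples, rotation is a 3x3 rotation matrix,
// and a, b, c are the ellipsoid semi-axis lengths (all hypothetical names).
static boolean insideEllipsoid(float[] point, float[] centroid,
                               float[][] rotation, float a, float b, float c) {
    // v = x - c
    float[] v = { point[0] - centroid[0], point[1] - centroid[1], point[2] - centroid[2] };

    // y = R * v (rotate the offset into the ellipsoid's local frame)
    float[] y = new float[3];
    for (int row = 0; row < 3; row++)
        y[row] = rotation[row][0] * v[0] + rotation[row][1] * v[1] + rotation[row][2] * v[2];

    // y^T A y with A = diag(1/a^2, 1/b^2, 1/c^2)
    float value = (y[0] * y[0]) / (a * a) + (y[1] * y[1]) / (b * b) + (y[2] * y[2]) / (c * c);
    return value <= 1.0f;
}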
These algorithms are accurate and work as intended. The issue is, obviously, efficiency. I don't seem to have a good understanding of what makes my app strain and what doesn't. I thought multi-threading would work, and I tried some of the techniques described, but they didn't yield a significant improvement in performance. If anyone has ideas on filtering the search so I'm not iterating through all these points, that would help.
May I suggest a slightly different way to handle it? I know this isn't a direct answer to your question, but please consider it.
This could be easier to see if you implemented it as Coordinate objects, each with x, y and z values. Your "array" would now be 1/3 as long. You might think this would be less efficient, and you might be right, but you'd be surprised at how well Java can optimize things. Java often optimizes for the cases people use the most, and manually manipulating the array as you suggest may even be slower than using objects. Until you've proven that the most readable design is too slow, you shouldn't optimize it.
Now you have a collection of Coordinate objects. Java has queues that multiple threads can pull from efficiently. Dump all your objects into a queue and have each of your threads pull one at a time, process it, and put it into a "completed" queue. Note that this gives you the ability to add or remove threads easily, without affecting your code except for one number. How would you take the array-based solution to 4 or 6 threads?
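As a rough sketch of that idea (not a full implementation): Coordinate, coordinateList and process(...) below are hypothetical stand-ins for your own types and per-coordinate work.
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

static void processAll(Collection<Coordinate> coordinateList, int numberOfThreads)
        throws InterruptedException {
    // Shared work queue that every thread pulls from.
    final ConcurrentLinkedQueue<Coordinate> work =
            new ConcurrentLinkedQueue<Coordinate>(coordinateList);

    // Changing the thread count is now a one-number change.
    ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
    for (int t = 0; t < numberOfThreads; t++) {
        pool.submit(new Runnable() {
            public void run() {
                Coordinate c;
                while ((c = work.poll()) != null) {
                    process(c); // your per-coordinate work (hypothetical method)
                }
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES); // wait for the queue to drain
}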
Good luck
Here is a demo of the work explained below.
Observations
Each coordinate is 3 indexes.
You have 3 threads.
Let's say you have 17 coordinates, that's 51 indexes. You want to split the 17 coordinates among your 3 threads.
var arraySize = 51;
var numberOfThreads = 3;
var numberOfIndexesPerCoordinate = 3;
var numberOfCoordinates = arraySize / numberOfIndexesPerCoordinate; //17 coordinates
Now split that 17 coordinates among your threads.
var coordinatesPerThread = numberOfCoordinates / numberOfThreads; //5.6667
This isn't an even number, so you need to distribute unevenly. We can use Math.floor and modulo to distribute.
var floored = Math.floor(coordinatesPerThread); //5 - every thread gets at least 5.
var modulod = numberOfCoordinates % numberOfThreads; // 2 - there will be 2 left over that need to be placed sequentially into your thread pool
This should give you all the information you need. Without knowing what language you are using, I don't want to give any real code samples.
I see you edited your question to specify Java as your language. I'm not going to do the threading work for you, but I'll give a rough idea.
float[] coordinates = new float[17 * 3]; //17 coordinates with 3 indexes each.
int numberOfThreads = 3;
int numberOfIndexesPerCoordinate = 3;
int numberOfCoordinates = coordinates.length / numberOfIndexesPerCoordinate; //51 indexes / 3 indexes each = 17 coordinates
//Every thread gets at least this many coordinates (integer division already floors).
int coordinatesPerThread = numberOfCoordinates / numberOfThreads;
//This is the number of coordinates remaining that couldn't be split evenly.
int remainingCoordinates = numberOfCoordinates % numberOfThreads;
//To make things easier, I'm just going to track the offset into the original array. It could probably be computed instead, but it's just an int.
int offset = 0;
for (int i = 0; i < numberOfThreads; i++) {
int numberOfIndexes = coordinatesPerThread * numberOfIndexesPerCoordinate;
//If this index is one of the remainders, then increase by 1 coordinate (3 indexes).
if (i < remainingCoordinates)
numberOfIndexes += numberOfIndexesPerCoordinate;
float[] dest = new float[numberOfIndexes];
System.arraycopy(coordinates, offset, dest, 0, numberOfIndexes);
offset += numberOfIndexes;
//Put the dest array of indexes into your threads.
}
Another, potentially better, option would be to use a concurrent deque that holds all of your coordinates, and have each thread pull from it whenever it needs a new coordinate to work with. For this solution, you'd need to create Coordinate objects.
Declare a Coordinate object
public static class Coordinate {
protected float x;
protected float y;
protected float z;
public Coordinate(float x, float y, float z) {
this.x = x;
this.y = y;
this.z = z;
}
}
Declare a task to do your work, and pass it your concurrent deque.
public static class CoordinateTask implements Runnable {
private final Deque<Coordinate> deque;
public CoordinateTask(Deque<Coordinate> deque) {
this.deque = deque;
}
public void run() {
Coordinate coordinate;
while ((coordinate = this.deque.poll()) != null) {
//Do your processing here.
System.out.println(String.format("Proccessing coordinate <%f, %f, %f>.",
coordinate.x,
coordinate.y,
coordinate.z));
}
}
}
Here's the main method showing the example in action (it assumes imports of java.util.Arrays, java.util.Deque, and java.util.concurrent.ConcurrentLinkedDeque):
public static void main(String[] args) {
Coordinate[] coordinates = new Coordinate[17];
for (int i = 0; i < coordinates.length; i++)
coordinates[i] = new Coordinate(i, i + 1, i + 2);
final Deque<Coordinate> deque = new ConcurrentLinkedDeque<Coordinate>(Arrays.asList(coordinates));
Thread t1 = new Thread(new CoordinateTask(deque));
Thread t2 = new Thread(new CoordinateTask(deque));
Thread t3 = new Thread(new CoordinateTask(deque));
t1.start();
t2.start();
t3.start();
}
See this demo.
Before trying to optimize with concurrency, try to minimize the number of points you need to test, and minimize the cost of those tests, by using the most efficient collision detection methods at your disposal.
Some general suggestions:
Consider normalizing everything to a common frame of reference before running through your calculations. For example, instead of applying transformations to each point, transform the selection box/ellipsoid into the shape's coordinate system so you can perform your collision detection without the transformations within each iteration.
You may also be able to combine some or all of your transformations (rotation, translation, etc.) into a single matrix calculation, but that won't gain you much unless you're performing a lot of transformations, which you should try to avoid.
Generally speaking it's beneficial to keep the transformation pipeline as streamlined as possible, and keep all coordinate calculations in the same space to avoid transformations as much as possible.
Try to minimize the number of points you need to perform your slowest calculations on. The most accurate collision test should only be necessary for points that you can't rule out as being inside the shape by faster means, using an approximation of the shape, such as a collection of spheres, or the shape's convex hull. Simplifying the shape allows you to limit the slowest calculations to only those points that lie very close to your shape's actual bounds.
In my own 2D work in the past I found that even calculating the convex hulls for hundreds of complex animated shapes in real time was faster than doing collision detection directly without using their convex hulls, because they enable much faster collision calculations.
Consider calculating/storing additional information about the shape, such as an inner and outer collision sphere (one sphere inside all points, and one outside all points) which you can use as a fast initial filter. Anything inside the smaller sphere is guaranteed to be inside your shape, anything outside the outer sphere is known to be outside your shape. You might even want to store a simplified version of your shape, (or its convex hull), which you could calculate in advance and use to aid collision detection.
Similarly, consider using one or more spheres to approximate your ellipsoid in initial calculations, to minimize which points you need to test for collision.
Instead of calculating actual distances, calculate the squared distances and use those for comparison. However, prefer using faster tests for collision if possible. For example, for convex polygons you can use the Separating Axis Theorem, which projects vertices onto a common axis/plane to permit very quick overlap calculations.
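As a rough illustration of the squared-distance pre-filter idea (this is not code from the question; cx, cy, cz and outerRadius are assumed values you would precompute once per selection, while vertices and numberOfVertices are taken from the question's loop):
// Fast rejection test: skip the expensive ellipsoid/box math for points that are
// clearly outside a bounding sphere around the selection shape.
float outerRadiusSquared = outerRadius * outerRadius;

for (int i = 0; i < numberOfVertices * 3; i += 3) {
    float dx = vertices[i]     - cx;
    float dy = vertices[i + 1] - cy;
    float dz = vertices[i + 2] - cz;

    // Compare squared distances; no Math.sqrt needed.
    if (dx * dx + dy * dy + dz * dz > outerRadiusSquared) {
        continue; // definitely outside, skip the precise test
    }
    // Otherwise fall through to the exact ellipsoid/box test from the question.
}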
I am writing java code to implement Principal Component Analysis. I am modeling my matrices using Apache Commons Math3's RealMatrix class.
As part of the procedure, the eigenvalues and eigenvectors of the covariance matrix are calculated using the EigenDecomposition class. This produces two matrices:
Columns of Matrix v are the eigenvectors
Matrix d is all 0's except the eigenvalues on the diagonal
Example: the original matrix is:
⎡0.6166 0.6154⎤
⎣0.6154 0.7166⎦
After decomposition, the eigenvector matrix v is
⎡-0.7352 -0.6778⎤
⎣ 0.6779 -0.7352⎦
And the eigenvalue diagonal matrix d is
⎡0.4908 0.0000⎤
⎣0.0000 1.2840⎦
The next step in the PCA procedure is to sort columns by eigenvalue (in decreasing order). In particular since the second column eigenvalue (1.284) is higher than the first column (0.4908), I want this to be first, and sort both matrices v and d so that the columns appear in the decreasing eigenvalue order:
Resulting v':
⎡-0.6778 -0.7352⎤
⎣-0.7352 0.6779⎦
Resulting d':
⎡0.0000 0.4908⎤
⎣1.2840 0.0000⎦
I have searched SO and many places for code which does this sorting, and found either packages that do PCA in a much more complex way, or manual sorting routines for 2D Java arrays. While I am capable of writing such a sorting routine, I will be doing this frequently on large arrays and am hoping for a prepackaged, efficient solution. Since PCA is a standard procedure, this matrix operation should be rather common. I am looking to see if there are any packages that already exist (e.g., Apache Commons Math) which contain methods that perform this.
An alternate solution that would permit me to reconstruct the new matrix from the old one would be to obtain an array of sort indexes from the eigenvalue columns, e.g., an array [1,0] which tells me the highest ranked eigenvalue is in column 1, and the 2nd highest ranked eigenvalue is in column 0, etc.
Can anyone point me to a package that can support this?
It looks like I've been able to implement the alternate solution I suggested. I created an array of column indices ({0, 1}) and then sorted that array based on the eigenvalue corresponding to the indexed column. Then I simply created a new RealMatrix and copied columns over from the old one in order of the sorted array:
int i = 0;
for (int index : sortedIndexArray) {
    vPrime.setColumnVector(i, v.getColumnVector(index));
    dPrime.setColumnVector(i, d.getColumnVector(index));
    i++;
}
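For reference, here is one way the sorted index array could be built with Commons Math; this is only a sketch of the approach described above, with eig standing in for the EigenDecomposition already computed from the covariance matrix:
import java.util.Arrays;
import java.util.Comparator;
import org.apache.commons.math3.linear.EigenDecomposition;

// Returns the column indices of eig sorted by decreasing eigenvalue, e.g. {1, 0}.
static Integer[] sortedEigenIndices(EigenDecomposition eig) {
    final double[] eigenvalues = eig.getRealEigenvalues();
    Integer[] indices = new Integer[eigenvalues.length];
    for (int k = 0; k < indices.length; k++) {
        indices[k] = k;
    }
    Arrays.sort(indices, new Comparator<Integer>() {
        public int compare(Integer left, Integer right) {
            return Double.compare(eigenvalues[right], eigenvalues[left]);
        }
    });
    return indices;
}
vPrime and dPrime can then be created as empty matrices of the same dimensions (for example with MatrixUtils.createRealMatrix(v.getRowDimension(), v.getColumnDimension())) before running the copy loop above.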
Still wondering if there's already a method in a package somewhere that does this...
How can I pass a MATLAB result s, like the one shown below, to a Java method JSize()?
s = size(oImage)
s =
91 121 3
First off, you would need to know how many dimensions your array has. Because this looks like an image, I'm going to assume that you'll expect a 3D array.
Because Java considers multidimensional arrays as an array of arrays, it isn't as dynamic as MATLAB where you can simply figure out how many dimensions there are by just checking the length of the size vector.
Assuming that your matrix is not jagged, you can determine how many rows you have by:
int rows = oImage.length;
If you want to determine how many columns there are, you can use any of the rows in your matrix and obtain its length:
int cols = oImage[0].length;
If you want to see how many elements there are in each 2D location in your matrix, you would just access any column in any row you specify and get its length. In our case, let's stick with oImage[0]:
int dim = oImage[0][0].length;
Therefore, you could write a Java method that returns these lengths as an array, similar to size in MATLAB:
public int[] JSize(int[][][] oImage) {
return new int[] {oImage.length, oImage[0].length, oImage[0][0].length};
}
Remember, Java has the capacity to declare jagged multi-dimensional arrays. This means that each row in your 2D matrix does not necessarily have to have the same number of elements, as it would in a true matrix. If your multi-dimensional array in Java follows the above (non-jagged) model, then the above code will work.
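For example, a quick check of the method above with a 91 x 121 x 3 array, mirroring the MATLAB size output from the question (this assumes JSize is reachable from the calling code, e.g. declared static in the same class):
int[][][] oImage = new int[91][121][3];
int[] s = JSize(oImage);
System.out.println(java.util.Arrays.toString(s)); // prints [91, 121, 3]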
Here is my case: I would like to create a matrix buffer for a 3D project I am working on.
Many people on Stack Overflow are proposing doing something like this
ArrayList<ArrayList<object>>
However, this structure is causing issues, as I need a fixed-size matrix and I am aware of the impact that add(i, object) has on the complexity of the operation. On the other hand, the last nested level of my matrix needs to be of variable size, so that if an object is at the same position it just adds itself to the stack.
If you need a matrix with a variable length 3rd dimension, why not do ArrayList[][]?
Obviously you can't instantiate an array of a generic type directly, but you can create a raw-type array and cast it to ArrayList<Object>[][] (assuming Object is the element type you want), like this:
ArrayList<Object>[][] box = (ArrayList<Object>[][])new ArrayList[length][width];
This would result in a fixed-size matrix with a variable-length 3rd dimension. Remember to fill the matrix with ArrayLists, though, as the whole matrix will be filled with null to begin with.
The variable length 3rd dimension can be handled by many different collections. If your 3rd dimension truly acts like a Stack (or even a Queue/Deque) then I would use LinkedList to take care of it due to the speed with which it can add and remove objects from the front/back of the collection.
In order to create the 2D matrix of lists of type E you could write:
LinkedList<E>[][] matrix = new LinkedList[length][width];
Then right after that, I would suggest instantiating all of the lists like so in order to prevent null pointer problems:
for(int i = 0; i < matrix.length; i++)
for(int j = 0; j < matrix[0].length; j++)
matrix[i][j] = new LinkedList<>();
I did assume that you were using Java 7. If not, simply put the type (E) into the angle brackets when instantiating each element. I hope this helps, and have fun coding! =)
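To round this out, here is a small usage sketch of the structure above, treating each cell as a stack; the String element type and the cell indices are just for illustration (requires java.util.LinkedList):
// Fixed 4 x 4 grid where each cell behaves like a stack of values.
int length = 4, width = 4;
LinkedList<String>[][] matrix = new LinkedList[length][width];
for (int i = 0; i < matrix.length; i++)
    for (int j = 0; j < matrix[i].length; j++)
        matrix[i][j] = new LinkedList<String>();

// Push two objects onto the same position, then pop the most recent one.
matrix[2][3].push("first");
matrix[2][3].push("second");
String top = matrix[2][3].pop(); // "second"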