I have:
final int ROWS = 100000;
final int COLS = 2000;
long[][] m = new long[COLS][ROWS];
and then:
public void xor(int row1, int row2) {
for (int col=0; col<COLS; col++) {
m[col][row1] ^= m[col][row2];
}
}
The above function is, simplified, what takes most of the time in a run. I was wondering if I should invest time to refactor my whole program to read "m = new long[ROWS][COLS]" (instead of the other way around) for better RAM access. Or won't I win much time with it?
I am aware that I could parallelize it, perhaps with GPUs, but that's for a later stage.
In my opinion, it will definitely help to swap ROWS and COLS.
The layout of this array is (roughly) like this: [0][0], [0][1], [0][2], ... [1][0], [1][1], ... and so on. In your code, each column is a contiguous chunk of memory, and a row is not.
Since each column is 800000 bytes, and in your xor method you access all of them, you are forcing more cache misses.
After transposing, each row becomes a contiguous piece of memory, and since you tend to operate on rows, it should make things faster.
If you had long[][] m = new long[ROWS][COLS]; and for (int col=0; col<COLS; col++) m[row1][col] ^= m[row2][col];, you'd only need two 16000-byte-long rows to be in the cache during the execution of the xor method.
But since what I said is based mostly on theory, try to benchmark both variants and check which one is really faster.
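For what it's worth, a rough way to compare the two layouts is a small timing harness like the one below. This is only a minimal sketch, not a rigorous benchmark (JMH would be the proper tool), and the column count is scaled down so both arrays fit in a few hundred MB of heap:
public class LayoutBench {
    static final int ROWS = 100_000;
    static final int COLS = 200;   // scaled down from 2000 to keep memory modest

    public static void main(String[] args) {
        long[][] colMajor = new long[COLS][ROWS];   // current layout: one array per column
        long[][] rowMajor = new long[ROWS][COLS];   // transposed layout: one array per row

        long t0 = System.nanoTime();
        for (int rep = 0; rep < 1_000; rep++)
            for (int col = 0; col < COLS; col++)
                colMajor[col][17] ^= colMajor[col][42];     // xor(17, 42) in the current layout
        long t1 = System.nanoTime();
        for (int rep = 0; rep < 1_000; rep++)
            for (int col = 0; col < COLS; col++)
                rowMajor[17][col] ^= rowMajor[42][col];     // xor(17, 42) in the transposed layout
        long t2 = System.nanoTime();

        System.out.printf("column arrays: %d ms, row arrays: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
Run it a few times and ignore the first iterations; the JIT needs some warm-up before the numbers stabilize.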
Related
I am trying to fill a one-dimensional array into a two-dimensional array in Java.
I did it this way; is there a better way to do it?
public double[][] getResult(double[] data, int rowSize) {
int columnSize = data.length;
double[][] result = new double[columnSize][rowSize];
for (int i = 0; i < columnSize; i++) {
result[i][0] = data[i];
}
return result;
}
Edit: I am not going to reuse the data array. I want to set the reference of the first column in the result array to the data array. Is it possible? If yes, how can I do this?
I don't know if this makes sense in your application/context, but for performance reasons you should normally keep related data as close together in memory as possible.
For arrays (double[row][column]) that means that data sharing the same row index is normally stored very close together in memory. For example, when you have an array like double[][] d = new double[10][10] and save two values like d[0][0] = 1.0; d[1][0] = 2.0;, then they end up in two different row arrays and are (I think) at least 40 bytes away from each other.
But when you save them like d[0][0] = 1.0; d[0][1] = 2.0;, they're directly next to each other in memory and are normally loaded into the super-fast cache at the same time. When you want to use them iteratively, that is a huge performance gain.
So for your application, a first improvement would be this:
public double[][] getResult(double[] data, int rowSize) {
int columnSize = data.length;
double[][] result = new double[rowSize][columnSize];
for (int i = 0; i < columnSize; i++) {
result[0][i] = data[i];
}
return result;
}
Secondly, you have to consider whether you are going to re-use the data array or not, because if you don't re-use it, you can just set the reference of the first row in your result array to the data array like this:
result[0] = data;
Again, this gives you another huge performance gain.
Why? Simply because you don't waste memory by copying all the values - and as you might know, memory allocation can be rather slow.
So always re-use memory instead of copying it wildly.
So my full suggestion is to go with this code:
public double[][] getResult(double[] data, int rowSize) {
double[][] result = new double[rowSize][];
result[0] = data;
return result;
}
It is optional whether you set the columnSize or not.
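A small hypothetical usage example (the values and rowSize are made up) shows the effect: the first row is the very same array, nothing gets copied, and the remaining rows stay null until you allocate them:
double[] data = {1.0, 2.0, 3.0};
double[][] result = getResult(data, 4);   // rowSize = 4 chosen arbitrarily
System.out.println(result[0] == data);    // true - same reference, no copy
System.out.println(result[1]);            // null - the other rows were never allocated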
If I have a for loop like...
for (int i = 0; i < myArray.length; i++) { ... }
...does myArray.length get evaluated every iteration? So would something like...
int len = myArray.length;
for (int i = 0; i < len; i++) { ... }
... be a small performance increase?
Regardless, myArray.length is just a field, so there is nothing to evaluate.
A Java array has length as a public final int field, so it gets initialized once, and when you refer to it there is no code execution like a method call.
The public final field length, which contains the number of components of the array. length may be positive or zero.
The first form will probably incur some performance penalty, since evaluating it will require, before the iflt, an aload, an arraylength and an iload; whereas the second is only two iloads.
@ajp rightly mentions that myArray may change; so it is highly unlikely that the compiler will optimize the first form into the second for you (unless, maybe, myArray is final).
However, the JIT, when it kicks in, is probably smart enough so that, if myArray doesn't change, it will turn the first form into the second.
Just in case, anyway, use the second form (this is what I always do, but that's just out of habit). Note that you can always javap the generated class file to see the generated byte code and compare.
By the way, Wikipedia has a very handy page listing all of a JVM's bytecodes. As you may see, quite a lot of them are dedicated to arrays!
Yes, the termination expression gets evaluated every time. So you're right that storing the length once could be a small performance increase. But more importantly, it changes the logic, which could make a difference if myArray gets reassigned.
for (int i = 0; i < myArray.length; i++) {
if (something-something-something) {
myArray = appendToMyArray(myArray, value); // sets myArray to a new, larger array
}
}
Now it makes a big difference whether you store the array length in a variable first.
You wouldn't normally see code like this with an array. But with an ArrayList or other collection, whose size could increase (or decrease) in the body of the loop, it makes a big difference whether you compute the size once or every time. This idiom shows up in algorithms where you keep a "To-Do list". For example, here's a partial algorithm to find everyone who's connected directly or indirectly to some person:
ArrayList<Person> listToCheck = new ArrayList<>(List.of(KevinBacon));
for (int i = 0; i < listToCheck.size(); i++) {
List<Person> connections = allConnections(listToCheck.get(i));
for (Person p : connections) {
if ([p has not already been checked]) {
listToCheck.add(p); // increases listToCheck.size()!!!
}
}
}
Not really. Both cases compare the values at two memory addresses on every iteration, except you do an unnecessary extra assignment when you use a len variable. The performance difference is probably very small, and the first form is more readable, so I would use it. If you want to be even more readable and efficient, use a for-each loop if you are just going to do a linear iteration through your array. For-each loops work like this:
int [] myArray = {1,2,3};
for(int i:myArray){
System.out.print(i);
}
will print:
1
2
3
as i is set to each element of the array in turn. The for-each loop can be used with many kinds of objects and is a nice feature to learn.
Here is a guide explaining it.
https://www.udemy.com/blog/for-each-loop-java/
I'm developing an image processing app in Java (Swing), which has lots of calculations.
It crashes when big images are loaded:
java.lang.OutOfMemoryError: Java heap space, due to things like:
double matrizAdj[][] = new double[18658][18658];
So I've decided to experiment with a lightweight, as-fast-as-possible database to deal with this problem. I'm thinking of using a table as if it were a 2D array, looping through it and inserting the resulting values into another table.
I'm also thinking about using JNI, but I'm not familiar with C/C++ and I don't have the time needed to learn it.
Currently, my problem is not processing, only heap overload.
I would like to hear what is my best option to solve this.
EDIT :
A little explanation: first I get all white pixels from a binarized image into a list. Let's say I get 18k pixels. Then I perform a lot of operations with that list, like variance, standard deviation, covariance... and so on. At the end I have to multiply two 2D arrays ([18000][2] & [2][18000]), resulting in a double[18000][18000] that is causing me trouble. After that, other operations are done with this 2D array, resulting in more than one big 2D array.
I can't deal with requiring large amounts of RAM to use this app.
Well, for trivia's sake, the matrix you're showing consumes roughly 2.6 GB of RAM. So that's a benchmark of how much memory you need should you decide to pursue that tack.
If it's efficient for you, you could store the rows of the matrix into blobs within a database. In this case you'd have 18658 rows, each with a serialized double[18658] stored in it.
I wouldn't suggest that though.
A better tack would be to use the file directly, and look at NIO and byte buffers with mmap to map it into your program's address space.
Then you can use things like DoubleBuffers to access the data. This lets the VM page in as much of the original file as necessary, and it also keeps the data off the Java heap (rather, it's stored in process RAM associated with the JVM). The big benefit is that it keeps these monster data structures away from the garbage collector.
You'll still need physical RAM on the machine, of course, but it's not Java Heap RAM.
But this would likely be the most efficient way to access this data from your process.
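As a rough sketch of that idea - assuming the matrix has already been written to a raw file of doubles (the file name is a placeholder), and keeping in mind that a single mapping is limited to 2 GB, so the full 18658x18658 matrix would have to be mapped in several chunks:
import java.io.IOException;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedMatrix {
    public static void main(String[] args) throws IOException {
        int cols = 18658;
        long rowBytes = cols * 8L;
        long rowsPerChunk = Integer.MAX_VALUE / rowBytes;   // rows that fit in one < 2 GB mapping
        try (FileChannel ch = FileChannel.open(Paths.get("matrix.dat"),
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map only the first chunk of the file; further chunks would need their own mappings.
            MappedByteBuffer chunk = ch.map(FileChannel.MapMode.READ_WRITE,
                    0, rowsPerChunk * rowBytes);
            DoubleBuffer doubles = chunk.asDoubleBuffer();
            int row = 3, col = 7;                           // element (row, col) within this chunk
            double v = doubles.get(row * cols + col);
            doubles.put(row * cols + col, v + 1.0);
        }
    }
}
The OS decides which pages are actually resident, so only the parts of the matrix you touch compete for physical RAM, and none of it lives on the Java heap.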
Since you stated you "can't deal with requiring large amounts of RAM to use this app", your only option is to store the big array off RAM - disk being the most obvious choice (a relational database is just unnecessary overhead).
You can use a little utility class which provides a persistent 2-dimensional double array functionality. Here is my solution to that using RandomAccessFile. This solution also has the advantage that you can keep the array and reuse it when you restart the application!
Note: the presented solution is not thread-safe. Synchronization is needed if you want to access it from multiple threads concurrently.
Persistent 2-dimensional double array:
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileDoubleMatrix implements Closeable {
private final int rows;
private final int cols;
private final long rowSize;
private final RandomAccessFile raf;
public FileDoubleMatrix(File f, int rows, int cols) throws IOException {
if (rows < 0 || cols < 0)
throw new IllegalArgumentException(
"Rows and cols cannot be negative!");
this.rows = rows;
this.cols = cols;
rowSize = cols * 8L;
raf = new RandomAccessFile(f, "rw");
raf.setLength(rowSize * rows);
}
/**
* Absolute get method.
*/
public double get(int row, int col) throws IOException {
pos(row, col);
return get();
}
/**
* Absolute set method.
*/
public void set(int row, int col, double value) throws IOException {
pos(row, col);
set(value);
}
public void pos(int row, int col) throws IOException {
if (row < 0 || col < 0 || row >= rows || col >= cols)
throw new IllegalArgumentException("Invalid row or col!");
raf.seek(row * rowSize + col * 8);
}
/**
* Relative get method. Useful if you want to go through the whole array or
* through a contiguous part; use {@link #pos(int, int)} to position.
*/
public double get() throws IOException {
return raf.readDouble();
}
/**
* Relative set method. Useful if you want to go through the whole array or
* through a contiguous part; use {@link #pos(int, int)} to position.
*/
public void set(double value) throws IOException {
raf.writeDouble(value);
}
public int getRows() { return rows; }
public int getCols() { return cols; }
@Override
public void close() throws IOException {
raf.close();
}
}
The presented FileDoubleMatrix supports relative get() and set() methods, which are very useful if you process your whole array or a contiguous part of it (e.g. you iterate over it). Use the relative methods whenever you can for faster operation.
Example using the FileDoubleMatrix:
final int rows = 10;
final int cols = 10;
try (FileDoubleMatrix arr = new FileDoubleMatrix(
new File("array.dat"), rows, cols)) {
System.out.println("BEFORE:");
for (int row = 0; row < rows; row++) {
for (int col = 0; col < cols; col++) {
System.out.print(arr.get(row, col) + " ");
}
System.out.println();
}
// Process array; here we increment the values
for (int row = 0; row < rows; row++)
for (int col = 0; col < cols; col++)
arr.set(row, col, arr.get(row, col) + (row * cols + col));
System.out.println("\nAFTER:");
for (int row = 0; row < rows; row++) {
for (int col = 0; col < cols; col++)
System.out.print(arr.get(row, col) + " ");
System.out.println();
}
} catch (IOException e) {
e.printStackTrace();
}
More about the relative get and set methods:
The absolute get and set methods require the position (row and column) of the element to be returned or set. The relative get and set methods do not require the position; they return or set the current element. The current element is determined by the file pointer of the underlying file, and the position can be set with the pos() method.
Whenever a relative get() or set() is called, it implicitly moves the pointer to the next element, in row-major order (moving to the next element in the row, and when the end of a row is reached, moving to the first element of the next row, and so on).
For example here is how we can zero the whole array using the relative set method:
// Fill the whole array with zeros using relative set
// First position to the beginning:
arr.pos(0, 0);
// And execute a "set zero" operation
// as many times as many elements the array has:
for (int i = rows * cols; i > 0; i--)
arr.set(0);
The relative get and set methods automatically move the pointer to the next element.
It should be obvious that in my implementation the absolute get and set methods also move the pointer, which must not be forgotten when relative and absolute get/set methods are mixed.
Another example: let's set the sum of each row as the last element of the row, but also include the last element in the sum! For this we will use a mixture of relative and absolute get/set methods:
// Start with the first row:
arr.pos(0, 0);
for (int row = 0; row < rows; row++) {
double sum = 0;
for (int col = 0; col < cols; col++)
sum += arr.get(); // Relative get to calculate row sum
// Now set the sum to the end of row.
// For this we have to position back, so we use the absolute set.
arr.set(row, cols - 1, sum);
// The absolute set method also moves the pointer, and since
// it is the end of row, it moves to the first of the next row.
}
And that's all. Using the relative get/set methods we don't have to pass the "matrix indices" when processing contiguous parts of the array, and the implementation does not have to reposition the file pointer on every access, which is more than handy when processing millions of elements as in your example.
I would recommend the following things in order.
Investigate why your app is running out of memory. Are you creating arrays or other objects bigger than what you need? I hope you have checked that already, but I still think it's worth mentioning because it should not be ignored.
If you think there is nothing wrong in step 1, then check that you are not running with too-low memory settings or a 32-bit JVM.
If there is no issue with step 2: it's not always true that a lightweight database will give you the best performance. If you don't require searching the temporary data, you probably won't gain much from implementing a lightweight database. But if your application needs a lot of searching/querying of the temporary data, it may be a different case. If you don't need searching, a custom file format may be fast and efficient.
I hope it helps you solve the issue at hand :)
The simplest fix would be simply to give your program more memory. For example, if you specify -Xmx11g on your Java command line, the JVM will be able to allocate up to 11 GB of heap space - enough memory to hold several copies of your array, which is around 2.6 GB in size, in memory at a time.
If speed is really not an issue, you can do this even if you don't have enough physical memory, by allocating enough virtual memory and letting the OS swap the memory to disk.
I personally also think this is the best solution. Memory on this scale is cheaper than programmer time.
I would suggest a different approach.
Since most image processing operations are done by going over all of the pixels in some order exactly once, it's usually possible to do them on one piece of the image at a time. What I mean is that there's usually no random access to pixels of the image. If I'm not mistaken, all of the operations you mention in your question fit this description.
Therefore, I would suggest loading the image lazily, a piece at a time. Then, implement methods that retrieve the next chunk of pixels once the previous one is processed, and feeds these chunks to the algorithms you use.
In order to support that, I would suggest converting the images to an uncompressed format that you could easily create a lazy reader for.
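A minimal sketch of such a lazy reader, assuming the image has been converted to a raw 8-bit grayscale file (one byte per pixel, written row by row); the file name, width and strip size are placeholders, and readNBytes needs Java 9 or newer:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class LazyStripReader implements AutoCloseable {
    private final InputStream in;
    private final int width;

    public LazyStripReader(String rawFile, int width) throws IOException {
        this.in = new BufferedInputStream(new FileInputStream(rawFile));
        this.width = width;
    }

    /** Returns the next 'rows' image rows (fewer at the end of the file), or null when done. */
    public byte[] nextStrip(int rows) throws IOException {
        byte[] strip = new byte[width * rows];
        int read = in.readNBytes(strip, 0, strip.length);
        if (read == 0) return null;
        return read == strip.length ? strip : Arrays.copyOf(strip, read);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
Each strip can then be fed to the statistics pass (mean, variance, covariance, ...) without ever holding the whole image in memory at once.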
I'm not sure I would bother with a database for this; just open a temporary file, spill parts of your matrix in there as needed, and delete the file when you're done. Whatever solution you choose has to depend somewhat on your matrix library being able to use it. If you're using a third-party library then you're probably limited to whatever options (if any) they provide. However, if you've implemented your own matrix operations then I would definitely just go with a temporary file that I manage myself. That will be the fastest and lightest-weight option.
You can use a split-and-reduce technique.
Split your image into small fragments, or you can use a sliding-window technique:
http://forums.ni.com/t5/Machine-Vision/sliding-window-technique/td-p/2586621
I just ran into an issue while trying to write a bitmap-manipulating algorithm for an Android device.
I have a 1680x128 pixel Bitmap and need to apply a filter to it. But this very simple piece of code actually took almost 15-20 seconds to run on my Android device (Xperia Ray with a 1 GHz processor).
So I tried to find the bottleneck and reduced as many code lines as possible and ended up with the loop itself, which took almost the same time to run.
for (int j = 0; j < 128; j++) {
for (int i = 0; i < 1680; i++) {
Double test = Math.random();
}
}
Is it normal for such a device to take so much time on a simple for loop with no difficult operations?
I'm very new to programming on mobile devices, so please excuse me if this question is stupid.
UPDATE: Got it faster now with some simpler operations.
But back to my main problem:
public static void filterImage(Bitmap img, FilterStrategy filter) {
img.prepareToDraw();
int height = img.getHeight();
int width = img.getWidth();
RGB rgb;
for (int j = 0; j < height; j++) {
for (int i = 0; i < width; i++) {
rgb = new RGB(img.getPixel(i, j));
if (filter.isBlack(rgb)) {
img.setPixel(i, j, 0);
} else
img.setPixel(i, j, 0xffffffff);
}
}
return;
}
The code above is what I really need to run faster on the device. (nearly immediate)
Do you see any optimizing potential in it?
RGB is only a class that calculates the red, green and blue values, and the filter simply returns true if all three color parts are below 100 or any other specified value.
The loop around img.getPixel(i, j) or setPixel alone takes 20 or more seconds. Is it such an expensive operation?
It may be because too many objects of type Double are being created; this grows the heap and the device starts freezing.
A way around is
double[] arr = new double[1680];
for (int j = 0; j < 128; j++) {
for (int i = 0; i < 1680; i++) {
arr[i] = Math.random();
}
}
First of all, Stephen C makes a good argument: try to avoid creating a bunch of RGB objects.
Second of all, you can make a huge improvement by replacing your relatively expensive calls to getPixel with a single call to getPixels.
I did some quick testing and managed to cut the runtime to about 10% of the original. Try it out. This is the code I used:
int[] pixels = new int[height * width];
img.getPixels(pixels, 0, width, 0, 0, width, height);
for(int pixel:pixels) {
// check the pixel
}
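If you also need to write the result back, as in the filterImage method from the question, the same idea applies on the output side with a single setPixels call. Roughly (the filter and RGB class are the question's own, kept here as assumptions):
int[] pixels = new int[width * height];
img.getPixels(pixels, 0, width, 0, 0, width, height);
for (int k = 0; k < pixels.length; k++) {
    pixels[k] = filter.isBlack(new RGB(pixels[k])) ? 0 : 0xffffffff;
}
img.setPixels(pixels, 0, width, 0, 0, width, height);
That way there are exactly two Bitmap calls for the whole image instead of two per pixel.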
There is a disclaimer in the docs for Math.random() that might be affecting performance; try creating an instance yourself rather than using the static version. The relevant part is the note about synchronization:
Returns a pseudo-random double n, where n >= 0.0 && n < 1.0. This method reuses a single instance of Random. This method is thread-safe because access to the Random is synchronized, but this harms scalability. Applications may find a performance benefit from allocating a Random for each of their threads.
Try creating your own random as a static field of your class to avoid synchronized access:
private static Random random = new Random();
Then use it as follows:
double r = random.nextDouble();
Also consider using float (random.nextFloat()) if you do not need double precision.
RGB is only a class that calculates the red, green and blue values, and the filter simply returns true if all three color parts are below 100 or any other specified value.
One problem is that you are creating height * width instances of the RGB class, simply to test whether a single pixel is black. Replace that method with a static method call that takes the pixel to be tested as an argument.
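One possible shape of that static check (just a sketch; the threshold parameter and the assumption that the pixel is a packed ARGB int are mine, not the original code's):
static boolean isBlack(int pixel, int threshold) {
    int r = (pixel >> 16) & 0xFF;
    int g = (pixel >> 8) & 0xFF;
    int b = pixel & 0xFF;
    return r < threshold && g < threshold && b < threshold;
}
This allocates nothing per pixel, so the garbage collector stays out of the inner loop.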
More generally, if you don't know why some piece of code is slow ... profile it. In this case, the profiler would tell you that a significant amount of time is spent in the RGB constructor. And the memory profiler would tell you that large numbers of RGB objects are being created and garbage collected.
I've written the following which does what I want it to do, but I think there's a faster way than doing a conditional statement in the for loop.
int i,j,k,l;
int[] X = new int[] {111,222,333,444};
int XL = X.length;
for (i=0, j=1; i<XL; j=(j<XL-1)?j+1:0, i++) {
println("i:" +i +" j:" + j);
}
// returns:
// i:0 j:1
// i:1 j:2
// i:2 j:3
// i:3 j:0
Taking a different angle on the problem than just saying "do x to make the code 50% faster", how have you tested the code and how have you determined that it's too slow?
Java's JIT compiler these days is very, very good at what it does, making these sorts of micro-optimisations so you don't have to. If you start doing ridiculous amounts of low-level optimisation and obfuscating your code:
You may or may not achieve a small speed increase
You will make your code near unmaintainable and difficult to read
You may trick the JIT compiler into not making optimisations it would have done otherwise (since it understands common Java idioms much more than obfuscated code.)
If you still, definitely, and unavoidably need every last speed increase from your application then the best thing to do is just write these bits in assembler. If you're just trying to make things a fraction faster for the sake of it though, the best real world advice is almost always "don't bother".
Kerrek SB's comment is my preferred solution.
Here it is rewritten to be a bit more generic:
String[] X = new String[] {"AAA", "BBB", "CCC", "DDD"};
int XL = X.length;
for (int i=0; i<XL; i++) {
println("i:" +X[i] +" j:" + X[(i+1)%XL] );
}
will return:
i:AAA j:BBB
i:BBB j:CCC
i:CCC j:DDD
i:DDD j:AAA
First, your for loop is not all that confusing as it is, and you could add some comments to make it more understandable.
If you wanted to rewrite it with readability in mind, you could do it with a single index variable.
int i;
int[] X = new int[] {111,222,333,444};
for (i=0; i < X.length; i++) {
println("i:" +i +" j:" + (i < X.lenght -1) ? i + 1 : 0);
}