for-loop very slow on Android device - java

I just ran into an issue while trying to write a bitmap-manipulation algorithm for an Android device.
I have a 1680x128 pixel Bitmap and need to apply a filter on it. But this very simple piece of code actually took almost 15-20 seconds to run on my Android device (Xperia Ray with a 1 GHz processor).
So I tried to find the bottleneck and removed as many lines as possible, ending up with the loop itself, which took almost the same time to run.
for (int j = 0; j < 128; j++) {
    for (int i = 0; i < 1680; i++) {
        Double test = Math.random();
    }
}
Is it normal for such a device to take so much time on a simple for-loop with no heavy operations?
I'm very new to programming on mobile devices, so please excuse me if this question is stupid.
UPDATE: Got it faster now with some simpler operations.
But back to my main problem:
public static void filterImage(Bitmap img, FilterStrategy filter) {
    img.prepareToDraw();
    int height = img.getHeight();
    int width = img.getWidth();
    RGB rgb;
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            rgb = new RGB(img.getPixel(i, j));
            if (filter.isBlack(rgb)) {
                img.setPixel(i, j, 0);
            } else {
                img.setPixel(i, j, 0xffffffff);
            }
        }
    }
    return;
}
The code above is what I really need to run faster on the device (nearly immediately).
Do you see any potential for optimization in it?
RGB is only a class that extracts the red, green and blue values, and the filter simply returns true if all three color parts are below 100 or some other specified value.
Even just the loop around img.getPixel(i, j) or setPixel takes 20 or more seconds. Are these really such expensive operations?

It may be because too many objects of type Double are being created; this increases heap usage, and the device starts freezing.
A way around it is:
double[] arr = new double[1680];
for (int j = 0; j < 128; j++) {
    for (int i = 0; i < 1680; i++) {
        arr[i] = Math.random();
    }
}

First of all, Stephen C makes a good argument: try to avoid creating a bunch of RGB objects.
Second of all, you can make a huge improvement by replacing your relatively expensive calls to getPixel with a single call to getPixels.
I did some quick testing and managed to cut the runtime to about 10% of what it was. Try it out. This is the code I used:
int[] pixels = new int[height * width];
img.getPixels(pixels, 0, width, 0, 0, width, height);
for (int pixel : pixels) {
    // check the pixel
}
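Combining both points, a minimal sketch of the whole filter with one bulk read and one bulk write might look like this; the threshold test is inlined here in place of FilterStrategy and RGB, which is an assumption about what the filter actually checks:
import android.graphics.Bitmap;

// Sketch: bulk-read all pixels, test each packed ARGB int directly,
// then bulk-write the result back. No per-pixel JNI calls, no objects.
public static void filterImage(Bitmap img, int threshold) {
    int width = img.getWidth();
    int height = img.getHeight();
    int[] pixels = new int[width * height];
    img.getPixels(pixels, 0, width, 0, 0, width, height);
    for (int n = 0; n < pixels.length; n++) {
        int p = pixels[n];
        int r = (p >> 16) & 0xff;
        int g = (p >> 8) & 0xff;
        int b = p & 0xff;
        // 0 for "black" matches the original code; otherwise white.
        pixels[n] = (r < threshold && g < threshold && b < threshold) ? 0 : 0xffffffff;
    }
    img.setPixels(pixels, 0, width, 0, 0, width, height);
}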

There is a disclaimer in the docs for random that might be affecting performance; try creating an instance yourself rather than using the static version. The performance disclaimer is in the last two sentences of the quote:
Returns a pseudo-random double n, where n >= 0.0 && n < 1.0. This method reuses a single instance of Random. This method is thread-safe because access to the Random is synchronized, but this harms scalability. Applications may find a performance benefit from allocating a Random for each of their threads.
Try creating your own random as a static field of your class to avoid synchronized access:
private static Random random = new Random();
Then use it as follows:
double r = random.nextDouble();
Also consider using float (random.nextFloat()) if you do not need double precision.

RGB is only a class that calculates the red, green and blue value and the filter simply returns true if all three color parts are below 100 or any other specified value.
One problem is that you are creating height * width instances of the RGB class, simply to test whether a single pixel is black. Replace that method with a static method call that takes the pixel to be tested as an argument.
More generally, if you don't know why some piece of code is slow ... profile it. In this case, the profiler would tell you that a significant amount of time is spent in the RGB constructor. And the memory profiler would tell you that large numbers of RGB objects are being created and garbage collected.
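As a sketch of what such a static test could look like, assuming a packed ARGB int like the one Bitmap.getPixel returns (the threshold parameter generalizes the "below 100" check from the question):
// Sketch: test a packed ARGB pixel without allocating anything.
static boolean isBlack(int pixel, int threshold) {
    int r = (pixel >> 16) & 0xff;
    int g = (pixel >> 8) & 0xff;
    int b = pixel & 0xff;
    return r < threshold && g < threshold && b < threshold;
}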

Related

How can I process BufferedImage faster

I'm making a basic image editor to improve my image processing skills. I have 12 filters (for now).
Every filter has a clickable JLabel that shows its image.
I update the images of all of them when the filters are applied, with this function:
public static void buttonImagesUpdater() {
    for (int i = 0; i < effects.size(); i++) {
        effects.get(i).getButton().setImage(new ImageIcon(effects.get(i).process(image)));
    }
}
All filters have a process function like this:
public BufferedImage process(BufferedImage base) {
    BufferedImage product = new BufferedImage(base.getWidth(), base.getHeight(), base.getType());
    for (int indisY = 0; indisY < base.getHeight(); indisY++) {
        for (int indisX = 0; indisX < base.getWidth(); indisX++) {
            Color currentColor = new Color(base.getRGB(indisX, indisY));
            int greyTone = (int) (currentColor.getRed() * 0.315)
                         + (int) (currentColor.getGreen() * 0.215)
                         + (int) (currentColor.getBlue() * 0.111);
            product.setRGB(indisX, indisY, new Color(greyTone, greyTone, greyTone).getRGB());
        }
    }
    return product;
}
The program runs very slowly: when I click an effect's button on a 5000x3000 image, it finishes about 45 seconds later. How can I fix this performance problem?
You have got to remember that 3000 * 5000 is 15,000,000, so you're creating 15,000,000 Color objects and calling setRGB 15,000,000 times. If I were you, I would look into potentially using a ForkJoinPool for this; a sketch follows.
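One hedged way to set that up is a parallel IntStream over the rows, whose tasks run on the common ForkJoinPool; the per-pixel work is left as a placeholder, and each task is assumed to touch only its own row:
import java.awt.image.BufferedImage;
import java.util.stream.IntStream;

// Sketch: fan the rows out over the common ForkJoinPool via a parallel
// stream. Each task reads and writes only its own row's pixels.
static void processParallel(BufferedImage base, BufferedImage product) {
    int width = base.getWidth();
    IntStream.range(0, base.getHeight()).parallel().forEach(y -> {
        int[] row = base.getRGB(0, y, width, 1, null, 0, width);
        // ... transform row[x] for each x here ...
        product.setRGB(0, y, width, 1, row, 0, width);
    });
}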
I agree with @Jason - the problem is that you're creating (and destroying) 15 million Color objects.
However, I don't think that just using multiple threads is going to get you enough of a performance increase, because you are still going to be putting a lot of pressure on memory and the garbage collector, since you'll still be creating and destroying 15 million objects, you'll just be doing several in parallel.
I think that you can both stay away from creating Color objects entirely, and make fewer loops, by using the result of the BufferedImage class' getRGB() method directly, instead of creating a Color object. Further, you can use the overload of getRGB() that returns an array of ints, to get, say, a row of pixels (or more) at a time, to reduce the number of calls that you have to make inside the loop. You can similarly use the version of setRGB() that takes an array of pixel values.
The trick is to be able to convert the int color value to a gray value (or whatever else you need to do) without separating the R, G, and B values, or finding an efficient way to separate R, G, and B - more efficient than creating, using, and destroying a Color object.
For a lead on getting R, G, and B values from the int returned by getRGB(), note that the documentation for Color.getRGB() says,
"Returns the RGB value representing the color in the default sRGB ColorModel. (Bits 24-31 are alpha, 16-23 are red, 8-15 are green, 0-7 are blue)."
Once you have that working, you can think about parallelizing it.
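As a rough sketch of that row-at-a-time, no-Color-objects approach (the grey weights are copied from the question's code; whether they are the weights you want is up to you):
import java.awt.image.BufferedImage;

// Sketch: one getRGB/setRGB call per row, channels unpacked with shifts.
public BufferedImage process(BufferedImage base) {
    int width = base.getWidth();
    int height = base.getHeight();
    BufferedImage product = new BufferedImage(width, height, base.getType());
    int[] row = new int[width];
    for (int y = 0; y < height; y++) {
        base.getRGB(0, y, width, 1, row, 0, width);
        for (int x = 0; x < width; x++) {
            int p = row[x];
            int grey = (int) (((p >> 16) & 0xff) * 0.315)
                     + (int) (((p >> 8) & 0xff) * 0.215)
                     + (int) ((p & 0xff) * 0.111);
            row[x] = (0xff << 24) | (grey << 16) | (grey << 8) | grey;
        }
        product.setRGB(0, y, width, 1, row, 0, width);
    }
    return product;
}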
You might try this to see if things speed up a little.
This uses a DataBuffer from an image Raster.
It also uses a map to retain previously converted colors, which may help over time depending on the type of image.
And it works with doubles, since the data buffer supports various types.
I also multiplied your values by powers of 2 to shift them into the proper positions. The get* methods of Color return values between 0 and 255 inclusive. The RGB components occupy the lower 24 bits of an int (alpha is in the leftmost byte).
All I see is a dark image, but I tested this with other parameters and I know it works. The long pole in the tent seems to be reading and writing the images. I was using a 6637x3787 image and could read, alter, and write it in 12 seconds. For more advanced processing you may want to check out AffineTransformOp.
static Map<Color, Double> colorMap = new HashMap<>();

public BufferedImage process(BufferedImage base) {
    DataBuffer db = base.getRaster().getDataBuffer();
    for (int i = 0; i < db.getSize(); i++) {
        Color currentColor = new Color(db.getElem(i));
        double greyTone = colorMap.computeIfAbsent(currentColor, v ->
                currentColor.getRed() * .315 * 256 * 256
              + currentColor.getGreen() * .215 * 256
              + currentColor.getBlue() * .115);
        db.setElemDouble(i, greyTone);
    }
    return base;
}

JAVA - Out Of Memory - Voxel World Generation

Currently I have this code, and my game either uses way too much memory when generating (over a GB), or, if I set the sizes low, it gives an OutOfMemoryError.
WORLD_SIZE_X and WORLD_SIZE_Z are 256; WORLD_SIZE_Y is 128.
Does anyone know how I could improve this so it doesn't use so much RAM?
Thanks! :)
public void generate() {
    for (int xP = 0; xP < WORLD_SIZE_X; xP++) {
        for (int zP = 0; zP < WORLD_SIZE_Z; zP++) {
            for (int yP = 0; yP < WORLD_SIZE_Y; yP++) {
                try {
                    blocks[xP][yP][zP] = new BlockAir();
                    if (yP == 4) {
                        blocks[xP][yP][zP] = new BlockGrass();
                    }
                    if (yP < 4) {
                        blocks[xP][yP][zP] = new BlockDirt();
                    }
                    if (yP == 0) {
                        blocks[xP][yP][zP] = new BlockUnbreakable();
                    }
                } catch (Exception e) {}
            }
            // Tree generation :D
            Random rX = new Random();
            Random rZ = new Random();
            if (rX.nextInt(WORLD_SIZE_X) < WORLD_SIZE_X / 6 && rZ.nextInt(WORLD_SIZE_Z) < WORLD_SIZE_Z / 6) {
                for (int j = 0; j < 5; j++) {
                    blocks[xP][5 + j][zP] = new BlockLog();
                }
            }
        }
    }
    generated = true;
}
Delay object creation until you really need to access one of these voxels. You can write a method (I'm assuming Block is the common superclass of all the Block classes):
Block getBlockAt( int x, int y, int z )
using code similar to what you have in your threefold loop, plus a hash map Map<Integer,Block> for storing the random stuff, e.g. trees: from x, y and z compute an integer (x*128 + y)*256 + z and use this as the key.
Also, consider that for all "air", "log", and "dirt" blocks you may not need a separate object unless something must be changed at a certain block. Until then, share a single object of each kind, as in the sketch below.
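A minimal sketch of that combination, under the assumption that Block is the common superclass (the class and field names here are illustrative):
import java.util.HashMap;
import java.util.Map;

class World {
    // One shared, immutable instance per terrain kind (flyweights).
    private static final Block AIR = new BlockAir();
    private static final Block GRASS = new BlockGrass();
    private static final Block DIRT = new BlockDirt();
    private static final Block UNBREAKABLE = new BlockUnbreakable();

    // Only the random extras (e.g. tree logs) are stored explicitly.
    private final Map<Integer, Block> overrides = new HashMap<>();

    Block getBlockAt(int x, int y, int z) {
        Block override = overrides.get((x * 128 + y) * 256 + z);
        if (override != null) return override;
        // Terrain is a pure function of y, so compute it on demand.
        if (y == 0) return UNBREAKABLE;
        if (y < 4) return DIRT;
        if (y == 4) return GRASS;
        return AIR;
    }

    void putBlockAt(int x, int y, int z, Block b) {
        overrides.put((x * 128 + y) * 256 + z, b);
    }
}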
Since you've only given a small piece of code, I can offer two suggestions:
Compact the object size. It sounds obvious, but it is easy to do: if you have thousands of objects in memory and each one can be compacted to half its size, you save half the memory.
Assign values to the array only when you actually need them. Sometimes this won't work, if you really do need a fully assigned array, but assign as few elements as you can. If you can show me more code, I can help you more.
Are you sure the problem is in this method? Unless Block objects are really big, 256*256*128 ~= 8M objects should not require 1 GB...
That said, if the blocks do not hold state, it would be more memory-efficient to use an enum (or even a byte instead), as we would not need a separate object for each block:
enum Block {
    air, grass, dirt, log, unbreakable;
}
Block[][][] map = ...
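And a sketch of the byte-per-block variant, reusing the question's size constants; 256 * 128 * 256 bytes comes to about 8 MB in total:
// Sketch: one byte per voxel instead of one object per voxel.
static final byte AIR = 0, GRASS = 1, DIRT = 2, LOG = 3, UNBREAKABLE = 4;

static final byte[] blocks = new byte[WORLD_SIZE_X * WORLD_SIZE_Y * WORLD_SIZE_Z];

static int index(int x, int y, int z) {
    return (x * WORLD_SIZE_Y + y) * WORLD_SIZE_Z + z;
}

static void generate() {
    for (int x = 0; x < WORLD_SIZE_X; x++)
        for (int z = 0; z < WORLD_SIZE_Z; z++)
            for (int y = 0; y < WORLD_SIZE_Y; y++) {
                byte b = AIR;
                if (y == 4) b = GRASS;
                if (y < 4) b = DIRT;
                if (y == 0) b = UNBREAKABLE;
                blocks[index(x, y, z)] = b;
            }
}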

Correlation Coefficient Over a Large Binary Image Data-Set - Slow Performance

I am trying to build an OCR by calculating the correlation coefficient between characters extracted from an image and every character I have pre-stored in a database. My implementation is based on Java, and the pre-stored characters are loaded into an ArrayList when the application starts, i.e.:
ArrayList<byte[]> storedCharacters, extractedCharacters;
storedCharacters = load_all_characters_from_database();
extractedCharacters = extract_characters_from_image();
// Calculate the coefficient between every extracted character
// and every character in the database.
double maxCorr = -1;
for (byte[] extractedCharacter : extractedCharacters)
    for (byte[] storedCharacter : storedCharacters) {
        double corr = findCorrelation(extractedCharacter, storedCharacter);
        if (corr > maxCorr)
            maxCorr = corr;
    }
...
...
public double findCorrelation(byte[] extractedCharacter, byte[] storedCharacter) {
    double mag1 = 0, mag2 = 0, corr = 0;
    for (int i = 0; i < extractedCharacter.length; i++) {
        mag1 += extractedCharacter[i] * extractedCharacter[i];
        mag2 += storedCharacter[i] * storedCharacter[i];
        corr += extractedCharacter[i] * storedCharacter[i];
    } // for
    corr /= Math.sqrt(mag1 * mag2);
    return corr;
}
There are around 100-150 extractedCharacters per image, but the database has 15,600 stored binary characters. Checking the correlation coefficient between every extracted character and every stored character has an impact on performance, as it needs around 15-20 seconds to complete for every image on an Intel i5 CPU.
Is there a way to improve the speed of this program, or can you suggest another path to building this that brings similar results? (The results produced by comparing every character with such a large dataset are quite good.)
Thank you in advance
UPDATE 1
public static void run() {
    ArrayList<byte[]> storedCharacters, extractedCharacters;
    storedCharacters = load_all_characters_from_database();
    extractedCharacters = extract_characters_from_image();
    // Calculate the coefficient between every extracted character
    // and every character in the database.
    computeNorms(storedCharacters, extractedCharacters);
    double maxCorr = -1;
    for (int ext = 0; ext < extractedCharacters.size(); ext++)
        for (int str = 0; str < storedCharacters.size(); str++) {
            double corr = findCorrelation(extractedCharacters.get(ext), storedCharacters.get(str), str, ext);
            if (corr > maxCorr)
                maxCorr = corr;
        }
}

private static double[] storedNorms;
private static double[] extractedNorms;

// Correlation between two binary images
public static double findCorrelation(byte[] arr1, byte[] arr2, int strCharIndex, int extCharNo) {
    final int dotProduct = dotProduct(arr1, arr2);
    final double corr = dotProduct * storedNorms[strCharIndex] * extractedNorms[extCharNo];
    return corr;
}

public static void computeNorms(ArrayList<byte[]> storedCharacters, ArrayList<byte[]> extractedCharacters) {
    storedNorms = computeInvNorms(storedCharacters);
    extractedNorms = computeInvNorms(extractedCharacters);
}

private static double[] computeInvNorms(List<byte[]> a) {
    final double[] result = new double[a.size()];
    for (int i = 0; i < result.length; ++i)
        result[i] = 1 / Math.sqrt(dotProduct(a.get(i), a.get(i)));
    return result;
}

private static int dotProduct(byte[] arr1, byte[] arr2) {
    int dotProduct = 0;
    for (int i = 0; i < arr1.length; i++)
        dotProduct += arr1[i] * arr2[i];
    return dotProduct;
}
Nowadays, it's hard to find a CPU with a single core (even in mobiles). As the tasks are nicely separated, you can parallelize them with a few lines only. So I'd go for it, though the gain is limited.
In case you really mean cross-correlation, then a transform like the DFT or DCT could help. They surely do for big images, but with yours at 12x16, I'm not sure.
Maybe you mean just a dot product? And maybe you should tell us?
Note that you actually don't need to compute the correlation; most of the time, all you need is to find out whether it's bigger than a threshold:
corr = findCorrelation(extractedCharacter, storedCharacter)
..... more code to check if this is the best match ......
This may lead to some optimizations or not, depending on what the images look like.
Note also that a simple low-level optimization can give you nearly a factor of 4, as in this question of mine. Maybe you really should tell us what you're doing?
UPDATE 1
I guess that due to the computation of three products in the loop, there's enough instruction level parallelism, so a manual loop unrolling like in my above question is not necessary.
However, I see that those three products get computed some 100 * 15600 times, while only one of them depends on both extractedCharacter and storedCharacter. So you can compute
100 + 15600 + 100 * 15600
dot products instead of
3 * 100 * 15600
This way you may get a factor of three pretty easily.
Or not. After this step there's a single sum computed in the relevant loop, so the problem linked above applies. And so does its solution (unrolling manually).
Factor 5.2
While byte[] is nicely compact, the computation involves extending the bytes to ints, which costs some time, as my benchmark shows. Converting the byte[]s to int[]s before all the correlations get computed saves time. Even better is to make use of the fact that this conversion for storedCharacters can be done beforehand.
Manual loop unrolling twice helps, but unrolling more doesn't; a sketch of both ideas follows.
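Here is a hedged sketch of both steps; for the stored set, toInts can be applied once at load time (the names are illustrative):
// Sketch: widen each byte[] to an int[] once, then use a
// twice-unrolled dot product on the widened arrays.
static int[] toInts(byte[] a) {
    int[] result = new int[a.length];
    for (int i = 0; i < a.length; i++)
        result[i] = a[i];
    return result;
}

static int dotProduct(int[] a, int[] b) {
    int sum0 = 0, sum1 = 0;
    int i = 0;
    for (; i + 1 < a.length; i += 2) { // unrolled twice
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }
    if (i < a.length) // odd-length tail
        sum0 += a[i] * b[i];
    return sum0 + sum1;
}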

Avoid overmodulation/distortion when applying gain to PCM

I work on an audio recorder (AudioRec on Google Play).
I have an option to adjust the gain within a [-20dB, +20dB] range.
It works pretty well on my phone, but a user with a professional microphone attached to his device complained that when selecting -20dB, the output is distorted.
See below how I implemented the gain function:
for (int frameIndex = 0; frameIndex < numFrames; frameIndex++) {
    for (int c = 0; c < nChannels; c++) {
        if (rGain != 1) {
            // gain
            long accumulator = 0;
            for (int b = 0; b < bytesPerSample; b++) {
                accumulator += ((long) (source[byteIndex++] & 0xFF)) << (b * 8 + emptySpace);
            }
            double sample = ((double) accumulator / (double) Long.MAX_VALUE);
            sample *= rGain;
            int intValue = (int) ((double) sample * (double) Integer.MAX_VALUE);
            for (int i = 0; i < bytesPerSample; i++) {
                source[i + byteIndex2] = (byte) (intValue >>> ((i + 2) * 8) & 0xff);
            }
            byteIndex2 += bytesPerSample;
        }
    } // end for(channel)
} // end for(frameIndex)
Maybe I should apply some low/high filter after sample *= rGain;? Something like if(sample < MINIMUM_VALUE || sample > MAXIMUM_VALUE)? In this case, please let me know what these min/max values are...
Simply clipping values above a threshold will most certainly cause distortion. If you can picture a pure sine wave, as you lop the top off it will begin to resemble a square wave.
That said, if you have an input signal and you are multiplying it by a value smaller than one, there is no way that you are introducing any (significant) distortion. You need to look further back in the signal path. Perhaps clipping is occurring at the input.
I would try to simplify your logic. It appears you are using a 32-bit waveform, but the code is far more complex than needed. This will make it harder to work out how to avoid clipping.
IntBuffer ints = ByteBuffer.wrap(source).order(ByteOrder.nativeOrder()).asIntBuffer();
for (int i = 0; i < ints.limit(); i++) {
    int signal = ints.get(i);
    double gained = signal * gain;
    if (gained > Integer.MAX_VALUE) {
        // do something.
    } else if (gained < Integer.MIN_VALUE) {
        // do something
    }
    ints.put(i, (int) gained);
}
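A minimal fill-in for the two "do something" branches is a hard clamp, with the caveats about clipping discussed below:
// Sketch: clamp the gained sample to the int range before writing back.
if (gained > Integer.MAX_VALUE) {
    gained = Integer.MAX_VALUE;
} else if (gained < Integer.MIN_VALUE) {
    gained = Integer.MIN_VALUE;
}
ints.put(i, (int) gained);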
A simple approach is to let the values overflow, but as you say this can result in apparent distortion. Just clipping the data could lead to long periods of effective silence.
What you may have to do is an FFT, producing a signal which increases the strength of audible frequencies at the cost of lower frequencies when the gain is too high. I.e., it is the low frequencies that push the signal out of bounds, so you can't amplify them as much if you want to stay in bounds.

How do I know that my neural network is being trained correctly

I've written an Adaline neural network. Everything I have compiles, so I know there isn't a syntax problem with what I've written, but how do I know that I have the algorithm correct? When I try training the network, my computer just says the application is running and it just goes. After about 2 minutes I just stopped it.
Does training normally take this long (I have 10 parameters and 669 observations)?
Do I just need to let it run longer?
Here is my train method:
public void trainNetwork()
{
    int good = 0;
    // train until all patterns are good.
    while (good < trainingData.size())
    {
        for (int i = 0; i < trainingData.size(); i++)
        {
            this.setInputNodeValues(trainingData.get(i));
            adalineNode.run();
            if (nodeList.get(nodeList.size() - 1).getValue(Constants.NODE_VALUE) != adalineNode.getValue(Constants.NODE_VALUE))
            {
                adalineNode.learn();
            }
            else
            {
                good++;
            }
        }
    }
}
And here is my learn method
public void learn()
{
    Double nodeValue = value.get(Constants.NODE_VALUE);
    double nodeError = nodeValue * -2.0;
    error.put(Constants.NODE_ERROR, nodeError);
    BaseLink link;
    int count = inLinks.size();
    double delta;
    for (int i = 0; i < count; i++)
    {
        link = inLinks.get(i);
        Double learningRate = value.get(Constants.LEARNING_RATE);
        Double value = inLinks.get(i).getInValue(Constants.NODE_VALUE);
        delta = learningRate * value * nodeError;
        inLinks.get(i).updateWeight(delta);
    }
}
And here is my run method
public void run()
{
    double total = 0;
    // find out how many input links there are
    int count = inLinks.size();
    for (int i = 0; i < count - 1; i++)
    {
        // grab a specific link in sequence
        BaseLink specificInLink = inLinks.get(i);
        Double weightedValue = specificInLink.weightedInValue(Constants.NODE_VALUE);
        total += weightedValue;
    }
    this.setValue(Constants.NODE_VALUE, this.transferFunction(total));
}
These functions are part of a library that I'm writing. I have the entire thing on GitHub here. Now that everything is written, I just don't know how I should go about actually testing to make sure that the training method is written correctly.
I asked a similar question a few months ago.
Ten parameters with 669 observations is not a large data set, so there is probably an issue with your algorithm. There are two things you can do that will make debugging it much easier:
Print the sum of squared errors at the end of each iteration, as in the sketch after these suggestions. This will help you determine whether the algorithm is converging (at all), stuck at a local minimum, or just converging very slowly.
Test your code on a simple data set. Pick something easy, like a two-dimensional input that you know is linearly separable. Will your algorithm learn a simple AND function of two inputs? If so, will it learn an XOR function (2 inputs, 2 hidden nodes, 2 outputs)?
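A hedged sketch of the first suggestion, grafted onto the question's trainNetwork; MAX_EPOCHS and TOLERANCE are illustrative, and it calls learn() on every pattern (standard LMS) rather than only on mismatches:
public void trainNetwork()
{
    final int MAX_EPOCHS = 10000;   // illustrative cap on iterations
    final double TOLERANCE = 1e-4;  // illustrative acceptance threshold
    for (int epoch = 0; epoch < MAX_EPOCHS; epoch++)
    {
        double sse = 0;
        for (int i = 0; i < trainingData.size(); i++)
        {
            this.setInputNodeValues(trainingData.get(i));
            adalineNode.run();
            double target = nodeList.get(nodeList.size() - 1).getValue(Constants.NODE_VALUE);
            double output = adalineNode.getValue(Constants.NODE_VALUE);
            double error = target - output;
            sse += error * error;
            adalineNode.learn();
        }
        // Watch this number: it should fall steadily if training works.
        System.out.println("epoch " + epoch + ": SSE = " + sse);
        if (sse < TOLERANCE)
            break; // converged within tolerance
    }
}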
You should add debug/test messages to watch whether the weights are saturating and converging. It is likely that good < trainingData.size() never stops being true.
Based on Double nodeValue = value.get(Constants.NODE_VALUE); I assume NODE_VALUE is of type Double? If that's the case, then the check nodeList.get(nodeList.size()-1).getValue(Constants.NODE_VALUE) != adalineNode.getValue(Constants.NODE_VALUE) may never converge, since it is a strict comparison of doubles whose values depend on many other parameters, and your loop's termination relies on it. Typically, while training a neural network you stop when the output is within an acceptable error limit of the target, not when it passes a strict equality test like the one you are using.
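In code, that acceptance test could look like this; the epsilon value is illustrative:
// Sketch: compare the node values with a tolerance rather than !=.
static final double EPSILON = 1e-6; // illustrative error limit

if (Math.abs(nodeList.get(nodeList.size() - 1).getValue(Constants.NODE_VALUE)
        - adalineNode.getValue(Constants.NODE_VALUE)) > EPSILON)
{
    adalineNode.learn();
}
else
{
    good++;
}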
Hope this helps
