I'm looking for a faster way to extract histogram data from an image.
I'm currently using this piece of code that needs about 1200ms for a 6mpx JPEG image:
ImageReader imageReader = (ImageReader) iter.next();
imageReader.setInput(is);
BufferedImage image = imageReader.read(0);
int height = image.getHeight();
int width = image.getWidth();
Raster raster = image.getRaster();
int[][] bins = new int[3][256];
for (int i = 0; i < width; i++)
for (int j = 0; j < height; j++) {
bins[0][raster.getSample(i, j, 0)]++;
bins[1][raster.getSample(i, j, 1)]++;
bins[2][raster.getSample(i, j, 2)]++;
}
Do you have any suggestions?
You're making a lot of getSample method calls, and each of them in turn makes further calls, and so on.
I often work with pictures, and the typical trick to gain speed is to manipulate the underlying int[] directly (in this case your BufferedImage must be backed by an int[]).
The difference between accessing the int[] and calling, say, getRGB can be gigantic. By gigantic I mean as much as two orders of magnitude (try getRGB on OS X 10.4 versus int[x] and you'll see the perf gain).
Also, there's no need to call getSample three times. I'd simply retrieve one int corresponding to your ARGB pixel and then bit-shift to get the RGB bands (you're building one histogram per R, G and B component, right?).
You can gain access to the pixels array by doing something like this:
final int[] a = ((DataBufferInt) image.getRaster().getDataBuffer()).getData();
Also you can do what you want to do with a single loop, looping over all the pixels.
Instead of:
for ( int x = 0; x < width; x++ ) {
for ( int y = 0; y < height; y++ ) {
....
You can do:
for ( int p = 0; p < width*height; p++ ) {
Now, if you want to get into weirder optimizations that are less likely to prove effective, you could:
use loop unrolling (iterating over 6 million pixels is one of the rare cases where it may help)
invert the loop: for ( p = width*height - 1; p >= 0; p--)
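Putting the pieces of this answer together, a minimal sketch might look like the following. It assumes the image is, or has first been copied into, a TYPE_INT_RGB BufferedImage that is not a sub-image, because the DataBufferInt cast fails for byte-backed images (which is what ImageIO commonly returns for JPEGs). The class and method names are made up for illustration.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

// Hypothetical helper, not part of any library.
final class IntHistogram {

    // Copies any image into an int-backed one so the DataBufferInt cast is safe.
    static BufferedImage toIntRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage copy = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        g.drawImage(src, 0, 0, null);
        g.dispose();
        return copy;
    }

    // Returns bins[0] = red, bins[1] = green, bins[2] = blue.
    // Assumption: the image is not a sub-image, so the backing array
    // holds exactly width * height pixels.
    static int[][] histogram(BufferedImage image) {
        BufferedImage rgb = toIntRgb(image);
        int[] pixels = ((DataBufferInt) rgb.getRaster().getDataBuffer()).getData();
        int[][] bins = new int[3][256];
        // Single loop over all pixels, one read per pixel, bands via bit shifts.
        for (int p = 0; p < pixels.length; p++) {
            int argb = pixels[p];
            bins[0][(argb >> 16) & 0xFF]++;
            bins[1][(argb >> 8) & 0xFF]++;
            bins[2][argb & 0xFF]++;
        }
        return bins;
    }
}
```

Whether the extra copy pays off depends on the image type and the JVM, so it is worth timing both variants on real data.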
You can use the getSamples(int x, int y, int w, int h, int b, double[] dArray) method.
It's possible that this method has internal optimisations.
Also, you can try swapping the order of the width and height loops.
for (int i = 0; i < width; i++) {
    for (int j = 0; j < height; j++) {
    }
}
And
for (int i = 0; i < height; i++) {
    for (int j = 0; j < width; j++) {
    }
}
The performance difference between these two variants can be huge.
This is due to the CPU cache.
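As a rough illustration of both suggestions in this answer, the sketch below reads one row at a time with the int[] overload of Raster.getSamples and walks the image row by row. It assumes 8-bit samples (values 0 to 255); the class name is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

// Hypothetical helper, not part of any library.
final class RowHistogram {

    // Builds one 256-bin histogram per band (at most the first three bands).
    static int[][] histogram(BufferedImage image) {
        Raster raster = image.getRaster();
        int width = raster.getWidth();
        int height = raster.getHeight();
        int bands = Math.min(3, raster.getNumBands());
        int[][] bins = new int[bands][256];
        int[] row = new int[width]; // reused for every row and band
        for (int y = 0; y < height; y++) {          // outer loop over rows
            for (int b = 0; b < bands; b++) {
                // One bulk call per row and band instead of one call per sample.
                raster.getSamples(0, y, width, 1, b, row);
                for (int x = 0; x < width; x++) {
                    bins[b][row[x]]++;
                }
            }
        }
        return bins;
    }
}
```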
So I've got a school project and we have to work with a couple classes our prof gave us and make our own to make an image organizer.
The first part consists of making a set of static methods to edit the images themselves as 2D arrays of Color (ColorImage type).
The first problem is making a tool to downscale an image by a factor of f (an f-sided square of pixels in the original becomes 1 pixel in the output). Mine works, but I think it shouldn't, and I can't figure out why it works, so any help is appreciated. Specifically, I'm talking about the loop that averages the colours of each position in the buffer array (avgArr[][]) (line 16). I'm thinking: the values of reds, greens and blues would just be overwritten on each iteration, and avgColor would just get the value of the last pixel whose RGB values it read from avgArr.
static ColorImage downscaleImg(ColorImage img, int f) {
ColorImage dsi = new ColorImage(img.getWidth()/f, img.getHeight()/f);
Color[][] avgArr = new Color[f][f];
int reds = 0;
int greens = 0;
int blues = 0;
for(int i = 0; i < dsi.getWidth(); i++) {
for(int j = 0; j < dsi.getHeight(); j++) {
for(int x = i*f, xc = 0; x < i*f + (f-1); x++, xc++){
for(int y = j*f, yc = 0; y < j*f + (f-1); y++, yc++) {
avgArr[xc][yc] = img.getColor(x, y);
}
}
for(int k = 0; k < f - 1; k++){
for(int w = 0; w < f - 1; w++) {
reds += avgArr[k][w].getR();
greens += avgArr[k][w].getG();
blues += avgArr[k][w].getB();
}
}
int count = f*f;
Color avgColor = new Color(reds/count, greens/count, blues/count);
dsi.setColor(i, j, avgColor);
reds = 0;
greens = 0;
blues = 0;
}
}
return dsi;
}
Thanks,
EDIT: Turns out it was in fact just taking the colour of the last position of avgArr that it looked at. Any suggestions for correcting it are welcome.
I think you can solve your problem by summing the reds/greens/blues and then dividing them by the total pixels at the end to find the average:
int reds = 0;
int greens = 0;
int blues = 0;
...
for(int k = 0; k < f - 1; k++){
for(int w = 0; w < f - 1; w++) {
reds += avgArr[k][w].getR(); // <-- note the +=
greens += avgArr[k][w].getG();
blues += avgArr[k][w].getB();
}
}
int count = (f-1)*(f-1);
Color avgColor = new Color(reds/count, greens/count, blues/count);
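Note that loops bounded by f - 1 visit only an (f-1) by (f-1) block. If the intent is to average the whole f by f square, a sketch along the following lines may help. ColorImage and Color are course classes that are not shown here, so this version uses a plain int[][] of packed 0xRRGGBB values as a stand-in; the class and method names are hypothetical.

```java
// Hypothetical stand-in: img[y][x] holds a packed 0xRRGGBB colour.
final class BlockAverage {

    // Returns an image of size (height / f) x (width / f) where every output
    // pixel is the per-channel average of one full f x f block of the input.
    static int[][] downscale(int[][] img, int f) {
        int outH = img.length / f;
        int outW = img[0].length / f;
        int[][] out = new int[outH][outW];
        int count = f * f;
        for (int oy = 0; oy < outH; oy++) {
            for (int ox = 0; ox < outW; ox++) {
                // Sums start at zero for every output pixel.
                int reds = 0, greens = 0, blues = 0;
                for (int y = oy * f; y < (oy + 1) * f; y++) {
                    for (int x = ox * f; x < (ox + 1) * f; x++) {
                        int rgb = img[y][x];
                        reds += (rgb >> 16) & 0xFF;
                        greens += (rgb >> 8) & 0xFF;
                        blues += rgb & 0xFF;
                    }
                }
                out[oy][ox] = ((reds / count) << 16)
                        | ((greens / count) << 8)
                        | (blues / count);
            }
        }
        return out;
    }
}
```

Summing directly while reading the pixels also removes the need for the intermediate avgArr buffer.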
I have run into a small problem with my program: it seems unable to find the highest value in a histogram, which it needs to calculate the scale of the histogram, so now the entire histogram is way out of bounds.
I really hope someone can help me out, since it's driving me crazy.
import ij.*;
import ij.process.*;
import ij.gui.*;
import java.awt.*;
import ij.plugin.filter.*;
public class Oblig3_Oppg2 implements PlugInFilter {
public int setup(String arg, ImagePlus im) {;
return DOES_8G + NO_CHANGES;
}
public void run(ImageProcessor ip) {
final int W = 256;
final int H = 100;
final int H1 = 140;
int[] hist = ip.getHistogram();
int[] KH = new int[W]; //Cumulative Histogram Array
int maxVal;
//Calculates the highest pixel count in the Histogram
for (int i = 0; i < W; i++){
if (hist[i] > maxVal){
maxVal = i;
}
}
KH[0] = hist[0];
for(int i = 1; i < W; i++) {
KH[i] = KH[i-1] + hist[i];
}
ImageProcessor histIp = new ByteProcessor(W, H1);
histIp.setValue(255);
histIp.fill();
int max = KH[255];
for(int j = 0; j < W; j++){
KH[j] = (KH[j]*100)/max; //Scales the Cumulative Histogram
hist[j] = (hist[j]*100)/maxVal; // Scales the Histogram
}
for (int k = 0; k < W; k++){
histIp.setValue(0);
histIp.drawLine(k, H, k, H-KH[k]);
}
for (int k = 0; k < W; k++){
histIp.setValue(0);
histIp.drawLine(k, H, k, H-hist[k]);
}
for (int l = 0; l < W; l++){
histIp.setValue(l);
histIp.drawLine(l, 140, l, 102);
}
histIp.setValue(0);
histIp.drawLine(W, H, W, 0);
// Display the histogram image:
String hTitle = "Histogram";
ImagePlus histIm = new ImagePlus(hTitle, histIp);
histIm.show();
}
}
You should set maxVal to the actual value, not the current index in your loop:
for (int i = 0; i < W; i++){
if (hist[i] > maxVal){
maxVal = hist[i]; // <-- here
}
}
Furthermore, it might be better to limit the loop to hist.length instead of W. That would prevent errors in case you set W to some value different from the array length that ip.getHistogram() returns.
Since you don't provide a runnable example (i.e. the entire Java class; I assume you implement ij.plugin.filter.PlugInFilter), I didn't test the code, and it's not entirely clear to me what you want to achieve.
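For illustration, the two points above (store the count rather than the index, and loop over hist.length) could be combined as in the sketch below. The class and method names are made up. Note also that, as posted, the local variable maxVal is read before it is assigned, which Java rejects at compile time, so the sketch initialises it to 0.

```java
// Hypothetical helper, not part of ImageJ.
final class HistogramScale {

    // Returns the largest count in the histogram (0 for an all-zero histogram).
    static int max(int[] hist) {
        int maxVal = 0; // must be initialised before it is compared
        for (int i = 0; i < hist.length; i++) {
            if (hist[i] > maxVal) {
                maxVal = hist[i]; // the count, not the index i
            }
        }
        return maxVal;
    }

    // Scales every bin so that the largest one maps to the given height.
    static int[] scaleTo(int[] hist, int height) {
        int maxVal = max(hist);
        int[] scaled = new int[hist.length];
        if (maxVal == 0) {
            return scaled; // avoid dividing by zero for an empty image
        }
        for (int i = 0; i < hist.length; i++) {
            // long arithmetic guards against overflow for large counts
            scaled[i] = (int) ((long) hist[i] * height / maxVal);
        }
        return scaled;
    }
}
```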
I've been trying to convert some opencv C++ code in opencv java and I can't seem to get pixel division to work properly. I take a meanshiftsegmented mat that I convert to grayscale then to 32F.
I then compare the most downsampled then upsampled image (which is comprised of the gray meanshift mat) to the original gray meanshift mat.
I've already read "Using get() and put() to access pixel values in OpenCV for Java";
however, it and others like it do not work. The error message I am getting is "invalid mat type 5". However, even if I were able to see the saliency map, I am positive it is wrong. This is because when I pass in image 001.jpg in C++ I am supposed to see the original image plus a red square around the objects. In Java, I am only seeing the original image at the end.
NOTE :
AbstractImageProvider.deepCopy(AbstractImageProvider.matToBufferedImage(Saliency),disp);
Is an API call that works when I attempt to show the original mat, meanShift mat, and the gray meanShift mat. It fails at showing saliency.
c++
I only did a channel split because I was testing out other colorspaces, however in java I only want to use grayscale.
input = MeanShift.clone();
input.convertTo(input, CV_32F);
for(int i = 0; i < Pyramid_Size; i++){DS_Pyramid[i] = input.clone();}
for (int i = 0; i < Pyramid_Size; i++){
for (int k = 0; k <= i; k++){ // Why don't I just downsamplex3 a copy of MeanShift.clone then upsamplex3 that same one? ...
pyrDown (DS_Pyramid[i], DS_Pyramid[i], Size(DS_Pyramid[i].cols/2, DS_Pyramid[i].rows/2));
US_Pyramid[i] = DS_Pyramid[i].clone();
}
for (int j = 0; j <= i; j++){
pyrUp (US_Pyramid[i], US_Pyramid[i], Size(US_Pyramid[i].cols*2, US_Pyramid[i].rows*2));
}
}
top = US_Pyramid[Pyramid_Size - 1].clone(); // most down sampled layer, up sampled.
split(top, top_chs);
split(input.clone(), meanShift_chs); // split into channels result
split(input.clone(), sal_chs); // holder to use for compare
float top_min = 1.0;
float ms_min = 1.0;
for (int i = 0; i < top.rows; i++){ // find the smallest value in both top and meanShift
for (int k = 0; k < top.cols; k++){ // this is so you can sub out the 0 with the min value
for (int j = 0; j < top.channels(); j++){ // later on
float a = top_chs[j].at<float>(i,k);
float b = meanShift_chs[j].at<float>(i,k);
if (a < top_min && a >= 0) {top_min = a;} // make sure you don't have a top_min of zero... that'd be bad.
if (b < ms_min && b >= 0) { ms_min = b;}
}
}
}
for (int i = 0; i < top.rows; i++){
for (int k = 0; k < top.cols; k++){
for (int j = 0; j < top.channels(); j++){
float a,b,c;
a = top_chs[j].at<float>(i,k);
b = meanShift_chs[j].at<float>(i,k);
if (a <= 0){a = top_min;} // make sure you don't divide by zero
if (b <= 0){b = ms_min;} // make sure you really don't divide by zero
if (a <= b){c = 1.0 - a/b;}
else {c = 1.0 - b/a;}
// c = sqrt(c); // makes stuff more salient, but makes noise pop out too
sal_chs[j].at<float>(i,k) = c;
}
}
}
merge(sal_chs, Saliency); // combine into saliency map
imshow("saliency", Saliency);
java
MeanShift = inputImage.clone();
Imgproc.pyrMeanShiftFiltering(MeanShift, MeanShift, MeanShift_spatialRad, MeanShift_colorRad);
Imgproc.cvtColor(MeanShift, MeanShift, Imgproc.COLOR_BGR2GRAY);
MeanShift.convertTo(MeanShift, CvType.CV_32F); // 32F between 0 - 1. ************** IMPORTANT LINE
for (int i = 0; i < PyrSize; i++){
DS_Pyramid.add(new Mat());
UP_Pyramid.add(new Mat());
}
for (int i = 0; i < PyrSize; i++){
DS_Pyramid.set(i, MeanShift);
}
for (int i = 0; i < PyrSize; i++){
for(int k = 0; k <= i; k++){ // At 0 is downsampled once, second twice, third 3 times.
Imgproc.pyrDown(DS_Pyramid.get(i), DS_Pyramid.get(i)); // pyrDown by default img.width / 2 img height / 2
Mat a = new Mat(); // save the sampled down at i
a = DS_Pyramid.get(i);
UP_Pyramid.add(a);
}
for (int j = 0; j <= i; j++){
Imgproc.pyrUp(UP_Pyramid.get(i),UP_Pyramid.get(i));
}
}
top = UP_Pyramid.get(PyrSize-1);
bot = MeanShift.clone();
Saliency = MeanShift.clone();
//http://answers.opencv.org/question/5/how-to-get-and-modify-the-pixel-of-mat-in-java/
//http://www.tutorialspoint.com/java_dip/applying_weighted_average_filter.htm
for (int i = 0; i < top.rows(); i++){
for (int j = 0; j < top.cols(); j++){
int index = i * top.rows() + j;
float[] top_temp = top.get(i, j);
float[] bot_temp = bot.get(i,j);
float[] sal_temp = bot.get(i,j);
if (top_temp[0] <= bot_temp[k]){sal_temp[0] = 1.0f - (top_temp[0]/bot_temp[0]);}
else {sal_temp[0] = 1.0f - (bot_temp[0]/top_temp[0]);}
Saliency.put(i,j, sal_temp);
}
}
AbstractImageProvider.deepCopy(AbstractImageProvider.matToBufferedImage(Saliency),disp);
I found a simple, working solution after a lot of searching. This might help you get past the error "invalid mat type 5".
Code:
Mat img = Highgui.imread("Input.jpg"); //Reads image from the file system and puts into matrix
int rows = img.rows(); //Calculates number of rows
int cols = img.cols(); //Calculates number of columns
int ch = img.channels(); //Calculates number of channels (Grayscale: 1, RGB: 3, etc.)
for (int i=0; i<rows; i++)
{
for (int j=0; j<cols; j++)
{
double[] data = img.get(i, j); //Stores element in an array
for (int k = 0; k < ch; k++) //Runs for the available number of channels
{
data[k] = data[k] * 2; //Pixel modification done here
}
img.put(i, j, data); //Puts element back into matrix
}
}
Highgui.imwrite("Output.jpg", img); //Writes image back to the file system using values of the modified matrix
Note: An important point that has not been mentioned anywhere online is that the method put does not write pixels onto Input.jpg. It merely updates the values of the matrix img. Therefore, the above code does not alter anything in the input image. For producing a visible output, the matrix img needs to be written onto a file i.e., Output.jpg in this case. Also, using img.get(i, j) seems to be a better way of handling the matrix elements rather than using the accepted solution above as this helps in visualizing and working with the image matrix in a better way and does not require a large contiguous memory allocation.
I'm not good with java conventions and best practices.
I need a two-dimensional buffer for a big calculation involving dynamic programming, and I'm unsure whether I should use a one-dimensional array and map the two coordinates to a single index, or use an array of arrays with chained index access.
In C I would prefer the first way, but Java is not C and may have extra specifics that matter.
If you need top speed, definitely use a single array (one-dimensional) and map your indices as appropriate. As I see in the thread linked to in a comment below your question, people seem to disregard the ill effects of 2d-arrays on CPU cache lines and emphasize only the number of memory lookups.
There is one consideration to take into account: if your inner arrays are large enough (1K or more, say), then the speed advantage starts fading away. If the inner arrays are smallish (like 10-50), then the difference should be noticeable.
EDIT
As rightfully demanded, here's my jmh benchmark:
@OutputTimeUnit(TimeUnit.SECONDS)
public class ArrayAccess
{
static final int gapRowsize = 128, rowsize = 32, colsize = 10_000;
static final int[][] twod = new int[colsize][],
gap1 = new int[colsize][];
static final int[] oned = new int[colsize*rowsize];
static final Random r = new Random();
static {
for (int i = 0; i < colsize; i++) {
twod[i] = new int[rowsize];
gap1[i] = new int[gapRowsize];
}
for (int i = 0; i < rowsize*colsize; i++) oned[i] = r.nextInt();
for (int i = 0; i < colsize; i++)
for (int j = 0; j < rowsize; j++)
twod[i][j] = r.nextInt();
}
@GenerateMicroBenchmark
public int oned() {
int sum = 0;
for (int i = 0; i < rowsize*colsize; i++)
sum += oned[i];
return sum;
}
@GenerateMicroBenchmark
public int onedIndexed() {
int sum = 0;
for (int i = 0; i < colsize; i++)
for (int j = 0; j < rowsize; j++)
sum += oned[ind(i,j)];
return sum;
}
static int ind(int row, int col) { return rowsize*row+col; }
@GenerateMicroBenchmark
public int twod() {
int sum = 0;
for (int i = 0; i < colsize; i++)
for (int j = 0; j < rowsize; j++)
sum += twod[i][j];
return sum;
}
}
Note the gap array allocation: this simulates the worst-case scenario with fragmented heap.
I see more than 5-fold advantage at rowsize = 32 and a still quite noticeable (25%) advantage at 1024. I also find the advantage to highly depend on the gap size, with the shown 128 being the worst case for rowsize = 32 (both higher and lower values diminish the advantage), and 512 the worst case for rowsize = 1024.
rowsize = 32, gapRowsize = 128
Benchmark Mean Units
oned 8857.400 ops/sec
twod 1697.694 ops/sec
rowsize = 1024, gapRowsize = 512
Benchmark Mean Units
oned 147.192 ops/sec
twod 118.275 ops/sec
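To make the index mapping concrete for a dynamic-programming buffer like the one in the question, here is a minimal sketch that computes an edit distance in a single flat int[]. The choice of edit distance and all names are assumptions for illustration only; the question does not say which recurrence is being computed.

```java
// Hypothetical example: Levenshtein distance on a flat, row-major table.
final class FlatDp {

    static int editDistance(String a, String b) {
        int rows = a.length() + 1;
        int cols = b.length() + 1;
        // Cell (i, j) of the conceptual 2D table lives at dp[i * cols + j].
        int[] dp = new int[rows * cols];
        for (int j = 0; j < cols; j++) {
            dp[j] = j; // first row: insert j characters
        }
        for (int i = 1; i < rows; i++) {
            int row = i * cols;    // offset of the current row
            int prev = row - cols; // offset of the previous row
            dp[row] = i;           // first column: delete i characters
            for (int j = 1; j < cols; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(dp[prev + j] + 1, dp[row + j - 1] + 1);
                dp[row + j] = Math.min(best, dp[prev + j - 1] + cost);
            }
        }
        return dp[rows * cols - 1];
    }
}
```

Hoisting the row offsets out of the inner loop keeps the per-cell cost to additions only.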
As far as I understand (from answers such as this), java has no native multi-dimensional continuous memory arrays (unlike C#, for example).
While the jagged array syntax (arrays of arrays) might be good for most applications, I would still like to know what's the best practice if you do want the raw efficiency of a continuous-memory array (avoiding unneeded memory reads)
I could of course use a single-dimensional array that maps to a 2D one, but I prefer something more structured.
it's not difficult to do it manually:
int[] matrix = new int[ROWS * COLS];
int x_i_j = matrix[ i*COLS + j ];
now, is it really faster than java's multi dimension array?
int x_i_j = matrix[i][j];
for random access, maybe. for continuous access, probably not - matrix[i] is almost certainly in L1 cache, if not in register cache. in best scenario, matrix[i][j] requires one addition and one memory read; while matrix[i*COLS + j] may cost 2 additions, one multiply, one memory read. but who's counting?
It depends on your access pattern. Using this simple program to compare an int[][] with a 1D int[] array treated as a 2D matrix, a native Java 2D matrix is:
25% faster when the row is in the cache, i.e. accessing by rows
100% slower when the row is not in the cache, i.e. accessing by columns
i.e.:
// Case #1
for (y = 0; y < h; y++)
for (x = 0; x < w; x++)
// Access item[y][x]
// Case #2
for (x = 0; x < w; x++)
for (y = 0; y < h; y++)
// Access item[y][x]
The 1D matrix is calculated as:
public int get(int x, int y) {
return this.m[y * width + x];
}
Let's say you have a 2D array int[][] a = new int[height][width], so by convention you have the indices a[y][x]. Depending on how you represent the data and how you access it, the performance varies by a factor of 20:
The code:
public class ObjectArrayPerformance {
public int width;
public int height;
public int m[];
public ObjectArrayPerformance(int w, int h) {
this.width = w;
this.height = h;
this.m = new int[w * h];
}
public int get(int x, int y) {
return this.m[y * width + x];
}
public void set(int x, int y, int value) {
this.m[y * width + x] = value;
}
public static void main (String[] args) {
int w = 1000, h = 2000, passes = 400;
int matrix[][] = new int[h][];
for (int i = 0; i < h; ++i) {
matrix[i] = new int[w];
}
long start;
long duration;
System.out.println("duration[ms]\tmethod");
start = System.currentTimeMillis();
for (int z = 0; z < passes; z++) {
for (int y = 0; y < h; y++) {
for (int x = 0; x < w; x++) {
matrix[y][x] = matrix[y][x] + 1;
}
}
}
duration = System.currentTimeMillis() - start;
System.out.println(duration+"\t2D array, loop on x then y");
start = System.currentTimeMillis();
for (int z = 0; z < passes; z++) {
for (int x = 0; x < w; x++) {
for (int y = 0; y < h; y++) {
matrix[y][x] = matrix[y][x] + 1;
}
}
}
duration = System.currentTimeMillis() - start;
System.out.println(duration+"\t2D array, loop on y then x");
//
ObjectArrayPerformance mt = new ObjectArrayPerformance(w, h);
start = System.currentTimeMillis();
for (int z = 0; z < passes; z++) {
for (int x = 0; x < w; x++) {
for (int y = 0; y < h; y++) {
mt.set(x, y, mt.get(x, y) + 1);
}
}
}
duration = System.currentTimeMillis() - start;
System.out.println(duration+"\tmapped 1D array, access through getter/setter");
//
ObjectArrayPerformance mt2 = new ObjectArrayPerformance(w, h);
start = System.currentTimeMillis();
for (int z = 0; z < passes; z++) {
for (int x = 0; x < w; x++) {
for (int y = 0; y < h; y++) {
mt2.m[y * w + x] = mt2.m[y * w + x] + 1;
}
}
}
duration = System.currentTimeMillis() - start;
System.out.println(duration+"\tmapped 1D array, access through computed indexes, loop y then x");
ObjectArrayPerformance mt3 = new ObjectArrayPerformance(w, h);
start = System.currentTimeMillis();
for (int z = 0; z < passes; z++) {
for (int y = 0; y < h; y++) {
for (int x = 0; x < w; x++) {
mt3.m[y * w + x] = mt3.m[y * w + x] + 1;
}
}
}
duration = System.currentTimeMillis() - start;
System.out.println(duration+"\tmapped 1D array, access through computed indexes, loop x then y");
ObjectArrayPerformance mt4 = new ObjectArrayPerformance(w, h);
start = System.currentTimeMillis();
for (int z = 0; z < passes; z++) {
for (int y = 0; y < h; y++) {
int yIndex = y * w;
for (int x = 0; x < w; x++) {
mt4.m[yIndex + x] = mt4.m[yIndex + x] + 1;
}
}
}
duration = System.currentTimeMillis() - start;
System.out.println(duration+"\tmapped 1D array, access through computed indexes, loop x then y, yIndex optimized");
}
}
We can conclude that linear access performance depends more on the way you process the array (lines then columns or the reverse?: performance gain = x10, much due to CPU caches) than the structure of the array itself (1D vs 2D : performance gain = x2).
For random access, the performance differences should be much lower, because the CPU caches have less effect.
If you really want more structure with a continuous-memory array, wrap it in an object.
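public class My2dArray<T> {
    private final int sizeX;
    private final T[] items;

    @SuppressWarnings("unchecked") // Java cannot create a generic array directly
    public My2dArray(int x, int y) {
        sizeX = x;
        items = (T[]) new Object[x * y];
    }

    public T elementAt(int x, int y) {
        return items[x + y * sizeX];
    }

    public void set(int x, int y, T value) {
        items[x + y * sizeX] = value;
    }
}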
Not a perfect solution, and you probably already know it. So consider this confirmation of what you suspected to be true.
Java only provides certain constructs for organizing code, so eventually you'll have to reach for a class or interface. Since this also requires specific operations, you need a class.
The performance impacts include creating a JVM stack frame for each array access, and it would be ideal to avoid such a thing; however, a JVM stack frame is how the JVM implements its scoping. Code organization requires appropriate scoping, so there's not really a way around that performance hit that I can imagine (without violating the spirit of "everything is an object").
Sample implementation, written without a compiler. This is basically what C/C++ do behind the scenes when you access multidimensional arrays. You'll have to further define accessor behaviour when fewer than the actual number of dimensions are specified, and so on. Overhead will be minimal and could be optimized further, but that's micro-optimizing imho. Also, you never actually know what goes on under the hood after the JIT kicks in.
import java.util.Arrays;

class MultiDimensionalArray<T> {
    // disclaimer: written within SO editor, might contain errors
    private final T[] data;
    private final int[] dimensions; // holds each dimension's size

    @SuppressWarnings("unchecked") // generic arrays cannot be created directly
    public MultiDimensionalArray(int... dims) {
        dimensions = Arrays.copyOf(dims, dims.length);
        int size = 1;
        for (int dim : dims)
            size *= dim;
        data = (T[]) new Object[size];
    }

    public T access(int... indices) {
        int idx = 0;
        for (int i = 0; i < indices.length; i++)
            idx = idx * dimensions[i] + indices[i]; // row-major offset
        return data[idx];
    }
}
The most efficient method of implementing multi-dimensional arrays is by utilizing one-dimensional arrays as multi-dimensional arrays. See this answer about mapping a 2D array into a 1D array.
// 2D data structure as 1D array
int[] array = new int[width * height];
// access the array
array[x + y * width] = /*value*/;
I could of course use a single-dimensional array that maps to a 2D one, but I prefer something more structured.
If you want to access the array in a more structured manner, create a class for it:
public class ArrayInt {
private final int[] array;
private final int width, height;
public ArrayInt(int width, int height) {
array = new int[width * height];
this.width = width;
this.height = height;
}
public int getWidth() {
return width;
}
public int getHeight() {
return height;
}
public int get(int x, int y) {
return array[x + y * width];
}
public void set(int x, int y, int value) {
array[x + y * width] = value;
}
}
If you wanted arrays of objects, you could use generics and define class Array<T>, where T is the object stored in the array.
Performance-wise, this will, in most cases, be faster than a multi-dimensional array in Java. The reasons can be found in the answers to this question.
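The generic variant mentioned in this answer could be sketched roughly as follows. The name Array2D and the bounds check are assumptions added for illustration. Because Java cannot instantiate a generic array directly, the sketch stores elements in an Object[] and casts on read.

```java
// Hypothetical generic counterpart of ArrayInt.
final class Array2D<T> {
    private final Object[] items; // flat, row-major storage
    private final int width, height;

    Array2D(int width, int height) {
        this.items = new Object[width * height];
        this.width = width;
        this.height = height;
    }

    @SuppressWarnings("unchecked") // only T values are ever stored via set()
    T get(int x, int y) {
        return (T) items[index(x, y)];
    }

    void set(int x, int y, T value) {
        items[index(x, y)] = value;
    }

    // Maps (x, y) to the flat index and rejects out-of-range coordinates,
    // which a bare x + y * width would silently wrap into another row.
    private int index(int x, int y) {
        if (x < 0 || x >= width || y < 0 || y >= height) {
            throw new IndexOutOfBoundsException("(" + x + ", " + y + ")");
        }
        return x + y * width;
    }
}
```

Keep in mind that an array of objects stores references, so the elements themselves are not guaranteed to sit next to each other in memory the way the ints in ArrayInt do.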
If you cannot live without C constructs, there's always JNI.
Or you could develop your own Java-derived language (and VM and optimizing JIT compiler) that has a syntax for multidimensional continuous-memory arrays.