OpenCV performance on template matching - java

I'm trying to do template matching in Java. I used a straightforward algorithm to find the match. Here is the code:
minSAD = VALUE_MAX;

// loop through the search image
for ( int x = 0; x <= S_rows - T_rows; x++ ) {
    for ( int y = 0; y <= S_cols - T_cols; y++ ) {
        SAD = 0.0;

        // loop through the template image
        for ( int i = 0; i < T_rows; i++ ) {
            for ( int j = 0; j < T_cols; j++ ) {
                pixel p_SearchIMG   = S[x + i][y + j];
                pixel p_TemplateIMG = T[i][j];
                SAD += abs( p_SearchIMG.Grey - p_TemplateIMG.Grey );
            }
        }

        // save the best found position
        if ( minSAD > SAD ) {
            minSAD = SAD;
            position.bestRow = x;
            position.bestCol = y;
            position.bestSAD = SAD;
        }
    }
}
But this is a very slow approach. I tested it with a 768 × 1280 search image and a 384 × 640 template, and it takes ages.
Does OpenCV's ready-made function cvMatchTemplate() perform template matching much faster?

You will find OpenCV's cvMatchTemplate() is much, much quicker than the method you have implemented. What you have created is a statistical template matching method. It is the most common and the easiest to implement, but it is extremely slow on large images. Let's look at the basic maths: you have a 768 × 1280 image and you loop through each of its pixels minus the edge (your template's limits), so (768 - 384) × (1280 - 640) = 384 × 640 = 245,760 positions. At each position you loop through every pixel of your template (another 245,760 operations), so before you add any maths inside your loop you already have 245,760 × 245,760 = 60,397,977,600 operations. Over 60 billion operations just to loop through your image; it's more surprising how quickly machines can do this.
Remember, however, that it is really 245,760 × (245,760 × maths operations), so there are many more operations still.
Now, cvMatchTemplate() actually uses a Fourier-analysis template matching operation. This works by applying a Fast Fourier Transform (FFT) to the image, in which the signals that make up the pixel intensity changes are decomposed into their corresponding waveforms. The method is hard to explain well, but essentially the image is transformed into a signal representation of complex numbers. If you wish to understand more, please search Google for the fast Fourier transform. The same operation is then performed on the template, and the signals that form the template are used to filter out all other signals from your image.
Put simply, it suppresses all features within the image that do not match the features of your template. The image is then converted back using an inverse fast Fourier transform, producing an image where high values mean a match and low values mean the opposite. This image is often normalised, so 1 represents a match and values around 0 mean the object is nowhere near.
Be warned, though: if the object is not in the image and the result is normalised, false detections will occur, as the highest value calculated will still be treated as a match. I could go on for ages about how the method works and the benefits or problems that can occur, but...
The reason this method is so fast is: 1) OpenCV is highly optimised C++ code. 2) The FFT is easy for your processor to handle, as most have the ability to perform this operation in hardware. GPUs are designed to perform millions of FFT operations per second, as these calculations are just as important in high-performance gaming graphics and video encoding. 3) The number of operations required is far smaller.
In summary: the statistical template matching method is slow and takes ages, whereas OpenCV's FFT-based cvMatchTemplate() is quick and highly optimised.
Statistical template matching will not produce false detections when the object is absent, whereas the OpenCV FFT approach can, unless care is taken in its application.
I hope this gives you a basic understanding and answers your question.
Cheers
Chris
[EDIT]
To further answer your Questions:
Hi,
cvMatchTemplate() can work with CCOEFF_NORMED, CCORR_NORMED and SQDIFF_NORMED, as well as the non-normalised versions of these. The page below shows the kind of results you can expect and gives you code to play with:
http://dasl.mem.drexel.edu/~noahKuntz/openCVTut6.html#Step%202
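For reference, here is a minimal sketch of the same task through the OpenCV Java bindings, where Imgproc.matchTemplate() is the modern equivalent of cvMatchTemplate(); the image file names are placeholders:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Point;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class TemplateMatchDemo {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME); // load the native OpenCV library

        // "search.png" and "template.png" are placeholder file names
        Mat image = Imgcodecs.imread("search.png", Imgcodecs.IMREAD_GRAYSCALE);
        Mat templ = Imgcodecs.imread("template.png", Imgcodecs.IMREAD_GRAYSCALE);

        // result has one score per candidate position: (W - w + 1) x (H - h + 1)
        Mat result = new Mat();
        Imgproc.matchTemplate(image, templ, result, Imgproc.TM_CCOEFF_NORMED);

        // for CCOEFF_NORMED the best match is the global maximum
        Core.MinMaxLocResult mmr = Core.minMaxLoc(result);
        Point bestMatch = mmr.maxLoc;
        System.out.println("Best match at " + bestMatch + ", score " + mmr.maxVal);
    }
}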
The three methods are well cited, and many papers are available through Google Scholar; I have provided a few below. Each one simply uses a different equation to find the correlation between the FFT signals that form the template and the FFT signals present within the image. The correlation coefficient tends to yield better results in my experience and is easier to find references for. The sum of squared differences is another method that can be used, with comparable results. I hope some of these help:
Fast normalized cross correlation for defect detection
Du-Ming Tsai; Chien-Ta Lin
Pattern Recognition Letters, Volume 24, Issue 15, November 2003, Pages 2625-2631

Template Matching using Fast Normalised Cross Correlation
Kai Briechle; Uwe D. Hanebeck

Relative performance of two-dimensional speckle-tracking techniques: normalized correlation, non-normalized correlation and sum-absolute-difference
Friemel, B.H.; Bohs, L.N.; Trahey, G.E.
Proceedings of the 1995 IEEE Ultrasonics Symposium

A Class of Algorithms for Fast Digital Image Registration
Barnea, Daniel I.; Silverman, Harvey F.
IEEE Transactions on Computers, Feb. 1972
It is often favoured to use the normalised version of these methods, as anything that equals 1 is a match; however, if no object is present you can get false positives. The method works fast simply due to the way it is implemented in the computer language. The operations involved are ideal for the processor architecture, which means it can complete each operation in a few clock cycles rather than shifting memory and information around over several clock cycles. Processors have been solving FFT problems for many years now and, as I said, there is built-in hardware to do so. Hardware-based is always faster than software-based, and the statistical method of template matching is essentially software-based. Good reading on the hardware can be found here:
Digital signal processor
Although it is a wiki page, the references are worth a look; effectively this is the hardware that performs FFT calculations.
A new Approach to Pipeline FFT Processor
Shousheng He; Mats Torkelson;
A favourite of mine, as it shows what's happening inside the processor.
An Efficient Locally Pipelined FFT Processor
Liang Yang; Kewei Zhang; Hongxia Liu; Jin Huang; Shitan Huang;
These papers really show how complex the FFT is when implemented; however, the pipelining of the process is what allows the operation to be performed in a few clock cycles. This is the reason real-time vision systems often use FPGAs (processors whose architecture you can design for a specific task): they can be made extremely parallel, and pipelining is easier to implement.
Although I must mention that for the FFT of an image you are actually using FFT2, which is the FFT of the horizontal plane followed by the FFT of the vertical plane, just so there is no confusion when you find references to it. I cannot claim expert knowledge of how the equations and the FFT are implemented; I have tried to find good guides, yet finding one is very hard, so much so that I haven't yet found one (not one I can understand, at least). One day I may understand them, but for now I have a good understanding of how they work and the kind of results that can be expected.
Other than this I can't really help you more. If you want to implement your own version or understand how it works, it's time to hit the library; but I warn you, the OpenCV code is so well optimised that you will struggle to improve on its performance. However, who knows, you may figure out a way to get better results. All the best, and good luck.
Chris

Related

Step detection in real-time 1D data

For a small project we're trying to implement an autopilot for a slot car. A gyro sensor is attached to the car and delivers the Z-value (i.e. the amount of centrifugal force acting on the car/sensor) 20 times per second. One crucial part is detecting whether the car is in a curve or on a straight section, and exactly when it entered and left that section. Only then can we make reliable predictions about what will happen next.
As for now, we're working with a sliding window to smooth the data and then have hardcoded limits (-400 for a left curve and +400 for a right curve) to detect what kind of sector (left, right, straight) we're in.
Obviously this takes too long: because of the smoothing and the hardcoded limits, it takes a couple of messages until the program detects a direction change.
Here's an example of two rounds on a simple track, starting at the checkered area:
A perfect algorithm would detect the sectors S R S R S L S R S R S R S for one round, with a delay of only a couple of data points.
We thought about using the first derivative of the gyro values, but in the sample graph right after the first left curve, the following right curve (between 22:36:40 and 22:36:42) shows signs of swerving. Here the first derivative would be close to 0 and indicate a straight part...
Also, we'd need to set a hardcoded threshold again, and with the noise in the data, a small bump in the track could raise the noise level enough that the derivative would exceed the threshold.
Now we're not sure about what would be the easiest/fastest/most reliable way to handle this sort of detection. Would using a derivative be a good idea? Is there a better way?
Any input would be greatly appreciated :)
The existing software is written in Java.
In such problems, you have to trade robustness for immediacy. If you don't know what happens in the future, you can only make assumptions, and these assumptions may or may not hold.
From the looks of your data, there shouldn't be any smoothing necessary. If you define a reasonable threshold, the curves should be recognized quite reliably. If, however, this is not the case, here are some things you could try:
You already mentioned smoothing. The crucial point is how you smooth. An asymmetric smoothing kernel is probably desirable (a half-triangle filter can be updated in constant time; see the sketch below). You can directly trade robustness against immediacy by modifying the kernel width.
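In case it is useful, here is a rough sketch of such a half-triangle (ramp-weighted) filter in Java; the class name and window length are mine, and the constant-time update relies on the identity W' = W - S + n*x for the ramp-weighted sum W and the plain window sum S:

import java.util.ArrayDeque;
import java.util.Deque;

public class HalfTriangleFilter {
    private final int n;                         // window length
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0;                      // plain sum S of the window
    private double weightedSum = 0;              // ramp-weighted sum W

    public HalfTriangleFilter(int n) { this.n = n; }

    /** Push one sample, get the smoothed value back. O(1) per sample. */
    public double update(double x) {
        if (window.size() < n) {                 // warm-up: window still growing
            window.addLast(x);
            sum += x;
            weightedSum += window.size() * x;    // newest sample gets weight == size
        } else {                                 // full window: shift every weight down by 1
            weightedSum = weightedSum - sum + n * x;   // oldest weight decays to 0
            sum = sum - window.removeFirst() + x;
            window.addLast(x);
        }
        int k = window.size();
        return weightedSum / (k * (k + 1) / 2.0);      // normalise by 1 + 2 + ... + k
    }
}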
A simple alternative to filtering is counting. If your data is above the curve threshold, don't call it a curve just yet; count how many data points in a row are above the threshold. If there are more than n data points above the threshold, you're most likely in a curve (see the sketch below).
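A minimal sketch of that counting idea (the ±400 threshold comes from the question; MIN_RUN is an illustrative value):

public class SectorDetector {
    private static final double THRESHOLD = 400.0; // from the question's hardcoded limits
    private static final int MIN_RUN = 4;          // consecutive samples required

    public enum Sector { STRAIGHT, LEFT, RIGHT }

    private Sector current = Sector.STRAIGHT;
    private Sector candidate = Sector.STRAIGHT;
    private int run = 0;

    /** Feed one gyro Z-value; returns the sector we believe we are in. */
    public Sector update(double z) {
        Sector observed = z > THRESHOLD ? Sector.RIGHT
                        : z < -THRESHOLD ? Sector.LEFT
                        : Sector.STRAIGHT;
        if (observed == candidate) {
            run++;
        } else {
            candidate = observed;
            run = 1;
        }
        if (run >= MIN_RUN) {
            current = candidate; // enough evidence: commit to the new sector
        }
        return current;
    }
}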
Using derivatives is potentially problematic. The main reason against derivatives is that a curve is not defined by any derivative at all (at least no derivative of the force). The second problem is that you can only estimate the derivatives numerically, which is quite unstable with lots of noise. So you would have to smooth your data (or find a numerical scheme for your noise model), which again requires some latency.

Optimising Conway's Game of Life

I'm busy coding Conway's Game of Life and I'm trying to optimise it using a data structure that records which cells should be checked in each life cycle.
I'm using an ArrayList as a dynamic data structure to keep a record of all living cells and their neighbours. Is there a better data structure, or a way of keeping a shorter list, that would increase the game's speed?
I ask this because often many cells are checked but not changed so I feel like my implementation could be improved.
I believe that the Hashlife algorithm could help you.
It gives the idea of using a quadtree (tree data structure in which each internal node has exactly four children) to keep the data, and then it uses hash tables to store the nodes of the quadtree.
For further reading, this post by Eric Burnett gives great insight into how Hashlife works, its performance and implementation (although in Python). It's worth a read.
I built a Life engine that operated on 256 × 512 bit grids directly mapped to screen pixels back in the 1970s, using a 2 MHz 6800 8-bit computer. I worked directly on the display pixels (they were one-bit on/off white/black) because I wanted to see the results and didn't see the point in copying the Life image to the display.
Its fundamental trick was to treat the problem as one of evaluating a Boolean logic formula for "this cell is on" based on rules of Life, rather than counting live neighbors as is usual. This formula is pretty easy to figure out, so left as a homework exercise. What made it fast was that the Boolean formula was computed on a per-bit basis, doing 8 bits at a time. If you sweep down the screen and across rows, you can in essence evaluate N bits at once (8 on the 6800, 64 on modern PCs) with very low overhead. If you go nuts, you can likely use the SIMD vector extensions and do 256 bits or more at "once". Over the top would be doing this with a GPU.
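To make the bit-parallel trick concrete, here is a small Java sketch under the simplifying assumption that the grid is exactly 64 cells wide (one long per row), so horizontal shifts need no cross-word carries. The neighbour count is built from bitwise adders and the rule becomes a Boolean formula evaluated for 64 cells at once:

public final class BitLife {
    /** Compute one generation. Bit i of rows[y] == cell (i, y) is alive. */
    public static long[] step(long[] rows) {
        int h = rows.length;
        long[] next = new long[h];
        for (int y = 0; y < h; y++) {
            long up   = (y > 0)     ? rows[y - 1] : 0L;
            long mid  = rows[y];
            long down = (y < h - 1) ? rows[y + 1] : 0L;

            // 2-bit horizontal sums (left + centre + right) per row
            long u0 = (up >>> 1) ^ up ^ (up << 1);
            long u1 = ((up >>> 1) & up) | ((up << 1) & ((up >>> 1) ^ up));
            long d0 = (down >>> 1) ^ down ^ (down << 1);
            long d1 = ((down >>> 1) & down) | ((down << 1) & ((down >>> 1) ^ down));
            long m0 = (mid >>> 1) ^ (mid << 1);   // centre cell excluded:
            long m1 = (mid >>> 1) & (mid << 1);   // it is not its own neighbour

            // add the three 2-bit sums: s0/s1 are the low count bits,
            // cx/cy/cz are overflow bits at weight 4 and above
            long s0  = u0 ^ m0 ^ d0;
            long c0  = (u0 & m0) | (d0 & (u0 ^ m0));
            long x   = u1 ^ m1;
            long cx  = u1 & m1;
            long s1a = x ^ d1;
            long cy  = x & d1;
            long s1  = s1a ^ c0;
            long cz  = s1a & c0;

            long noHigh = ~(cx | cy | cz);        // neighbour count < 4
            // alive next iff count == 3, or count == 2 and currently alive
            next[y] = noHigh & s1 & (s0 | mid);
        }
        return next;
    }
}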
The 6800 version would process a complete screen in about .5 second; you could watch the update ripple down the screen from top to bottom (60 Hz refresh). On a modern CPU with 1000x the clock rate (1 GHz is pretty easy to get) and 64 bits at a time, it should be able to produce thousands of frames per second. So fast you can't watch it run :-{
A useful observation is that much of the Life world is dead (blank) and processing that part mostly produces more dead cells. This suggests using a sparse representation. Another poster suggested quadtrees, which I think is a very good suggestion. Your quadtree regions don't have to be square, either.
Combining the two ideas, quadtrees for non-blank regions with bit-level processing for blocks of bits designated by the quadtrees is likely to give an astonishingly fast Life algorithm.

How to implement an Equalizer

I know there are a lot of questions about equalizers on SO, but I didn't find what I was looking for. What I want to do is an equalizer for modifying audio samples, with an interface like:
equalizer.eqAudio(audiosamples, band, gain)
I'm not sure if that is the exact interface I want, because I know little about DSP in terms of implementing it (I have used filters, limiters and compressors, but never made them).
So, Googling about this, I read that I must apply an FFT to the samples so I get the data per frequency range instead of amplitude over time, process it the way I want, and then apply the inverse FFT to get audio samples back. I looked for an implementation of this FFT and found JTransforms for Java. This library also has an implementation of an FFT-related algorithm called the Discrete Cosine Transform (DCT).
My questions are :
Well, am I on the right track?
Since the FFT gives me data about frequency, I should pass a chunk of samples to the FFT algorithm. How big must this chunk be?
Is there a good book about DSP programming that explains equalizers ?
Thanks!
FFT wouldn't be my first choice for audio equalisation. I would default to building an EQ with IIR or FIR filters. FFT might be useful for special circumstances.
A commonly recommended reference is the Cookbook Formulae for Audio EQ Biquad Filter Coefficients.
A Java tutorial for programming biquad filters: http://arachnoid.com/BiQuadDesigner/index.html
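To make the cookbook approach concrete, here is a minimal sketch of a peaking-EQ biquad built from those formulas (class and parameter names are mine; use one instance per band and per channel):

public class PeakingBiquad {
    private final double b0, b1, b2, a1, a2;   // normalised coefficients
    private double x1, x2, y1, y2;             // filter state

    public PeakingBiquad(double sampleRate, double centreHz, double q, double gainDb) {
        double A     = Math.pow(10, gainDb / 40);
        double w0    = 2 * Math.PI * centreHz / sampleRate;
        double alpha = Math.sin(w0) / (2 * q);
        double a0    = 1 + alpha / A;
        b0 = (1 + alpha * A) / a0;
        b1 = (-2 * Math.cos(w0)) / a0;
        b2 = (1 - alpha * A) / a0;
        a1 = (-2 * Math.cos(w0)) / a0;
        a2 = (1 - alpha / A) / a0;
    }

    /** Process one sample (Direct Form I). */
    public double process(double x) {
        double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
}

For example, new PeakingBiquad(44100, 1000, 1.0, 6.0) boosts 1 kHz by 6 dB with Q = 1; chaining one biquad per band gives an eqAudio(audiosamples, band, gain) style interface like the one in the question.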
Is there a good book about DSP programming that explains equalizers ?
Understanding Digital Signal Processing is a good introduction to DSP. There are chapters on FIR and IIR filters.
Introduction to Digital Filters with Audio Applications by Julius O. Smith III.
Graphic Equalizer Design Using Higher-Order Recursive Filters by Martin Holters and Udo Zolzer is a short paper detailing one EQ filter design approach.
There are many different ways to obtain an equalizer, and as Shannon explains, the IIR/FIR filter way is one of them. However, if your goal is to quickly get an equalizer up and running, going the FFT way might be easier for you, as there exist a wealth of reference implementations.
As to your question of FFT size: it depends on what frequency resolution you want your equalizer to have. If you choose a size of 16, you will get 9 unique channels in the frequency domain, equally spaced from 0 to fs/2; the 1st is centred around 0 Hz and the 9th around fs/2 Hz (those two end bins are purely real, the 7 in between complex). And note, some implementations return all 16 channels, where the high part is a mirrored and complex-conjugated version of the low part.
As to the implementation of the equalizer functionality: multiply each channel by the wanted gain. If the spectrum has the mirrored part, mirror the gains as well; if this is not done, the result of the following IFFT will not be a real-valued signal. After the multiplication, apply the IFFT.
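As an illustration, here is a rough sketch of that per-bin gain step using JTransforms (which the question mentions), assuming its packed realForward layout for even-length blocks; check the Javadoc of your JTransforms version, as the packing and package name vary:

import org.jtransforms.fft.DoubleFFT_1D; // package name varies by JTransforms version

// Applies per-bin gains to one block of samples in the frequency domain.
// gains[k] scales bin k, k = 0 .. n/2. Conjugate symmetry is preserved
// because the re and im parts of each bin are scaled by the same factor.
// In practice, process overlapping windowed blocks; this shows one block.
public class FftEq {
    public static void eqBlock(double[] block, double[] gains) {
        int n = block.length;                 // even block size, e.g. 4096
        DoubleFFT_1D fft = new DoubleFFT_1D(n);
        fft.realForward(block);               // in-place, packed spectrum

        block[0] *= gains[0];                 // DC bin (purely real)
        block[1] *= gains[n / 2];             // Nyquist bin (packed at index 1)
        for (int k = 1; k < n / 2; k++) {
            block[2 * k]     *= gains[k];     // Re of bin k
            block[2 * k + 1] *= gains[k];     // Im of bin k
        }
        fft.realInverse(block, true);         // back to the time domain (scaled)
    }
}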
As to the difference between an FFT-based and a filter-based equalizer: remember that an FFT is simply a fast way of calculating a bank of FIR filters with sinusoidal impulse responses, critically sampled (downsampled by the filter length) and with evenly spaced centre frequencies.
Regards

frequency / pitch detection for dummies

There are many questions on this site dealing with the concept of pitch detection, but they all deal with this magical FFT with which I am not familiar. I am trying to build an Android application that needs to implement pitch detection, and I have absolutely no understanding of the algorithms that are used to do this.
It can't be that hard can it? There are around 8 billion guitar tuner apps on the android market after all.
Can someone help?
The FFT is not really the best way to implement pitch detection or pitch tracking. One issue is that the loudest frequency is not always the fundamental frequency. Another is that the FFT, by itself, requires a pretty large amount of data and processing to obtain the resolution you need to tune an instrument, so it can appear slow to respond (i.e. high latency). Yet another issue is that the result of an FFT is not exactly intuitive to work with: you get an array of complex numbers and you have to know how to interpret them.
If you really want to use an FFT, here is one approach:
Low-pass your signal. This will help prevent noise and higher harmonics from creating spurious results. Conceivably, you could skip this step and instead weight your results towards the lower bins of the FFT. For some instruments with strong fundamental frequencies, this might not be necessary.
Window your signal. Windows should be at least 4096 samples in size; larger is better, up to a point, because it gives you better frequency resolution. If you go too large, it will end up increasing your computation time and latency. The Hann function is a good choice for your window: http://en.wikipedia.org/wiki/Hann_function
FFT the windowed signal as often as you can. Even overlapping windows are good.
The results of the FFT are complex numbers. Find the magnitude of each complex number using sqrt( real^2 + imag^2 ). The index in the FFT array with the largest magnitude is the index with your peak frequency.
You may want to average multiple FFTs for more consistent results.
How do you calculate the frequency from the index? Let's say you've got a window of size N. After the FFT you will have N complex numbers. If your peak is the nth one, and your sample rate is 44100, then your peak frequency will be near 44100 × n/N. Why near? You have an error of up to half a bin width, (44100/2) × 1/N. For a window size of 4096, this is about 5.4 Hz, easily audible at A440. You can improve on that by: 1. taking phase into account (I've only described how to take magnitude into account); 2. using larger windows (which increases latency and processing requirements, as the FFT is an N log N algorithm); or 3. using a better algorithm like YIN: http://www.ircam.fr/pcm/cheveign/pss/2002_JASA_YIN.pdf
You can skip the windowing step and just break the audio into discrete chunks of however many samples you want to analyze. This is equivalent to using a square window, which works, but you may get more noise in your results.
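Putting the steps above together, here is a rough Java sketch (JTransforms again as an illustrative FFT library; no low-pass filtering, averaging or phase refinement is included):

import org.jtransforms.fft.DoubleFFT_1D; // package name varies by JTransforms version

public class CrudePitch {
    /** Window -> FFT -> magnitude -> peak bin -> frequency. Returns Hz. */
    public static double estimate(double[] samples, double sampleRate) {
        int n = samples.length;                    // e.g. 4096
        double[] buf = new double[2 * n];          // interleaved complex: re, im
        for (int i = 0; i < n; i++) {
            double hann = 0.5 * (1 - Math.cos(2 * Math.PI * i / (n - 1)));
            buf[2 * i] = samples[i] * hann;        // windowed real part, im stays 0
        }
        new DoubleFFT_1D(n).complexForward(buf);

        int peak = 1;
        double peakMag = 0;
        for (int k = 1; k < n / 2; k++) {          // skip DC, stop at Nyquist
            double re = buf[2 * k], im = buf[2 * k + 1];
            double mag = Math.sqrt(re * re + im * im);
            if (mag > peakMag) { peakMag = mag; peak = k; }
        }
        return peak * sampleRate / n;              // bin index -> frequency in Hz
    }
}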
BTW: Many of those tuner apps license code from third parties, such as z-plane and iZotope.
Update: If you want C source code and a full tutorial for the FFT method, I've written one. The code compiles and runs on Mac OS X, and should be convertible to other platforms pretty easily. It's not designed to be the best, but it is designed to be easy to understand.
A Fast Fourier Transform changes a function from the time domain to the frequency domain. So instead of f(t), where f is the signal you are getting from the microphone and t is the time index of that signal, you get g(θ), where g is the FFT of f and θ is the frequency. Once you have g(θ), you just need to find the θ with the highest amplitude, meaning the "loudest" frequency. That will be the primary pitch of the sound you are picking up.
As for actually implementing the FFT, if you google "fast fourier transform sample code", you'll get a bunch of examples.

Real-time object detection (SIFT)?

I am investigating this field to achieve object detection in real time.
Video example:
http://www.youtube.com/watch?v=Bm5qUG-06V8
http://www.youtube.com/watch?v=aYd2kAN0Y20
But how can they extract SIFT keypoints and match them so fast? SIFT extraction generally takes a second.
I'm an OpenIMAJ developer and responsible for making the first video.
We're not doing anything particularly fancy to make the matching fast in that video, and the SIFT detection and extraction is carried out on the entirety of every frame. In fact that video was made well before we did any optimisation; the current version of that demo is much smoother. We do also have a version with a hybrid KLT-tracker that works even faster by not having to perform SIFT on every frame.
As suggested by Mario, the image size has a big effect on the speed of the extraction, so processing a smaller frame can give a big win. Secondly, in the original description of the difference-of-Gaussian interest point localisation in Lowe's SIFT paper, the input image is first doubled in size to increase the number of features. By not performing this double-sizing you also get a big performance boost, at the expense of having fewer features to match.
The code is open source (BSD license) and you can get it by following the links at http://www.openimaj.org. As stated in the video description, the image-processing code is pure Java; the only native code is a thin interface to the webcam. Tutorial number 7 in the current tutorial pdf document walks through the process of using SIFT in OpenIMAJ. Disabling the double-sizing can be achieved by doing:
// create a difference-of-Gaussian SIFT engine and turn off the initial
// image doubling, trading some features for a large speed-up
DoGSIFTEngine engine = new DoGSIFTEngine();
engine.getOptions().setDoubleInitialImage(false);
SIFT can be accelerated in several ways :
if you can afford approximations, then you can use a keypoint called SURF, which is way faster (it uses integral images for most tasks)
you can use parallel implementations, at the CPU level (e.g. OpenCV uses Intel's TBB) or at the GPU level (search for "SIFT GPU" for related code and documentation).
Anyway, none of these is available (AFAIK) in Java, so you'll have to use a Java wrapper to opencv or work it out yourself.
General and first idea: ask the video uploader(s); otherwise we can only guess at what is done and how. It might also help to know what you've done so far (e.g. your video resolution, your processing power, image preparation, etc.).
I haven't used SIFT specifically, but I did quite some object/motion tracking during the last few years, so this is more in general. You might have tried some points already, I don't know.
Reduce your image resolution: Going from 640x480 to 320x240 will reduce your data to 25%. Going down to 160x120 will cut it by another 25% (so 6.25 % data left) without significantly impacting your algorithm.
In a similar way, it might be useful to reduce the colour depth of your image (not just down to 256 grayscale values, but maybe even further, e.g. 64 levels).
Try other methods to make features more obvious or faster to find, e.g. try running an edge detector over your image.
At least the second video mentions a tracking system, so you could try to predict the region where the tracked object should reappear in the next frame (using a simple alpha-beta filter or similar on the coordinates and possibly rotation), then run SIFT only on that sub-area (with some added margin). Only analyse the whole image if you can't find the object again. At around 40 or 50 seconds into the second video they lose the object and need quite some time/tries to find it again.
