Using jTransforms library on a WAV file? - java

I am trying to do spectral analysis on a WAV file using the jTransforms library.
But I am not sure how to convert the WAV file into an acceptable input for the FFT routines in jTransforms, or how to display a frequency spectrum after the FFT. From searching around Google I gather that I need to somehow convert the WAV file into a double[] or Complex[], but how should I interpret the output afterwards?
Sorry, I am very new to FFT, so this question may sound extremely stupid. Many thanks!

I don't know your library, but I guess it has extensive documentation on how to apply the transforms.
Regarding the interpretation: if you use a complex transform, the magnitude of each complex output value gives the energy of the corresponding frequency bin, and its argument (angle) gives the phase of the sinusoid.
The power spectral density (PSD) can be computed by
fftData * conj(fftData)
which is equal to
abs(fftData)^2
(so multiply each complex value by its own complex conjugate).
One thing you might have to consider is rescaling your FFT output: some algorithms scale the output proportionally to the fftSize, so you will have to multiply the output by 1/fftSize.
And one last thing, in case you are not aware of it: for real input you only have to take the first half of the FFT output, since the spectrum is symmetric (the upper half is a mirrored, complex-conjugated copy of the lower half).
The bin at fftSize/2 marks the Nyquist frequency, which is half the sampling rate and the highest frequency you can analyze. So if you want to see frequencies up to 22 kHz, make sure your file is sampled at 44.1 kHz or more; the fftSize then determines the frequency resolution, sampleRate/fftSize per bin.
There are many pitfalls with the FFT, so be sure to read up on it and understand what you are doing. The mathematics itself is not that important if you just want to use it, so you can skip that part.
EDIT: There is even more. Consider weighting your input data with a tapered window (Gaussian, Hamming, Hanning...) to avoid nasty edge effects if you don't feed your whole WAV file as input. Otherwise you will get artificial high frequencies in your FFT output which are simply not present in the original signal.
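For a concrete starting point, here is a minimal sketch of that whole pipeline in Java. It assumes a 16-bit signed little-endian mono PCM file, jTransforms' DoubleFFT_1D class (the package name varies between versions of the library), and a placeholder file name:

    import java.io.File;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import edu.emory.mathcs.jtransforms.fft.DoubleFFT_1D;

    public class WavSpectrum {
        public static void main(String[] args) throws Exception {
            // Decode the WAV header; the stream then yields raw PCM frames
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("input.wav"));
            byte[] bytes = in.readAllBytes(); // Java 9+; use a read loop on older JDKs

            int n = 4096; // FFT size (one chunk; loop over the file for a spectrogram)
            double[] data = new double[n];
            for (int i = 0; i < n; i++) {
                int lo = bytes[2 * i] & 0xff;         // low byte, unsigned
                int hi = bytes[2 * i + 1];            // high byte, sign-extended
                data[i] = ((hi << 8) | lo) / 32768.0; // normalize to [-1, 1)
            }

            // Hann window to suppress the edge effects mentioned above
            for (int i = 0; i < n; i++) {
                data[i] *= 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            }

            new DoubleFFT_1D(n).realForward(data); // in-place, packed output

            // Packed layout for even n: Re[k] = data[2k], Im[k] = data[2k+1],
            // except data[0] = Re[0] (DC) and data[1] = Re[n/2] (Nyquist)
            double[] magnitude = new double[n / 2 + 1];
            magnitude[0] = Math.abs(data[0]) / n;
            magnitude[n / 2] = Math.abs(data[1]) / n;
            for (int k = 1; k < n / 2; k++) {
                double re = data[2 * k];
                double im = data[2 * k + 1];
                magnitude[k] = Math.sqrt(re * re + im * im) / n; // 1/fftSize rescaling
            }
            // magnitude[k] corresponds to frequency k * sampleRate / n
        }
    }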

Related

How to implement an Equalizer

I know there are a lot of questions about equalizers on SO, but I didn't find what I was looking for. What I want to build is an equalizer that modifies audio samples through a call like:
equalizer.eqAudio(audiosamples, band, gain)
I'm not sure if that is the exact interface I want, because I know little about DSP in terms of implementation (I have used filters, limiters, and compressors, but never made them).
So, Googling about this, I read that I must apply an FFT to the samples so I get the data per frequency range instead of per sample, process it the way I want, and then perform the inverse FFT so I get the result as audio samples again. I looked for an implementation and found JTransforms for Java, which implements the FFT as well as related transforms such as the Discrete Cosine Transform (DCT).
My questions are :
Well, am I on the right track?
Since the FFT gives me data about frequency, I should pass the FFT algorithm a chunk of samples. How big must this chunk be?
Is there a good book about DSP programming that explains equalizers ?
Thanks!
FFT wouldn't be my first choice for audio equalisation. I would default to building an EQ with IIR or FIR filters. FFT might be useful for special circumstances.
A commonly recommended reference is the Cookbook Formulae for Audio EQ Biquad Filter Coefficients.
A Java tutorial for programming biquad filters: http://arachnoid.com/BiQuadDesigner/index.html
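To make the cookbook approach concrete, below is a minimal, untested sketch of a single peaking-EQ band in Java, with the coefficients taken from the cookbook formulas referenced above; the class and parameter names are my own:

    /** One peaking-EQ band, coefficients from the RBJ Audio EQ Cookbook. */
    public class PeakingBiquad {
        private final double b0, b1, b2, a1, a2; // coefficients, normalized by a0
        private double x1, x2, y1, y2;           // filter state (previous samples)

        public PeakingBiquad(double sampleRate, double centerHz, double q, double gainDb) {
            double A = Math.pow(10, gainDb / 40);
            double w0 = 2 * Math.PI * centerHz / sampleRate;
            double alpha = Math.sin(w0) / (2 * q);
            double a0 = 1 + alpha / A;
            b0 = (1 + alpha * A) / a0;
            b1 = (-2 * Math.cos(w0)) / a0;
            b2 = (1 - alpha * A) / a0;
            a1 = (-2 * Math.cos(w0)) / a0;
            a2 = (1 - alpha / A) / a0;
        }

        /** Process one sample (Direct Form I). */
        public double process(double x) {
            double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x;
            y2 = y1; y1 = y;
            return y;
        }
    }

Running one instance per band in series over the samples gives a basic multi-band equalizer; for example, new PeakingBiquad(44100, 1000, 1.0, 6.0) boosts the region around 1 kHz by about 6 dB.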
Is there a good book about DSP programming that explains equalizers ?
Understanding Digital Signal Processing is a good introduction to DSP. There are chapters on FIR and IIR filters.
Introduction To Digital Filters with Audio Applications by Julius O. Smith III.
Graphic Equalizer Design Using Higher-Order Recursive Filters by Martin Holters and Udo Zölzer is a short paper detailing one EQ filter design approach.
There are many different ways to obtain an equalizer, and as Shannon explains, the IIR/FIR filter way is one of them. However, if your goal is to quickly get an equalizer up and running, going the FFT way might be easier for you, as there exist a wealth of reference implementations.
As to your question of FFT size, it depends on what frequency resolution you want your equalizer to have. If you choose a size of 16, you will get 9 channels in the frequency domain (7 complex plus 2 purely real: the DC and Nyquist channels), equally spaced from 0 to fs/2. The 1st is centered at 0 Hz and the 9th at fs/2 Hz. And note, some implementations return 16 channels where the upper half is a mirrored, complex-conjugated version of the lower half.
As to the implementation of the equalizer functionality: multiply each channel by the wanted gain. If the spectrum has the mirrored part, mirror the gains as well; if this is not done, the result of the following IFFT will not be a real-valued signal. After the multiplication, apply the IFFT.
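For illustration, here is a rough sketch of that multiply-and-mirror step in Java using jTransforms' complex transform; eqChunk and gains are hypothetical names, and a real implementation would process overlapping windowed chunks (overlap-add) to avoid artifacts at block edges:

    import edu.emory.mathcs.jtransforms.fft.DoubleFFT_1D;

    // Hypothetical helper: apply per-bin gains to one chunk of samples.
    // gains holds n/2 + 1 values, one per channel from DC up to Nyquist.
    static double[] eqChunk(double[] chunk, double[] gains) {
        int n = chunk.length;
        double[] c = new double[2 * n]; // interleaved [re0, im0, re1, im1, ...]
        for (int i = 0; i < n; i++) c[2 * i] = chunk[i];

        DoubleFFT_1D fft = new DoubleFFT_1D(n);
        fft.complexForward(c);

        for (int k = 0; k <= n / 2; k++) {
            c[2 * k] *= gains[k];
            c[2 * k + 1] *= gains[k];
            if (k > 0 && k < n / 2) {   // mirror the gain onto bin n - k
                c[2 * (n - k)] *= gains[k];
                c[2 * (n - k) + 1] *= gains[k];
            }
        }

        fft.complexInverse(c, true);    // true = scale by 1/n
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = c[2 * i]; // imaginary parts are ~0
        return out;
    }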
As to the difference between an FFT-based and a filter-based equalizer: remember that an FFT is simply a fast way of calculating a bank of FIR filters with sinusoids as impulse responses, critically sampled (downsampled by the filter length) and with evenly spaced center frequencies.

frequency / pitch detection for dummies

While there are many questions on this site dealing with the concept of pitch detection, they all deal with this magical FFT, with which I am not familiar. I am trying to build an Android application that needs to implement pitch detection, but I have absolutely no understanding of the algorithms used to do this.
It can't be that hard can it? There are around 8 billion guitar tuner apps on the android market after all.
Can someone help?
The FFT is not really the best way to implement pitch detection or pitch tracking. One issue is that the loudest frequency is not always the fundamental frequency. Another is that the FFT, by itself, requires a pretty large amount of data and processing to obtain the resolution you need to tune an instrument, so it can appear slow to respond (i.e. it has latency). Yet another issue is that the result of an FFT is not exactly intuitive to work with: you get an array of complex numbers and you have to know how to interpret them.
If you really want to use an FFT, here is one approach:
Low-pass your signal. This will help prevent noise and higher harmonics from creating spurious results. Conceivably, you could skip this step and instead weight your results towards the lower values of the FFT. For some instruments with strong fundamental frequencies, this might not be necessary.
Window your signal. Windows should be at least 4096 samples in size. Larger is better, up to a point, because it gives you better frequency resolution; if you go too large, it will increase your computation time and latency. The Hann function is a good choice for your window: http://en.wikipedia.org/wiki/Hann_function
FFT the windowed signal as often as you can. Even overlapping windows are good.
The results of the FFT are complex numbers. Find the magnitude of each complex number using sqrt(real^2 + imag^2). The index in the FFT array with the largest magnitude is the index of your peak frequency.
You may want to average multiple FFTs for more consistent results.
How do you calculate the frequency from the index? Well, let's say you've got a window of size N. After you FFT, you will have N complex numbers. If your peak is the nth one, and your sample rate is 44100, then your peak frequency will be near 44100*n/N. Why near? Well, you have an error of up to half a bin, (44100/2)*1/N. For a window size of 4096, this is about 5.4 Hz -- easily audible at A440. You can improve on that by 1. taking phase into account (I've only described how to take magnitude into account), 2. using larger windows (which will increase latency and processing requirements, as the FFT is an N log N algorithm), or 3. using a better algorithm like YIN: http://www.ircam.fr/pcm/cheveign/pss/2002_JASA_YIN.pdf
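For illustration, here is the index-to-frequency step as a small Java helper, assuming you have already computed the magnitude array for the first N/2 bins as described above:

    // Given the magnitude spectrum of one window (first N/2 bins), return
    // the frequency of the loudest bin, accurate to about half a bin.
    static double peakFrequency(double[] magnitude, double sampleRate, int fftSize) {
        int peak = 0;
        for (int k = 1; k < magnitude.length; k++) {
            if (magnitude[k] > magnitude[peak]) peak = k;
        }
        return peak * sampleRate / fftSize;
    }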
You can skip the windowing step and just break the audio into discrete chunks of however many samples you want to analyze. This is equivalent to using a square window, which works, but you may get more noise in your results.
BTW: Many of those tuner apps license code from third parties, such as z-plane and iZotope.
Update: If you want C source code and a full tutorial for the FFT method, I've written one. The code compiles and runs on Mac OS X, and should be convertible to other platforms pretty easily. It's not designed to be the best, but it is designed to be easy to understand.
A Fast Fourier Transform changes a function from the time domain to the frequency domain. So instead of f(t), where f is the signal that you are getting from the microphone and t is the time index of that signal, you get g(θ), where g is the FFT of f and θ is the frequency. Once you have g(θ), you just need to find the θ with the highest amplitude, meaning the "loudest" frequency. That will be the primary pitch of the sound you are picking up.
As for actually implementing the FFT, if you google "fast fourier transform sample code", you'll get a bunch of examples.

Analyzing Sound in a WAV file

I am trying to analyze a movie file by splitting it up into camera shots and then trying to determine which shots are more important than others. One of the factors I am considering in a shot's importance is how loud the volume is during that part of the movie. To do this, I am analyzing the corresponding sound file. I'm having trouble determining how "loud" a shot is because I don't think I fully understand what the data in a WAV file represents.
I read the file into an audio buffer using a method similar to that described in this post.
Having already split the corresponding video file into shots, I am now trying to find which shots are louder than others in the WAV file. I am trying to do this by extracting each sample in the file like this:
double amplitude = (double)((audioData[i] & 0xff) | (audioData[i + 1] << 8));
Some of the other posts I have read seem to indicate that I need to apply a Fast Fourier Transform to this audio data to get the amplitude, which makes me wonder what the values I have extracted actually represent. Is what I'm doing correct? My sound file format is a 16-bit mono PCM with a sampling rate of 22,050 Hz. Should I be doing something with this 22,050 value when I am trying to analyze the volume of the file? Other posts suggest using Root Mean Square to evaluate loudness. Is this required, or just a more accurate way of doing it?
The more I look into this the more confused I get. If anyone could shed some light on my mistakes and misunderstandings, I would greatly appreciate it!
The FFT has nothing to do with volume and everything to do with frequencies. To find out how loud a scene is on average, simply average the sampled values. Depending on whether you get the data as signed or unsigned values in your language, you might have to apply an absolute-value function first so that negative amplitudes don't cancel out the positive ones, but that's pretty much it. If you don't get the results you were expecting, it probably has to do with the way you are extracting the individual values in the sample-extraction line you quoted above.
That said, there are a few refinements that might or might not affect your task. Perceived loudness, amplitude, and acoustic power are in fact related in non-linear ways, but as long as you are only trying to get a rough estimate of how much is "going on" in the audio signal, I doubt that this is relevant for you. And of course, humans hear different frequencies better or worse: for instance, bats emit ultrasound squeals that would be absolutely deafening to us, but luckily we can't hear them at all. But again, I doubt this is relevant to your task, since e.g. frequencies above 22 kHz (or was it 44 kHz? not sure which) cannot in fact be represented in a simple WAV file.
I don't know the level of accuracy you want, but a simple RMS (and perhaps simple filtering of the signal) is all many similar applications would need.
RMS will be much better than peak amplitude. Using peak amplitudes is like judging the brightness of an image by its brightest pixel, rather than by averaging.
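For reference, here is a minimal RMS sketch in Java, assuming 16-bit signed little-endian mono PCM bytes as in the extraction line quoted in the question; from and to are hypothetical byte offsets delimiting one shot:

    // RMS loudness of one shot within the raw audio byte array
    static double rms(byte[] audioData, int from, int to) {
        double sumOfSquares = 0;
        int count = 0;
        for (int i = from; i + 1 < to; i += 2) {
            short sample = (short) ((audioData[i] & 0xff) | (audioData[i + 1] << 8));
            sumOfSquares += (double) sample * sample;
            count++;
        }
        return count == 0 ? 0 : Math.sqrt(sumOfSquares / count); // 0..32768 for 16-bit input
    }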
If you want to filter the signal or weigh it to perceived loudness, then you would need the sample rate for that.
FFT should not be required unless you want to do complex frequency analysis as well. The ear does not respond linearly to sounds at different frequencies and amplitudes, so if you need that extra degree of accuracy, you could use an FFT to perform a frequency analysis and weight the result accordingly.

Fast Fourier Transform (FFT) input and output to analyse the frequency of audio files in Java?

I have to use FFT to analyse the frequency of an audio file. But I don't know what the input and output is.
Do I have to use a 1-, 2-, or 3-dimensional array if I want to draw the audio file's spectrum? And can someone suggest a library for FFT on J2ME?
@thongcaoloi,
The simple answer regarding the dimensionality of your input data is: you need 1D data. Now I'll explain what that means.
Because you want to analyze audio data, your input to the discrete Fourier transform (DFT or FFT) is a one-dimensional sequence of real numbers, which represents the changing voltage of the audio signal over time; your audio file is a digital representation of that changing voltage.
Your audio file was produced by sampling the voltage of a continuous audio signal at a fixed sampling rate (also known as the sampling frequency), typically 44.1 kHz for CD-quality audio.
But your data file could have been sampled at a much lower frequency, so try to find out the sampling frequency of your data before you do an FFT on that data.
So now you have to extract the individual samples from your audio file. If your file is stereo, it will have two separate sample sequences, one for the right channel and one for the left channel. If the file is mono, it will have only one sample sequence.
If your file is stereo, or any other multi-channel audio format such as 5.1 or 7.1, you could FFT each channel separately, or you could combine any number of channels together using voltage addition. That's up to you, and depends on what you're trying to do with your FFT results.
The output of the DFT or FFT is a sequence of complex numbers. Each complex number is a pair consisting of a real-part and an imaginary-part, typically shown as a pair (re,im).
If you want to graph the power spectral density of your audio file, which is what most people want from the FFT, you'll graph 20*log10( sqrt( re^2 + im^2 ) ), using the first N/2 complex numbers of the FFT output, where N is the number of input samples to the FFT.
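In Java, assuming a library that returns the spectrum as interleaved (re, im) pairs in a double[] (layouts differ between libraries, so check the documentation of yours), that graphing step might look like:

    // dB magnitude for the first n/2 bins of an interleaved (re, im) spectrum
    static double[] toDecibels(double[] fftOut, int n) {
        double[] db = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = fftOut[2 * k];
            double im = fftOut[2 * k + 1];
            db[k] = 20.0 * Math.log10(Math.sqrt(re * re + im * im) + 1e-12); // +1e-12 avoids log(0)
        }
        return db;
    }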
You can try to build your own spectrum analyzer software program, but I suggest using something that's already built and tested.
These two FFT spectrum analyzers give results instantly, and have built-in IFFT synthesis, meaning that you can inverse Fourier transform the frequency-domain spectral data to reconstruct the original signal in the time-domain.
http://www.mathworks.com/help/techdoc/ref/fft.html
http://www.sooeet.com/math/fft.php
There's a lot more to this topic, and to the subject of digital signal processing in general, but this brief introduction should get you started.
In the theoretical sense, an FFT maps complex[N] => complex[N]. However, if your data is just an audio file, your input will simply be complex numbers with no imaginary component, i.e. real numbers. Thus you will map real[N] => complex[N]. With a little math, you see that the output will always satisfy output[i] == complex_conjugate(output[N-i]), so you really only need to look at the first N/2+1 samples. Additionally, the complex output of the FFT gives you information about both phase and magnitude. If all you care about is how much of a certain frequency is in your audio, you only need the magnitude, which can be calculated as square_root(imaginary^2 + real^2) for each element of the output.
Of course, you'll need to look at the documentation of whatever library you use to understand which array element corresponds to the real part of the Nth complex output, and likewise to find the imaginary part of the Nth complex output.
As I remember, the FFT algorithm is not that complex; I wrote an FFT calculation class for my thesis. The input was a 1D array of values read from *.WAV files, but before the FFT, some filtering and normalization were performed.

Useful audio data from FFT

I'm trying to do a simple music visualization in java. I have two threads set up, one for playing the clip, and another for extracting a chunk of bytes from the clip to process with an FFT. The processed array can then be sent to the JFrame that will handle drawing, and used as a parameter for some sort of visual.
I'm not exactly sure what to do with the data, however. I've just been using a power spectrum for now, which gives me a very limited response, and I realize it is too general for what I am trying to do. I'm open to using any FFT library out there if there is a specific one that will be especially helpful. But in general, what can I get from my data after doing an FFT, and how can I use it to show decently accurate results in the visuals?
All FFTs will do pretty much the same thing given the same data. The FFT parameters you can vary are the scale factor, the length of the FFT (longer gives you higher frequency resolution, shorter gives you better time response), and (pre)windowing of the data, which causes less "splatter", or spectral leakage, around spectral peaks. You can zero-pad an FFT to interpolate smoother-looking results. You can average the magnitude results of several successive FFTs to reduce the noise floor. You can also use a scaling function such as log scaling (or log-log, i.e. log on both axes) when presenting the FFT magnitude results.
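As one sketch of the averaging idea, here is exponential smoothing across successive magnitude spectra (a variant of plain averaging; alpha is an assumed smoothing factor, e.g. 0.2):

    // Blend each new magnitude spectrum into a running average so the
    // visualization does not flicker; smaller alpha = smoother, slower display.
    static void smoothSpectrum(double[] avgMag, double[] newMag, double alpha) {
        for (int k = 0; k < avgMag.length; k++) {
            avgMag[k] = (1 - alpha) * avgMag[k] + alpha * newMag[k];
        }
    }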
The phase of a complex FFT is usually unimportant for any visualization unless you are doing some type of phase vocoder analysis+resynthesis.
