Computing an FFT (Fast Fourier Transform) always involves a time-frequency trade-off. If we calculate the FFT over a longer time (15 to 20 seconds), we get more accurate frequencies, but far fewer spectra per unit time. For smaller time intervals, the frequencies often show sharp variations.
What is the best time interval for getting reliable data with a sufficient number of frequency points?
That depends on your type of audio. Maybe you should consider using a wavelet analysis. With it you can extract very accurate high frequencies without losing resolution at the low frequencies.
Have a read of the Nyquist-Shannon sampling theorem: sampling at intervals of 1/(2B), i.e. at a rate of 2B, where B is the maximum frequency in the signal, should give you the ability to reconstruct it without data loss.
This implies to me that you could get a good enough FFT for a signal by sampling for twice the period of the minimum frequency in the sample - that is, I guess, if the sample is periodic :P
"Best" depends on your needs.
If time resolution is more important, then a smaller number of points may be better.
If frequency resolution is more important, then a larger number of points may be better.
If the noise floor is high, then even more points may be better.
If you don't care about either, then use a short FFT to save on compute time and energy. (Or don't bother with the FFT and just take some results from /dev/random).
If the signals of interest are not stationary (e.g. they change over time), then you may want an FFT short enough to separate all the individual spectral events of interest each into their own FFT window.
If you need frequency resolution that can clearly separate a spectral signal peak from adjacent noise peaks, then you want an FFT length of at least around twice the sample rate divided by the minimum frequency spacing between the peaks you want separated (more if you apply a window function). If you don't need this accuracy, then a shorter FFT will do.
If the noise level and the interference is low enough compared to your signal(s) of interest, you might be able to interpolate frequency estimates from isolated peaks with a much shorter FFT, or even without an FFT. As little as 3 or 4 non-aliased sample points might do for a single sinusoid in zero noise.
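That length rule can be sketched in a few lines. This is my own illustration (the method name and the power-of-two rounding are assumptions; radix-2 FFT libraries want such lengths):

```java
// Rule-of-thumb FFT length for separating two spectral peaks: at least
// twice the sample rate divided by the minimum frequency spacing,
// rounded up to a power of two for a radix-2 FFT.
public class FftLength {
    static int minFftLength(double sampleRateHz, double minDeltaHz) {
        int n = (int) Math.ceil(2.0 * sampleRateHz / minDeltaHz);
        int pow2 = 1;
        while (pow2 < n) pow2 <<= 1;
        return pow2;
    }

    public static void main(String[] args) {
        // Separating peaks 5 Hz apart at a 44.1 kHz sample rate:
        System.out.println(minFftLength(44100.0, 5.0)); // prints 32768
    }
}
```

Windowing widens the main lobe, so in practice you would want somewhat more than this minimum.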
I’m testing a dot product program in Java with threaded (according to the number of system cores) and non-threaded options. The (very simple) algorithm divides the first matrix (say A) into as many sub-matrices as there are cores and performs the dot product of each with the full B matrix. Once this is done, the ‘chunks’ are collected to obtain the resulting matrix. When rows and columns are of relatively high magnitude (say 20000 rows and 20000 columns), the threaded option works as expected, running 3.3-3.5 times faster than the non-threaded option on a standard four-core system.
But for a small number of columns in the first matrix (say, for instance, A[20000][20] multiplied by B[20][20000]), the speeds of the threaded and non-threaded options are roughly equal. And with even fewer columns, execution time under the threaded option is noticeably longer than under the non-threaded option. This “equality frontier” varies with the number of rows.
I understand the problems with thread overhead for small matrices, but I can’t fully understand this situation when only the number of columns is small (columns in the first factor, rows in the second). If the problem is of the same nature as with small matrices, is there a proportion between rows and columns that permits choosing a threaded or non-threaded algorithm depending on the case?
Generating a new thread always costs a (roughly constant) amount of computing time and before your program can benefit from it, the work per thread must take longer than this overhead.
I don't know your code (please post minimal examples in the future, as the comment suggested), but you can look at the number of calculations (multiplications + additions) each thread performs at the point where the two calculation times are equal. Based on this you should be able to approximate the matrix size (probably proportional to the product of the side lengths, i.e. the number of elements) at which multithreading is worth the trouble.
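A sketch of that idea (the threshold value and names are mine, not from the question; you would calibrate the constant by timing both versions on your own machine):

```java
// Break-even heuristic for choosing the threaded path: thread creation
// costs a roughly constant overhead, so only split the work when the
// multiply-add count per thread (rows * shared * cols / cores) exceeds
// an empirically measured threshold.
public class ThreadChoice {
    static final long THRESHOLD = 2_000_000L; // assumed value; calibrate it

    static boolean useThreads(int rowsA, int shared, int colsB, int cores) {
        long opsPerThread = (long) rowsA * shared * colsB / cores;
        return opsPerThread > THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(useThreads(20000, 20000, 20000, 4)); // prints true
        System.out.println(useThreads(100, 20, 100, 4));        // prints false
    }
}
```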
I want to see (in real time) the frequency (first harmonic) of the input signal (from a laptop microphone). I have searched for how to calculate an FFT, but I would need to see the result in real time; is there any simple way to do this?
FFT is a perfectly legitimate way to measure frequency. You don't usually need to run in real time, because you only need to perform an FFT at the rate you want to update the frequency reading. Even so, modern computers can do an FFT in better than real time (in other words, you'll be I/O bound by the audio samples). One issue with FFT frequency measurements is that the FFT bins are equally spaced in frequency, so relative to the frequency being measured you get finer resolution at high frequencies and coarser resolution at low frequencies. To measure low frequencies accurately you need a really long FFT; to measure high frequencies you can use a really short one.
Another option is to use a frequency counter (counting zero crossings) but it has drawbacks if the signal is noisy or other signals are present.
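To make that drawback concrete, here is a minimal zero-crossing counter (my own sketch; it assumes a clean, zero-centred signal):

```java
// Naive zero-crossing frequency estimate: a sinusoid crosses zero twice
// per cycle, so f ~= signChanges / 2 / duration. Noise or a second tone
// adds spurious crossings and skews the estimate.
public class ZeroCrossing {
    static double estimateHz(double[] samples, double sampleRateHz) {
        int crossings = 0;
        for (int i = 1; i < samples.length; i++)
            if ((samples[i - 1] < 0) != (samples[i] < 0)) crossings++;
        double durationSec = samples.length / sampleRateHz;
        return crossings / 2.0 / durationSec;
    }
}
```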
Use a sliding DFT, as suggested in one of the Stack Overflow answers on this: Doing FFT in realtime. There is source code there as well. However, there is no easy, off-the-shelf implementation of it unless you find some already-written code.
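For reference, a single-bin sliding DFT is short. This is my own minimal version of the standard recurrence, not code from the linked answer: each incoming sample updates bin k in O(1) by removing the oldest sample, adding the newest, and rotating by one twiddle factor.

```java
// Single-bin sliding DFT over the last n samples:
// X_k <- (X_k - oldest + newest) * e^(j*2*pi*k/n)
public class SlidingDft {
    final int n;
    final double[] buf;    // circular buffer of the last n samples
    int pos = 0;
    double re = 0, im = 0; // running value of DFT bin k
    final double cw, sw;   // twiddle factor e^(j*2*pi*k/n)

    SlidingDft(int n, int k) {
        this.n = n;
        this.buf = new double[n];
        this.cw = Math.cos(2 * Math.PI * k / n);
        this.sw = Math.sin(2 * Math.PI * k / n);
    }

    // Push one sample; returns the magnitude of bin k for the current window.
    double push(double x) {
        double oldest = buf[pos];
        buf[pos] = x;
        pos = (pos + 1) % n;
        double r = re - oldest + x; // comb stage
        double i = im;
        re = r * cw - i * sw;       // rotate by the twiddle factor
        im = r * sw + i * cw;
        return Math.hypot(re, im);
    }
}
```

Numerical error accumulates slowly in this recurrence, so production implementations periodically recompute the bin from scratch.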
I want to implement the Fast Fourier Transform in Java for chord recognition, but I don't really get it. It says that the number of samples should be a power of 2, so what should we do for a song whose number of samples is not a power of 2? I would also like to know about the STFT.
You normally generate an STFT over a sliding window throughout your file. The size of the window is chosen to give a reasonable time period over which the characteristics of the sound do not change greatly. Typically a window might be around 10 ms, so if your sample rate is 44.1kHz for example, then you might use a window size N = 512, so that you get the required duration and a power of 2 in size. You then take successive chunks of size N samples through the file, and generate the FFT for each N point chunk. (Note: in most cases you actually want the magnitude of the FFT output, in order to get an estimate of the power spectrum.) For increased resolution the chunks can overlap, e.g. by 50%, but this increases the processing load of course. The end result is a succession of short term spectra, so in effect you have a 3D matrix (amplitude v frequency v time) which describes the contents of the sound in the frequency domain.
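The chunking described above might look like this (a sketch under my own assumptions; I use a Hann window here, and each resulting frame would be handed to your FFT routine):

```java
import java.util.ArrayList;
import java.util.List;

// Splits a signal into Hann-windowed chunks of length n with 50% overlap,
// i.e. a hop of n/2 samples between successive frames.
public class StftFrames {
    static List<double[]> frames(double[] signal, int n) {
        List<double[]> out = new ArrayList<>();
        int hop = n / 2;
        for (int start = 0; start + n <= signal.length; start += hop) {
            double[] frame = new double[n];
            for (int i = 0; i < n; i++) {
                double hann = 0.5 - 0.5 * Math.cos(2 * Math.PI * i / (n - 1));
                frame[i] = signal[start + i] * hann;
            }
            out.add(frame); // feed each frame to the FFT
        }
        return out;
    }
}
```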
Normally what you do is just pad the data with zeros to make it a power of two.
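In code, the padding is just an array copy into a zero-filled buffer of the next power-of-two length (names are illustrative):

```java
// Pads a sample buffer with zeros up to the next power of two so it can
// be fed to a radix-2 FFT. Java zero-initialises the new array, so only
// the original samples need to be copied.
public class ZeroPad {
    static double[] padToPowerOfTwo(double[] samples) {
        int n = 1;
        while (n < samples.length) n <<= 1;
        double[] out = new double[n];
        System.arraycopy(samples, 0, out, 0, samples.length);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(padToPowerOfTwo(new double[300]).length); // prints 512
    }
}
```

Note that zero-padding interpolates the spectrum; it does not add real frequency resolution.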
I have a file that contains a stream of raw PCM bytes. Is there a way to find out the sampling rate and bit depth just from the file itself? Some kind of an analyzer or something like that; I would just like to avoid trying to guess properties by playing the stream with random settings.
Assuming sane input, you can probably deduce bit depth / encoding by finding the possibility with the least high frequency noise.
Sample rate may be tricky, unless there are noise components of expected frequency that could be detected (some research has been done on power-line hum, for example), or perhaps acoustic properties of a given recorder, such as interference across the diameter of the microphone or of the casing, which would shape the spectrum of the noise. Many sources also use a consistent hardware sampling rate and convert it when other rates are desired, a process that may leave artifacts. But for a well-done recording from unspecified hardware, it may indeed be challenging.
Another related challenge is distinguishing interleaved stereo from mono at twice the sample rate. This gets tricky because at low frequencies you'd expect the same content in both channels, while at high frequencies you'd expect a phase difference. But even in a mono track, you wouldn't expect low-frequency components to change much between successive samples, while you would expect high-frequency ones to do so. One idea might be to look for delayed (or advanced) correlation between the possible left and right channels at high frequencies. Another might be to see whether the phase differences between successive low-frequency components form two interleaved, monotonically spaced sequences with an unrelated difference between them (stereo), or a single evenly spaced monotonic sequence (mono).
I need some help confirming some basic DSP steps. I'm in the process of implementing some smartphone accelerometer sensor signal processing software, but I've not worked in DSP before.
My program collects accelerometer data in real time at 32 Hz. The output should be the principal frequencies of the signal.
My specific questions are:
From the real-time stream, I am collecting a 256-sample window with 50% overlap, as I've read in the literature. That is, I add in 128 samples at a time to fill up a 256-sample window. Is this a correct approach?
The first figure below shows one such 256-sample window. The second figure shows the sample window after I applied a Hann/Hamming window function. I've read that applying a window function is a typical approach, so I went ahead and did it. Should I be doing so?
The third window shows the power spectrum (?) from the output of an FFT library. I am really cobbling together bits and pieces I've read. Am I correct in understanding that the spectrum goes up to 1/2 the sampling rate (in this case 16 Hz, since my sampling rate is 32 Hz), and the value of each spectrum point is spectrum[i] = sqrt(real[i]^2 + imaginary[i]^2)? Is this right?
Assuming what I did in question 3 is correct, is my understanding right that the third figure shows principal frequencies of about 3.25 Hz and 8.25 Hz? I know from collecting the data that I was running at about 3 Hz, so the spike at 3.25 Hz seems right. So there must be some noise or other factors causing the (erroneous) spike at 8.25 Hz. Are there any filters or other methods I can use to smooth away this and other spikes? If not, is there a way to distinguish "real" spikes from erroneous ones?
Making a decision on sample size and overlap is always a compromise between frequency accuracy and timeliness: the bigger the sample, the more FFT bins and hence absolute accuracy, but it takes longer. I'm guessing you want regular updates on the frequency you're detecting, and absolute accuracy is not too important: so a 256 sample FFT seems a pretty good choice. Having an overlap will give a higher resolution on the same data, but at the expense of processing: again, 50% seems fine.
Applying a window will stop frequency artifacts appearing due to the abrupt start and finish of the sample (you are effectively applying a square window if you do nothing). A Hamming window is fairly standard as it gives a good compromise between having sharp signals and low side-lobes: some windows will reject the side-lobes better (multiples of the detected frequency) but the detected signal will be spread over more bins, and others the opposite. On a small sample size with the amount of noise you have on your signal, I don't think it really matters much: you might as well stick with a Hamming window.
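For reference, the standard Hamming coefficients are simple to generate; this is a generic sketch, not code from the question:

```java
// Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (N-1)).
// Multiply each sample of a chunk by w[i] before taking the FFT.
public class Hamming {
    static double[] window(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++)
            w[i] = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1));
        return w;
    }
}
```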
Almost: the magnitude spectrum is the square root of the sum of the squares of the complex values; the power spectrum is the square of that (the distinction makes no difference for peak finding). Your assumption about the Nyquist frequency is true: your scale will go up to 16 Hz. I assume you are using a real FFT algorithm, which returns 128 complex values (a full FFT would give 256 values back, but because you are giving it a real signal, half are an exact mirror image of the other half), so each bin is 16/128 = 0.125 Hz wide. It is also common to show the power spectrum on a log scale, but that's irrelevant if you're just peak detecting.
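The bin arithmetic above, as a sketch (assuming the FFT library hands back separate real and imaginary arrays):

```java
// Magnitude spectrum from an FFT's complex output, plus the bin width
// for a real input: sampleRate / fftLength (here 32 / 256 = 0.125 Hz).
public class Spectrum {
    static double[] magnitudes(double[] re, double[] im) {
        double[] mag = new double[re.length];
        for (int i = 0; i < re.length; i++)
            mag[i] = Math.hypot(re[i], im[i]); // sqrt(re^2 + im^2)
        return mag;
    }

    static double binWidthHz(double sampleRateHz, int fftLength) {
        return sampleRateHz / fftLength;
    }
}
```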
The 8Hz spike really is there: my guess is that a phone in a pocket of a moving person is more than a 1st order system, so you are going to have other frequency components, but should be able to detect the primary. You could filter it out, but that's pointless if you are taking an FFT: just ignore those bins if you are sure they are erroneous.
You seem to be getting on fine. The only suggestion I would make is to develop some longer time heuristics on the results: look at successive outputs and reject short-term detected signals. Look for a principal component and see if you can track it as it moves around.
To answer a few of your questions:
Yes, you should be applying a window function. The idea here is that when you start and stop sampling a real-world signal, what you're doing anyway is applying a sharp rectangular window. Hann and Hamming windows are much better at reducing frequencies you don't want, so this is a good approach.
Yes, the strongest frequencies are around 3 and 8 Hz. I don't think the 8 Hz spike is erroneous. With such a short data set you almost certainly can't control the exact frequencies your signal will have.
Some insight on question 4 (from staring at accelerometer signals of people running for months of my life):
Are you running this analysis on a single accelerometer axis channel, or are you combining them to create the magnitude of acceleration? If you are interested in the overall magnitude of acceleration, then you should combine x, y, and z as mag_acc = sqrt((x - 0g_offset)^2 + (y - 0g_offset)^2 + (z - 0g_offset)^2). This signal should be at 1 g when the device is still. If you are only looking at a single axis, you will get components from the dominant running motion and also from changes in the orientation of the phone (because the contribution from gravity transitions between axes). So if the phone's orientation moves around while you are running, depending on how you are holding it, that can contribute a significant amount to the signal; the magnitude, however, will not show the orientation changes nearly as much. A person running should have a really clean dominant frequency at the person's step rate.
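The combination suggested above, sketched in code (the zero-g offset is assumed to be known per device and the same for each axis here, purely for illustration):

```java
// Orientation-independent acceleration magnitude from the three axis
// channels; reads about 1 g when the device is still.
public class AccMagnitude {
    // gOffset is the sensor's zero-g reading (device specific).
    static double magnitude(double x, double y, double z, double gOffset) {
        double dx = x - gOffset, dy = y - gOffset, dz = z - gOffset;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```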