Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I am working on an Android project. I need an FFT algorithm to process Android accelerometer data. Is there an FFT library available in the Android SDK?
You can use this class, which is fast enough for real-time audio analysis:
public class FFT {
int n, m;
// Lookup tables. Only need to recompute when size of FFT changes.
double[] cos;
double[] sin;
public FFT(int n) {
this.n = n;
this.m = (int) (Math.log(n) / Math.log(2));
// Make sure n is a power of 2
if (n != (1 << m))
throw new RuntimeException("FFT length must be power of 2");
// precompute tables
cos = new double[n / 2];
sin = new double[n / 2];
for (int i = 0; i < n / 2; i++) {
cos[i] = Math.cos(-2 * Math.PI * i / n);
sin[i] = Math.sin(-2 * Math.PI * i / n);
}
}
public void fft(double[] x, double[] y) {
int i, j, k, n1, n2, a;
double c, s, t1, t2;
// Bit-reverse
j = 0;
n2 = n / 2;
for (i = 1; i < n - 1; i++) {
n1 = n2;
while (j >= n1) {
j = j - n1;
n1 = n1 / 2;
}
j = j + n1;
if (i < j) {
t1 = x[i];
x[i] = x[j];
x[j] = t1;
t1 = y[i];
y[i] = y[j];
y[j] = t1;
}
}
// FFT
n1 = 0;
n2 = 1;
for (i = 0; i < m; i++) {
n1 = n2;
n2 = n2 + n2;
a = 0;
for (j = 0; j < n1; j++) {
c = cos[a];
s = sin[a];
a += 1 << (m - i - 1);
for (k = j; k < n; k = k + n2) {
t1 = c * x[k + n1] - s * y[k + n1];
t2 = s * x[k + n1] + c * y[k + n1];
x[k + n1] = x[k] - t1;
y[k + n1] = y[k] - t2;
x[k] = x[k] + t1;
y[k] = y[k] + t2;
}
}
}
}
}
Warning: this code appears to be derived from here, and has a GPLv2 license.
Using the class at: https://www.ee.columbia.edu/~ronw/code/MEAPsoft/doc/html/FFT_8java-source.html
Short explanation: call fft() with x as your amplitude data and y as an all-zeros array. After the function returns, your first answer will be a[0] = x[0]^2 + y[0]^2.
Complete explanation: the FFT is a complex transform. It takes N complex numbers and produces N complex numbers, so x[0] is the real part of the first number and y[0] is its imaginary part. This function computes in place, so when it returns, x and y hold the real and imaginary parts of the transform.
One typical use is calculating the power spectrum of audio. Your audio samples only have a real part, so your imaginary part is 0. To calculate the power spectrum, add the squares of the real and imaginary parts: P[0] = x[0]^2 + y[0]^2.
Also note that the Fourier transform of real input gives a symmetric result: with N = x.length, x[k] == x[N-k] and y[k] == -y[N-k], so the second half of the power spectrum mirrors the first half. x[0] holds the data for f = 0 Hz, and bin k corresponds to f = k*fs/N. x[N/2] holds the data for a frequency equal to half the sampling rate (e.g. if your sampling rate was 44000 Hz, x[N/2] refers to 22 kHz).
Full procedure:
1. Create an array p[512] filled with zeros.
2. Collect 1024 audio samples and write them into x.
3. Set y[n] = 0 for all n.
4. Calculate fft(x, y).
5. Calculate p[n] += x[n]^2 + y[n]^2 for all n = 0 to 511.
6. Go to step 2 to take another batch (after 50 batches, go to the next step).
7. Plot p.
8. Go to step 1.
Then adjust the fixed numbers to taste.
The number 512 is the number of frequency bins you keep (half of the 1024-sample window). I won't explain it in depth; just avoid reducing it too much.
The number 1024 must always be double the previous number.
The number 50 defines your update rate. If your sampling rate is 44000 samples per second, your update rate will be R = 44000/1024/50 ≈ 0.86 updates per second, i.e. one update roughly every 1.16 seconds.
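Below is a minimal Java sketch of the accumulate-and-plot loop described above. It rests on a few assumptions: batches of N = 1024 real samples, bins 0 .. N/2-1 kept (bin k is k*fs/N Hz, and for real input the upper half only mirrors these), and a plain sum rather than an average. The names PowerSpectrum, accumulate and binFrequency are hypothetical. The compact radix-2 FFT is included only so the block is self-contained; the FFT class above would do the same job.

```java
import java.util.Arrays;

// Hypothetical helper sketching the batch-and-accumulate power spectrum.
final class PowerSpectrum {
    private PowerSpectrum() {}

    /** In-place iterative radix-2 FFT (same sign convention as the FFT class above); n must be a power of 2. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // bit-reversal permutation
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // butterfly stages of size len = 2, 4, ..., n
        for (int len = 2; len <= n; len <<= 1) {
            int half = len / 2;
            for (int k = 0; k < half; k++) {
                double wr = Math.cos(-2 * Math.PI * k / len), wi = Math.sin(-2 * Math.PI * k / len);
                for (int a = k; a < n; a += len) {
                    int b = a + half;
                    double tr = wr * re[b] - wi * im[b];
                    double ti = wr * im[b] + wi * re[b];
                    re[b] = re[a] - tr; im[b] = im[a] - ti;
                    re[a] += tr; im[a] += ti;
                }
            }
        }
    }

    /** For each batch of N real samples, adds |X[k]|^2 into p[k] for k = 0 .. N/2-1. */
    static double[] accumulate(double[][] batches) {
        int n = batches[0].length;
        double[] p = new double[n / 2];            // starts zero-filled
        double[] re = new double[n], im = new double[n];
        for (double[] batch : batches) {
            System.arraycopy(batch, 0, re, 0, n);
            Arrays.fill(im, 0.0);                  // real samples: imaginary part is zero
            fft(re, im);
            for (int k = 0; k < n / 2; k++) p[k] += re[k] * re[k] + im[k] * im[k];
        }
        return p;
    }

    /** Frequency in Hz of bin k for an n-point FFT at sample rate fs. */
    static double binFrequency(int k, int n, double fs) {
        return k * fs / n;
    }
}
```

Dividing p by the number of batches gives an average instead of a sum. The plot shape is the same either way.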
kissfft is a decent enough library that compiles on Android. It has a more permissive license (BSD) than FFTW (GPL), even though FFTW is admittedly better.
You can find an Android binding for kissfft in libgdx: https://github.com/libgdx/libgdx/blob/0.9.9/extensions/gdx-audio/src/com/badlogic/gdx/audio/analysis/KissFFT.java
Or, if you would like a pure-Java solution, try JTransforms:
https://sites.google.com/site/piotrwendykier/software/jtransforms
Use this class (the one that EricLarch's answer is derived from).
Usage Notes
This function replaces your input arrays with the FFT output.
Input
N = the number of data points (the size of your input array, must be a power of 2)
X = the real part of your data to be transformed
Y = the imaginary part of the data to be transformed
For example, if your input is
(1+8i, 2+3i, 7-i, -10-3i)
N = 4
X = (1, 2, 7, -10)
Y = (8, 3, -1, -3)
Output
X = the real part of the FFT output
Y = the imaginary part of the FFT output
To get your classic FFT graph, you will want to calculate the magnitude from the real and imaginary parts.
Something like this (it returns the squared magnitude, i.e. the power, of each bin):
public double[] fftCalculator(double[] re, double[] im) {
    if (re.length != im.length) return null;
    FFT fft = new FFT(re.length);
    fft.fft(re, im); // overwrites re and im with the FFT output
    double[] fftMag = new double[re.length];
    for (int i = 0; i < re.length; i++) {
        // squared magnitude; take Math.sqrt(...) of this for the magnitude itself
        fftMag[i] = re[i] * re[i] + im[i] * im[i];
    }
    return fftMag;
}
Also see this StackOverflow answer for how to get frequencies if your original input was magnitude vs. time.
Yes, there is JTransforms, which is maintained on GitHub here and available as a Maven artifact here.
Use with:
compile group: 'com.github.wendykierp', name: 'JTransforms', version: '3.1'
But with more recent Gradle versions you need to use something like:
dependencies {
...
implementation 'com.github.wendykierp:JTransforms:3.1'
}
@J Wang
Your output magnitude seems better than the answer given in the thread you linked, but it is still the magnitude squared. The magnitude of a complex number
z = a + ib
is calculated as
|z| = sqrt(a^2 + b^2)
The answer in the linked thread suggests that for purely real inputs the outputs should use a^2 or a, because a_(i+N/2) = -a_(i), with b_(i) = a_(i+N/2). This means the imaginary part in their table is stored in the second half of the output table.
That is, the second half of the output table for an input table of reals is the conjugate of the real part, so z = a - ia, giving a magnitude
|z| = sqrt(2a^2) = sqrt(2)|a|
so it is worth noting the scaling factors.
I would recommend looking all this up in a book or on wiki to be sure.
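For a quick check of the scaling factors, here is the standard single-sinusoid result. It assumes the usual unnormalized forward DFT, X[k] = Σ x[n] e^{-2πikn/N}, which is the convention of the FFT class at the top of this page. Here a and b are the real and imaginary parts of X[k0]:

$$
x[n] = A\cos\!\left(\frac{2\pi k_0 n}{N} + \varphi\right),\qquad 0 < k_0 < \frac{N}{2}
$$
$$
X[k_0] = \sum_{n=0}^{N-1} x[n]\,e^{-2\pi i k_0 n/N} = \frac{AN}{2}\,e^{i\varphi},\qquad X[N-k_0] = \overline{X[k_0]}
$$
$$
|X[k_0]| = \sqrt{a^2+b^2} = \frac{AN}{2}\quad\Longrightarrow\quad A = \frac{2\,|X[k_0]|}{N},\qquad A_{\mathrm{DC}} = \frac{|X[0]|}{N}
$$

So for real input, the second half of the output is the complex conjugate of the first half and carries the same magnitudes. Dividing by N and doubling every bin except 0 Hz and N/2 is exactly what MATLAB's single-sided spectrum (P1 = 2*P2) does.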
Unfortunately, the top answer only works for arrays whose size is a power of 2, which is very limiting.
I used the JTransforms library and it works perfectly; you can compare its output with MATLAB's fft function.
Here is my code, with comments referencing how MATLAB transforms any signal and gets the frequency amplitudes (https://la.mathworks.com/help/matlab/ref/fft.html).
First, add the following to the app's build.gradle:
implementation 'com.github.wendykierp:JTransforms:3.1'
And here is the code for transforming a simple sine wave; it works like a charm:
import org.jtransforms.fft.DoubleFFT_1D;

double Fs = 8000;   // sampling frequency (Hz)
double T = 1 / Fs;  // sampling period (s)
int L = 1600;       // signal length (JTransforms does not require a power of 2)
double freq = 338;  // test tone (Hz)
double[] sinValue_re_im = new double[L * 2]; // complexForward takes an array whose positions alternate between real and imaginary
for (int i = 0; i < L; i++) {
    sinValue_re_im[2 * i] = Math.sin(2 * Math.PI * freq * (i * T)); // real part
    sinValue_re_im[2 * i + 1] = 0; // imaginary part
}
// matlab
// tf = fft(y1);
DoubleFFT_1D fft = new DoubleFFT_1D(L);
fft.complexForward(sinValue_re_im);
double[] tf = sinValue_re_im.clone();
// matlab
// P2 = abs(tf/L);
double[] P2 = new double[L];
for (int i = 0; i < L; i++) {
    double re = tf[2 * i] / L;
    double im = tf[2 * i + 1] / L;
    P2[i] = Math.sqrt(re * re + im * im);
}
// P1 = P2(1:L/2+1);
double[] P1 = new double[L / 2 + 1]; // single-sided: for real input the second half of P2 mirrors the first half
System.arraycopy(P2, 0, P1, 0, L / 2 + 1);
// P1(2:end-1) = 2*P1(2:end-1);
for (int i = 1; i < P1.length - 1; i++) {
    P1[i] = 2 * P1[i];
}
// f = Fs*(0:(L/2))/L;
double[] f = new double[L / 2 + 1];
for (int i = 0; i < L / 2 + 1; i++) {
    f[i] = Fs * ((double) i) / L;
}
Related
I'm trying to minimise a value in Java using commons-math. I've had a look at their documentation, but I don't really get how to implement it.
Basically, in my code below, I have a Double which has the expected goals in a soccer match and I'd like to optimise the probability value of under 3 goals occurring in a game to 0.5.
import org.apache.commons.math3.distribution.PoissonDistribution;
public class Solver {
public static void main(String[] args) {
final Double expectedGoals = 2.9d;
final PoissonDistribution poissonGoals = new PoissonDistribution(expectedGoals);
Double probabilityUnderThreeGoals = 0d;
for (int score = 0; score < 15; score++) {
final Double probability =
poissonGoals.probability(score);
if (score < 3) {
probabilityUnderThreeGoals = probabilityUnderThreeGoals + probability;
}
}
System.out.println(probabilityUnderThreeGoals); //prints 0.44596319855718064, I want to optimise this to 0.5
}
}
The cumulative probability (<= x) of a Poisson random variable with mean λ can be calculated by:
$$P(X \le x) = e^{-\lambda} \sum_{i=0}^{x} \frac{\lambda^i}{i!}$$
In your case, x is 2 and you want to find lambda (the mean) such that this is 0.5. You can type this into WolframAlpha and have it solve it for you. So rather than an optimisation problem, this is just a root-finding problem (though one could argue that optimisation problems are just finding roots.)
You can also do this with Apache Commons Maths, with one of the root finders.
import org.apache.commons.math3.analysis.UnivariateFunction;
import org.apache.commons.math3.analysis.solvers.BisectionSolver;
import org.apache.commons.math3.distribution.PoissonDistribution;
import org.apache.commons.math3.util.CombinatoricsUtils;

int maximumGoals = 2;
double expectedProbability = 0.5;
UnivariateFunction f = x -> {
    double sum = 0;
    for (int i = 0; i <= maximumGoals; i++) {
        sum += Math.pow(x, i) / CombinatoricsUtils.factorialDouble(i);
    }
    return sum * Math.exp(-x) - expectedProbability;
};
// the four parameters that "solve" takes are:
// the maximum number of function evaluations, the function to solve, and the min and max of the search interval
// I've put some somewhat sensible values as an example. Feel free to change them
double answer = new BisectionSolver().solve(Integer.MAX_VALUE, f, 0, maximumGoals / expectedProbability);
System.out.println("Solved: " + answer);
System.out.println("Cumulative Probability: " + new PoissonDistribution(answer).cumulativeProbability(maximumGoals));
This prints:
Solved: 2.674060344696045
Cumulative Probability: 0.4999999923623868
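If pulling commons-math into an Android app is not an option, the same root can be found with plain Java. This is a sketch with hypothetical names (PoissonMeanSolver, cdf, solve). It relies on the Poisson CDF P(X <= k) strictly decreasing as the mean grows, so bisection over an interval that brackets the root converges.

```java
// Hypothetical stdlib-only alternative to the commons-math solver above.
final class PoissonMeanSolver {
    private PoissonMeanSolver() {}

    /** P(X <= k) for X ~ Poisson(lambda): e^-lambda * sum lambda^i / i!, built term by term. */
    static double cdf(double lambda, int k) {
        double term = Math.exp(-lambda), sum = term;
        for (int i = 1; i <= k; i++) {
            term *= lambda / i;   // lambda^i / i! * e^-lambda from the previous term
            sum += term;
        }
        return sum;
    }

    /** Bisection for lambda in [lo, hi] such that cdf(lambda, maxGoals) == target. */
    static double solve(int maxGoals, double target, double lo, double hi) {
        for (int iter = 0; iter < 200 && hi - lo > 1e-12; iter++) {
            double mid = 0.5 * (lo + hi);
            // the CDF falls as lambda grows: if it is still above target, the root lies to the right
            if (cdf(mid, maxGoals) > target) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }
}
```

PoissonMeanSolver.solve(2, 0.5, 0, 10) returns roughly 2.67406, in line with the commons-math result above.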
I need to get the amplitude of a signal at a certain frequency.
I use the FFTAnalysis function, but I get the whole spectrum. How can I modify it to get the amplitude of a signal at a certain frequency?
For example, I have:
data = array of 1024 points;
If I use FFTAnalysis, I get an FFTdata array of 1024 points.
But I only need FFTdata[454], for instance.
public static float[] FFTAnalysis(short[] AVal, int Nvl, int Nft) {
double TwoPi = 6.283185307179586;
int i, j, n, m, Mmax, Istp;
double Tmpr, Tmpi, Wtmp, Theta;
double Wpr, Wpi, Wr, Wi;
double[] Tmvl;
float[] FTvl;
n = Nvl * 2;
Tmvl = new double[n];
FTvl = new float[Nvl];
for (i = 0; i < Nvl; i++) {
j = i * 2; Tmvl[j] = 0; Tmvl[j+1] = AVal[i];
}
i = 1; j = 1;
while (i < n) {
if (j > i) {
Tmpr = Tmvl[i]; Tmvl[i] = Tmvl[j]; Tmvl[j] = Tmpr;
Tmpr = Tmvl[i+1]; Tmvl[i+1] = Tmvl[j+1]; Tmvl[j+1] = Tmpr;
}
i = i + 2; m = Nvl;
while ((m >= 2) && (j > m)) {
j = j - m; m = m >> 1;
}
j = j + m;
}
Mmax = 2;
while (n > Mmax) {
Theta = -TwoPi / Mmax; Wpi = Math.sin(Theta);
Wtmp = Math.sin(Theta / 2); Wpr = Wtmp * Wtmp * 2;
Istp = Mmax * 2; Wr = 1; Wi = 0; m = 1;
while (m < Mmax) {
i = m; m = m + 2; Tmpr = Wr; Tmpi = Wi;
Wr = Wr - Tmpr * Wpr - Tmpi * Wpi;
Wi = Wi + Tmpr * Wpi - Tmpi * Wpr;
while (i < n) {
j = i + Mmax;
Tmpr = Wr * Tmvl[j] - Wi * Tmvl[j-1];
Tmpi = Wi * Tmvl[j] + Wr * Tmvl[j-1];
Tmvl[j] = Tmvl[i] - Tmpr; Tmvl[j-1] = Tmvl[i-1] - Tmpi;
Tmvl[i] = Tmvl[i] + Tmpr; Tmvl[i-1] = Tmvl[i-1] + Tmpi;
i = i + Istp;
}
}
Mmax = Istp;
}
for (i = 0; i < Nft; i++) {
j = i * 2; FTvl[Nft - i - 1] = (float) Math.sqrt((Tmvl[j]*Tmvl[j]) + (Tmvl[j+1]*Tmvl[j+1]));
}
return FTvl;
}
The Goertzel algorithm (or filter) is similar to computing the magnitude for just 1 bin of an FFT.
The Goertzel algorithm is identical to 1 bin of an FFT, except for numerical artifacts, if the period of the frequency is an exact submultiple of your Goertzel filter's length. Otherwise there are some added scalloping effects from a rectangular window of non-periodic-in-aperture size, and how that window relates to the phase of the input.
Multiplying by a complex sinusoid and taking the magnitude of the complex sum is also computationally similar to a Goertzel. The difference is that the Goertzel does not require separately calling (or looking up) a trig library function for every point, since it usually includes a trig recursion as part of its algorithm.
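Here is a minimal Goertzel sketch in Java; the class and method names are hypothetical. It returns the magnitude of bin k (in the usual 0 = DC ordering) of an N-sample block, which should match the corresponding FFT bin up to rounding. Only one cosine is computed per call, and the recursion does the rest. When mapping this to the question, check how FFTAnalysis orders its output: its final loop writes FTvl[Nft - i - 1], i.e. in reverse, so FFTdata[454] may not be bin 454.

```java
// Hypothetical single-bin Goertzel filter.
final class Goertzel {
    private Goertzel() {}

    /** Returns |X[k]| of the N-point DFT of x (N = x.length), computed without an FFT. */
    static double magnitude(double[] x, int k) {
        int n = x.length;
        double w = 2 * Math.PI * k / n;
        double coeff = 2 * Math.cos(w);   // the only trig call; the recursion replaces per-sample sin/cos
        double sPrev = 0, sPrev2 = 0;
        for (double sample : x) {
            double s = sample + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        // squared magnitude from the last two filter states
        double power = sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
        return Math.sqrt(Math.max(power, 0));   // guard against tiny negative rounding
    }
}
```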
You'd multiply the input data by a (complex) sine wave and integrate (sum) the result.
Multiplying by a complex sine is equivalent to a frequency shift: you want to shift the target frequency down to 0 Hz. The integration is a low-pass filtering step, with the bandwidth being the inverse of the sampling length.
You then end up with a complex number, which is the same number you would have found in the FFT bin for this frequency (because in essence this is what the FFT does).
The fast Fourier transform (FFT) is a clever way of computing all N frequency bins of a discrete Fourier transform (DFT) very quickly. As such, the FFT is designed for when you need many frequencies from the input. If you want just one frequency, the plain DFT sum is the way to go (otherwise you're wasting resources).
The DFT is defined as:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}$$
where bin k corresponds to the frequency f = k·f_s/N for a sampling rate f_s.
So, in pseudocode:
samples = [#,#,#,#...]
N = length(samples);
SAMPLE_RATE = 44100; // sampling rate of the samples in Hz (example value)
FREQ = 440; // frequency to detect, in Hz
DFT = 0i; // this is a complex number
for(int n=0; n<N; n++){
    DFT += samples[n] * e^( (-2*PI*1i*FREQ*n) / SAMPLE_RATE ); // "1i" is the imaginary unit
}
The resulting variable DFT will be a complex number representing the real and imaginary values of the chosen frequency.
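Here is the pseudocode above as runnable Java, as a sketch with hypothetical names. It correlates the samples with a complex sinusoid at freq Hz, which is the same shift-to-0-Hz-and-sum idea as the previous answer. When freq*N/sampleRate is an integer, the result equals that FFT bin; otherwise it is the DFT evaluated between bins.

```java
// Hypothetical single-frequency DFT (one bin, arbitrary frequency).
final class SingleFrequencyDft {
    private SingleFrequencyDft() {}

    /** Returns {re, im} of sum_n samples[n] * e^(-2*pi*i*freq*n/sampleRate). */
    static double[] dft(double[] samples, double freq, double sampleRate) {
        double re = 0, im = 0;
        for (int n = 0; n < samples.length; n++) {
            double angle = -2 * Math.PI * freq * n / sampleRate;
            re += samples[n] * Math.cos(angle);   // real part of samples[n] * e^(i*angle)
            im += samples[n] * Math.sin(angle);   // imaginary part
        }
        return new double[] {re, im};
    }

    /** Magnitude |DFT| at the given frequency. */
    static double magnitude(double[] samples, double freq, double sampleRate) {
        double[] z = dft(samples, freq, sampleRate);
        return Math.hypot(z[0], z[1]);
    }
}
```

For a unit-amplitude sine that completes a whole number of cycles in the block, the magnitude is N/2. Dividing by N/2 therefore recovers the amplitude.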
I am trying to write a simple band pass filter following the instructions in this book. My code creates a blackman window, and combines two low pass filter kernels to create a band pass filter kernel using spectral inversion, as described in the second example here (table 16-2).
I am testing my code by comparing it with the results I get in matlab. When I test the methods that create a blackman window and a low pass filter kernel separately, I get results that are close to what I see in matlab (up to some digits after the decimal point - I attribute the error to java double variables rounding issues), but my band pass filter kernel is incorrect.
Tests I ran:
Created a blackman window and compared it with what I get in matlab - all good.
Created a low pass filter using this window with my code and fir1(N, Fc1/(Fs/2), win, flag); in matlab (see full code below). I think the results are correct, although the error grows as Fc1 grows (why?)
Created a band pass filter using my code and fir1(N, [Fc1 Fc2]/(Fs/2), 'bandpass', win, flag); in matlab - results are completely off.
Filtered my data using my code and the kernel generated by matlab - all good.
So - why is my band pass filter kernel off? What did I do wrong?
I think I either have a bug or fir1 uses a different algorithm, but I can't check because the article referenced in its documentation is not publicly available.
This is my matlab code:
Fs = 200; % Sampling Frequency
N = 10; % Order
Fc1 = 1.5; % First Cutoff Frequency
Fc2 = 7.5; % Second Cutoff Frequency
flag = 'scale'; % Sampling Flag
% Create the window vector for the design algorithm.
win = blackman(N+1);
% Calculate the coefficients using the FIR1 function.
b = fir1(N, [Fc1 Fc2]/(Fs/2), 'bandpass', win, flag);
Hd = dfilt.dffir(b);
res = filter(Hd, data);
This is my java code (I believe the bug is in bandPassKernel):
/**
* See - http://www.mathworks.com/help/signal/ref/blackman.html
* @param length
* @return
*/
private static double[] blackmanWindow(int length) {
double[] window = new double[length];
double factor = Math.PI / (length - 1);
for (int i = 0; i < window.length; ++i) {
window[i] = 0.42d - (0.5d * Math.cos(2 * factor * i)) + (0.08d * Math.cos(4 * factor * i));
}
return window;
}
private static double[] lowPassKernel(int length, double cutoffFreq, double[] window) {
double[] ker = new double[length + 1];
double factor = Math.PI * cutoffFreq * 2;
double sum = 0;
for (int i = 0; i < ker.length; i++) {
double d = i - length/2;
if (d == 0) ker[i] = factor;
else ker[i] = Math.sin(factor * d) / d;
ker[i] *= window[i];
sum += ker[i];
}
// Normalize the kernel
for (int i = 0; i < ker.length; ++i) {
ker[i] /= sum;
}
return ker;
}
private static double[] bandPassKernel(int length, double lowFreq, double highFreq) {
double[] ker = new double[length + 1];
double[] window = blackmanWindow(length + 1);
// Create a band reject filter kernel using a high pass and a low pass filter kernel
double[] lowPass = lowPassKernel(length, lowFreq, window);
// Create a high pass kernel for the high frequency
// by inverting a low pass kernel
double[] highPass = lowPassKernel(length, highFreq, window);
for (int i = 0; i < highPass.length; ++i) highPass[i] = -highPass[i];
highPass[length / 2] += 1;
// Combine the filters and invert to create a bandpass filter kernel
for (int i = 0; i < ker.length; ++i) ker[i] = -(lowPass[i] + highPass[i]);
ker[length / 2] += 1;
return ker;
}
private static double[] filter(double[] signal, double[] kernel) {
double[] res = new double[signal.length];
for (int r = 0; r < res.length; ++r) {
int M = Math.min(kernel.length, r + 1);
for (int k = 0; k < M; ++k) {
res[r] += kernel[k] * signal[r - k];
}
}
return res;
}
And this is how I use my code:
double[] kernel = bandPassKernel(10, 1.5d / (200/2), 7.5d / (200/2));
double[] res = filter(data, kernel);
I ended up implementing Matlab's fir1 function in Java. My results are quite accurate.
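Here is a sketch of a window-method band-pass in the spirit of fir1(N, [w1 w2], 'bandpass', blackman(N+1), 'scale'). The names are hypothetical. It follows my reading of the fir1 documentation: the ideal band-pass response is the difference of two ideal low-passes, cutoffs are normalized so that 1 = Nyquist, and 'scale' means unit gain at the passband centre. Two differences from the question's code may be worth checking. First, sin(2π·f·d)/d has its cutoff at f cycles per sample, so f should be a fraction of the sampling rate, while the call passes Fc/(Fs/2), which doubles both cutoffs. Second, the question normalizes to unit gain at DC rather than at the passband centre.

```java
// Hypothetical fir1-style band-pass design (window method).
// Assumptions: w1 < w2 are fractions of Nyquist (0..1); 'scale' = |H| = 1 at the passband centre.
final class Fir1Bandpass {
    private Fir1Bandpass() {}

    /** Symmetric Blackman window (same formula as blackmanWindow in the question). */
    static double[] blackman(int length) {
        double[] w = new double[length];
        for (int i = 0; i < length; i++) {
            double t = 2 * Math.PI * i / (length - 1);
            w[i] = 0.42 - 0.5 * Math.cos(t) + 0.08 * Math.cos(2 * t);
        }
        return w;
    }

    /** Returns order + 1 taps. */
    static double[] design(int order, double w1, double w2) {
        int len = order + 1;
        double[] win = blackman(len);
        double[] h = new double[len];
        double mid = order / 2.0;
        for (int i = 0; i < len; i++) {
            double m = i - mid;
            // ideal band-pass = ideal low-pass at w2 minus ideal low-pass at w1
            double ideal = (m == 0)
                    ? w2 - w1
                    : (Math.sin(Math.PI * w2 * m) - Math.sin(Math.PI * w1 * m)) / (Math.PI * m);
            h[i] = ideal * win[i];
        }
        // evaluate |H(e^{j*wc})| at the passband centre and scale it to 1
        double wc = Math.PI * (w1 + w2) / 2;
        double re = 0, im = 0;
        for (int i = 0; i < len; i++) {
            re += h[i] * Math.cos(wc * i);
            im -= h[i] * Math.sin(wc * i);
        }
        double gain = Math.hypot(re, im);
        for (int i = 0; i < len; i++) h[i] /= gain;
        return h;
    }
}
```

Fir1Bandpass.design(10, 1.5 / 100, 7.5 / 100) corresponds to the MATLAB call in the question (Fs = 200, so Nyquist = 100 Hz). I have not compared it numerically against MATLAB here, so treat any agreement as something to verify.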
I'm currently working in Java on Android. I'm trying to implement the FFT in order to build a kind of frequency viewer.
Actually, I was able to do it, but the display is not fluid at all.
I added some traces to check the processing time of each part of my code, and it turns out the FFT takes about 300 ms on my complex array, which has 4096 elements. I need it to take less than 100 ms, as my thread (that displays the frequencies) is refreshed every 100 ms. I reduced the initial array so that the FFT results have only 1024 elements, and it works, but the result is degraded.
Does anyone have an idea?
I used the default fft.java and Complex.java classes that can be found on the internet.
For information, my code computing the FFT is the following:
int bytesPerSample = 2;
Complex[] x = new Complex[bufferSize / 2];
for (int index = 0; index < bufferReadResult - bytesPerSample + 1; index += bytesPerSample) {
    // 16 bits = 2 bytes, little-endian: low byte first, high byte carries the sign
    double sample = 0;
    for (int b = 0; b < bytesPerSample; b++) {
        int v = buffer[index + b];
        if (b < bytesPerSample - 1 || bytesPerSample == 1) {
            v &= 0xFF;
        }
        sample += v << (b * 8);
    }
    double sample32 = 100 * (sample / 32768.0); // normalizes to [-1, 1), then scales by 100
    x[index / bytesPerSample] = new Complex(sample32, 0);
}
Complex[] tx = new Complex[1024]; // x.length / 4 elements (x.length = 4096 here)
///// reduction of the size of the signal in order to improve the FFT processing time
// note: keeping every 4th sample without low-pass filtering first aliases the high frequencies
for (int i = 0; i < x.length / 4; i++) {
    tx[i] = new Complex(x[i * 4].re(), 0);
}
// Signal retrieval thanks to the FFT
fftRes = FFT.fft(tx);
I don't know Java, but your way of converting between your input data and an array of complex values seems very convoluted. You're building two arrays of complex data where only one is necessary.
Also it smells like your complex real and imaginary values are doubles. That's way over the top for what you need, and ARMs are veeeery slow at double arithmetic anyway. Is there a complex class based on single precision floats?
Thirdly you're performing a complex fft on real data by filling the imaginary part of your complexes with zero. Whilst the result will be correct it is twice as much work straight off (unless the routine is clever enough to spot that, which I doubt). If possible perform a real fft on your data and save half your time.
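Here is a sketch of the standard "real FFT" packing trick, with hypothetical names. The N real samples are packed into N/2 complex values z[m] = x[2m] + i·x[2m+1], and one N/2-point complex transform is done on them. A post-processing step then recovers X[0..N/2] of the N-point DFT. For real input, the remaining bins are conjugates, X[N-k] = conj(X[k]), so nothing is lost and magnitudes such as sqrt(re^2 + im^2) still work. To keep the block short, the N/2-point transform here is a naive DFT stand-in. In practice you would call your existing complex FFT there, which is where the roughly 2x saving comes from.

```java
// Hypothetical real-input transform via one half-length complex transform.
final class RealFft {
    private RealFft() {}

    /** Naive complex DFT, standing in for whatever complex FFT you already have. */
    static void complexDft(double[] re, double[] im, double[] outRe, double[] outIm) {
        int n = re.length;
        for (int k = 0; k < n; k++) {
            double sr = 0, si = 0;
            for (int t = 0; t < n; t++) {
                double a = -2 * Math.PI * k * t / n;
                double c = Math.cos(a), s = Math.sin(a);
                sr += re[t] * c - im[t] * s;
                si += re[t] * s + im[t] * c;
            }
            outRe[k] = sr;
            outIm[k] = si;
        }
    }

    /** X[0..N/2] of the N-point DFT of real x (N even), returned as {re, im}. */
    static double[][] forward(double[] x) {
        int n = x.length, h = n / 2;
        double[] zr = new double[h], zi = new double[h];
        for (int m = 0; m < h; m++) {     // pack: even samples -> real, odd samples -> imaginary
            zr[m] = x[2 * m];
            zi[m] = x[2 * m + 1];
        }
        double[] fr = new double[h], fi = new double[h];
        complexDft(zr, zi, fr, fi);        // one N/2-point complex transform
        double[] outRe = new double[h + 1], outIm = new double[h + 1];
        for (int k = 0; k < h; k++) {
            int kc = (h - k) % h;          // partner bin for conjugate symmetry
            // E[k] = (Z[k] + conj(Z[kc])) / 2: DFT of the even samples
            double er = (fr[k] + fr[kc]) / 2, ei = (fi[k] - fi[kc]) / 2;
            // O[k] = (Z[k] - conj(Z[kc])) / (2i): DFT of the odd samples
            double odr = (fi[k] + fi[kc]) / 2, odi = -(fr[k] - fr[kc]) / 2;
            // X[k] = E[k] + e^(-2*pi*i*k/N) * O[k]
            double wr = Math.cos(-2 * Math.PI * k / n), wi = Math.sin(-2 * Math.PI * k / n);
            outRe[k] = er + wr * odr - wi * odi;
            outIm[k] = ei + wr * odi + wi * odr;
            if (k == 0) {                  // Nyquist bin: X[N/2] = E[0] - O[0]
                outRe[h] = er - odr;
                outIm[h] = ei - odi;
            }
        }
        return new double[][] {outRe, outIm};
    }
}
```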
And then as Simon says there's the whole issue of avoiding garbage collection and memory allocation.
Also it looks like your FFT has no preparatory step. This means that the routine FFT.fft() is calculating the complex exponentials every time. The longest part of the FFT calculation is working out the complex exponentials, which is a shame because for any given FFT length the exponentials are constants. They don't depend on your input data at all. In the real-time world we use FFT routines where we calculate the exponentials once at the start of the program, and then the actual FFT itself takes that const array as one of its inputs. Don't know if your FFT class can do something similar.
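Here is a sketch of what such a preparatory step can look like; the class name FloatFftPlan is hypothetical. The constructor computes the twiddle factors and the bit-reversal table once per FFT length. transform() then works in place on float arrays and allocates nothing, so it can run every frame without feeding the garbage collector. It is an iterative radix-2 version rather than the recursive one in the question, because that recursion allocates new arrays at every level.

```java
// Hypothetical precomputed-table FFT for repeated use at one size.
final class FloatFftPlan {
    private final int n;
    private final float[] cos, sin;   // twiddle factors e^(-2*pi*i*k/n) for k < n/2
    private final int[] rev;          // bit-reversal permutation

    FloatFftPlan(int n) {
        if (n < 2 || Integer.bitCount(n) != 1) throw new IllegalArgumentException("n must be a power of 2 >= 2");
        this.n = n;
        cos = new float[n / 2];
        sin = new float[n / 2];
        for (int k = 0; k < n / 2; k++) {
            cos[k] = (float) Math.cos(-2 * Math.PI * k / n);
            sin[k] = (float) Math.sin(-2 * Math.PI * k / n);
        }
        int bits = Integer.numberOfTrailingZeros(n);
        rev = new int[n];
        for (int i = 0; i < n; i++) rev[i] = Integer.reverse(i) >>> (32 - bits);
    }

    /** In-place forward FFT of (re, im); no allocation, so it is safe to call every frame. */
    void transform(float[] re, float[] im) {
        for (int i = 0; i < n; i++) {
            int j = rev[i];
            if (i < j) {
                float t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        for (int len = 2; len <= n; len <<= 1) {
            int half = len / 2, step = n / len;   // stage twiddle e^(-2*pi*i*k/len) = table[k * step]
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < half; k++) {
                    float wr = cos[k * step], wi = sin[k * step];
                    int a = start + k, b = a + half;
                    float tr = wr * re[b] - wi * im[b];
                    float ti = wr * im[b] + wi * re[b];
                    re[b] = re[a] - tr; im[b] = im[a] - ti;
                    re[a] += tr; im[a] += ti;
                }
            }
        }
    }
}
```

Create the plan once (e.g. in onCreate, with the re/im arrays) and call plan.transform(re, im) for every frame.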
If you do end up going to something like FFTW then you're going to have to get used to calling C code from your Java. Also make sure you get a version that supports (I think) NEON, ARM's answer to SSE, AVX and Altivec. It's worth ploughing through their release notes to check. Also I strongly suspect that FFTW will only be able to offer a significant speed up if you ask it to perform an FFT on single precision floats, not doubles.
Google luck!
--Edit--
I meant of course 'good luck'. Give me a real keyboard quick, these touchscreen ones are unreliable...
First, thanks for all your answers.
I followed them and made two tests:
First, I replaced the doubles used in my Complex class with floats. The result is just a bit better, but not enough.
Then I rewrote the fft method so that it no longer uses Complex, but a two-dimensional float array instead. For each row of this array, the first column contains the real part, and the second one the imaginary part.
I also changed my code to instantiate the float array only once, in the onCreate method.
And the result... is worse! Now it takes a little more than 500 ms instead of 300 ms.
I don't know what to do now.
You can find below the initial fft function, and then the one I rewrote.
Thanks for your help.
// compute the FFT of x[], assuming its length is a power of 2
public static Complex[] fft(Complex[] x) {
int N = x.length;
// base case
if (N == 1) return new Complex[] { x[0] };
// radix 2 Cooley-Tukey FFT
if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2 : " + N); }
// fft of even terms
Complex[] even = new Complex[N/2];
for (int k = 0; k < N/2; k++) {
even[k] = x[2*k];
}
Complex[] q = fft(even);
// fft of odd terms
Complex[] odd = even; // reuse the array
for (int k = 0; k < N/2; k++) {
odd[k] = x[2*k + 1];
}
Complex[] r = fft(odd);
// combine
Complex[] y = new Complex[N];
for (int k = 0; k < N/2; k++) {
double kth = -2 * k * Math.PI / N;
Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
y[k] = q[k].plus(wk.times(r[k]));
y[k + N/2] = q[k].minus(wk.times(r[k]));
}
return y;
}
public static float[][] fftf(float[][] x) {
/**
* x[][0] = real part
* x[][1] = imaginary part
*/
int N = x.length;
// base case
if (N == 1) return new float[][] { x[0] };
// radix 2 Cooley-Tukey FFT
if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2 : " + N); }
// fft of even terms
float[][] even = new float[N/2][2];
for (int k = 0; k < N/2; k++) {
even[k] = x[2*k];
}
float[][] q = fftf(even);
// fft of odd terms
float[][] odd = even; // reuse the array
for (int k = 0; k < N/2; k++) {
odd[k] = x[2*k + 1];
}
float[][] r = fftf(odd);
// combine
float[][] y = new float[N][2];
double kth, wkcos, wksin ;
for (int k = 0; k < N/2; k++) {
kth = -2 * k * Math.PI / N;
//Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
wkcos = Math.cos(kth) ; // real part
wksin = Math.sin(kth) ; // imaginary part
// y[k] = q[k].plus(wk.times(r[k]));
y[k][0] = (float) (q[k][0] + wkcos * r[k][0] - wksin * r[k][1]);
y[k][1] = (float) (q[k][1] + wkcos * r[k][1] + wksin * r[k][0]);
// y[k + N/2] = q[k].minus(wk.times(r[k]));
y[k + N/2][0] = (float) (q[k][0] - (wkcos * r[k][0] - wksin * r[k][1]));
y[k + N/2][1] = (float) (q[k][1] - (wkcos * r[k][1] + wksin * r[k][0]));
}
return y;
}
Actually, I don't think I understand everything.
First, about Math.cos and Math.sin: how can I avoid computing them each time? Do you mean that I should compute all the values only once (e.g. store them in an array) and reuse them for each computation?
Second, about the N % 2 check: indeed, it's not very useful; I could do the test before calling the function.
Third, about Simon's advice: I mixed what he said and what you said, which is why I replaced Complex with a two-dimensional float[][]. If that was not what he suggested, then what was it?
Lastly, I'm not an FFT expert, so what do you mean by doing a "real FFT"? Do you mean that my imaginary part is useless? If so, I'm not sure, because later in my code I compute the magnitude of each frequency, i.e. sqrt(real[i]*real[i] + imag[i]*imag[i]). And I think my imaginary part is not equal to zero...
Thanks!
I need to find a median value of an array of doubles (in Java) without modifying it (so selection is out) or allocating a lot of new memory. I also don't care to find the exact median, but within 10% is fine (so if median splits the sorted array 40%-60% it's fine).
How can I achieve this efficiently?
Taking into account suggestions from rfreak, ILMTitan and Peter I wrote this code:
public static double median(double[] array) {
    final int smallArraySize = 5000;
    final int bigArraySize = 100000;
    if (array.length < smallArraySize + 2) { // small size, so can just sort
        double[] arr = array.clone();
        Arrays.sort(arr);
        return arr[arr.length / 2];
    } else if (array.length > bigArraySize) { // large size, don't want to make passes
        double[] arr = new double[smallArraySize + 1];
        int factor = array.length / arr.length;
        for (int i = 0; i < arr.length; i++)
            arr[i] = array[i * factor];
        return median(arr);
    } else { // average size, can sacrifice time for accuracy
        final int buckets = 1000;
        final double desiredPrecision = .005; // in percent
        final int maxNumberOfPasses = 10;
        int[] histogram = new int[buckets + 1];
        int acceptableMin, acceptableMax;
        double min, max, range, scale,
                medianMin = -Double.MAX_VALUE, medianMax = Double.MAX_VALUE;
        int sum, numbers, below, bin, neighborhood = (int) (array.length * 2 * desiredPrecision);
        for (int r = 0; r < maxNumberOfPasses; r++) { // narrow the window (medianMin, medianMax) around the median
            max = -Double.MAX_VALUE; min = Double.MAX_VALUE;
            numbers = 0; below = 0;
            for (int i = 0; i < array.length; i++) {
                if (array[i] <= medianMin) below++; // values left of the window
                else if (array[i] < medianMax) {
                    if (array[i] > max) max = array[i];
                    if (array[i] < min) min = array[i];
                    numbers++;
                }
            }
            if (min == max) return min;
            if (numbers <= neighborhood) return (medianMin + medianMax) / 2;
            // rank range of the median, counted from the lower edge of the window
            acceptableMin = (int) (array.length * (50d - desiredPrecision) / 100) - below;
            acceptableMax = (int) (array.length * (50d + desiredPrecision) / 100) - below;
            range = max - min;
            scale = range / buckets;
            for (int i = 0; i < array.length; i++)
                if (array[i] > medianMin && array[i] < medianMax) // values outside the window would index outside the histogram
                    histogram[(int) ((array[i] - min) / scale)]++;
            sum = 0;
            for (bin = 0; bin <= buckets; bin++) {
                sum += histogram[bin];
                if (sum > acceptableMin && sum < acceptableMax)
                    return ((.5d + bin) * scale) + min;
                if (sum >= acceptableMax) break; // the median falls inside this bin
            }
            medianMin = (bin * scale) + min; // next pass: search only inside that bin
            medianMax = ((bin + 1) * scale) + min;
            for (int i = 0; i < histogram.length; i++)
                histogram[i] = 0;
        }
        return .5d * medianMin + .5d * medianMax;
    }
}
Here I take into account the size of the array. If it's small, then just sort and get the true median. If it's very large, sample it and get the median of the samples, and otherwise iteratively bin the values and see if the median can be narrowed down to an acceptable range.
I don't have any problems with this code. If someone sees something wrong with it, please let me know.
Thank you.
Assuming you mean median and not average. Also assuming you are working with fairly large double[], or memory wouldn't be an issue for sorting a copy and performing an exact median. ...
With minimal additional memory overhead you could probably run a O(n) algorithm that would get in the ballpark. I'd try this and see how accurate it is.
Two passes.
First pass find the min and max. Create a set of buckets that represent evenly spaced number ranges between the min and max. Make a second pass and "count" how many numbers fall in each bin. You should then be able to make a reasonable estimate of the median. Using 1000 buckets would only cost 4k if you use int[] to store the buckets. The math should be fast.
The only question is accuracy, and I think you should be able to tune the number of buckets to get in the error range for your data sets.
I'm sure someone with a better math/stats background than I could provide a precise size to get the error range you are looking for.
Pick a small number of array elements at random, and find the median of those.
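Here is a sketch of this suggestion, with hypothetical names. It draws sampleSize random indices, copies those values, and takes the median of the sample. The input array is never touched, and the extra memory is O(sampleSize). Passing the Random in keeps results reproducible.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical random-sample median estimator.
final class SampledMedian {
    private SampledMedian() {}

    /** Median of sampleSize values drawn at random (with replacement); data is never modified. */
    static double estimate(double[] data, int sampleSize, Random rnd) {
        if (data.length == 0 || sampleSize <= 0) throw new IllegalArgumentException("empty input or sample");
        double[] sample = new double[sampleSize];   // extra memory independent of data.length
        for (int i = 0; i < sampleSize; i++) sample[i] = data[rnd.nextInt(data.length)];
        Arrays.sort(sample);
        return sample[sampleSize / 2];
    }
}
```

For example, SampledMedian.estimate(data, 1001, new Random()). With about a thousand samples, the estimate typically lands within a few percent of the true median rank, well inside the 40%-60% window asked for.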
Following on from the OP's question about how to extract N values from a much larger array:
The following code shows how long it takes to find the median of a large array, and then how long it takes to find the median of a fixed-size selection of values. The fixed-size selection has a fixed cost, but becomes increasingly inaccurate as the size of the original array grows.
The following prints
Avg time 17345 us. median=0.5009231700563378
Avg time 24 us. median=0.5146687617507585
the code
double[] nums = new double[100 * 1000 + 1];
for (int i = 0; i < nums.length; i++) nums[i] = Math.random();
{
int runs = 200;
double median = 0;
long start = System.nanoTime();
for (int r = 0; r < runs; r++) {
double[] arr = nums.clone();
Arrays.sort(arr);
median = arr[arr.length / 2];
}
long time = System.nanoTime() - start;
System.out.println("Avg time " + time / 1000 / runs + " us. median=" + median);
}
{
int runs = 20000;
double median = 0;
long start = System.nanoTime();
for (int r = 0; r < runs; r++) {
double[] arr = new double[301]; // fixed size to sample.
int factor = nums.length / arr.length; // take every nth value.
for (int i = 0; i < arr.length; i++)
arr[i] = nums[i * factor];
Arrays.sort(arr);
median = arr[arr.length / 2];
}
long time = System.nanoTime() - start;
System.out.println("Avg time " + time / 1000 / runs + " us. median=" + median);
}
To meet your requirement of not creating objects, I would put the fixed size array in a ThreadLocal so there is no ongoing object creation. You adjust the size of the array to suit how fast you want the function to be.
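Here is a sketch of the ThreadLocal idea, with hypothetical names. It uses the same every-nth-value sampling and 301-element buffer as the benchmark above. Each thread gets one scratch array on first use, so repeated calls allocate nothing and the input array is never modified.

```java
import java.util.Arrays;

// Hypothetical allocation-free approximate median.
final class StrideMedian {
    private static final int SAMPLE_SIZE = 301;   // tune: larger = slower but more accurate
    private static final ThreadLocal<double[]> SCRATCH =
            ThreadLocal.withInitial(() -> new double[SAMPLE_SIZE]);

    private StrideMedian() {}

    /** Approximate median (upper middle element for even counts); nums is not modified. */
    static double approxMedian(double[] nums) {
        if (nums.length == 0) throw new IllegalArgumentException("empty input");
        double[] arr = SCRATCH.get();              // same buffer on every call from this thread
        int count;
        if (nums.length <= arr.length) {           // small input: exact median of a copy
            count = nums.length;
            System.arraycopy(nums, 0, arr, 0, count);
        } else {                                   // large input: take every factor-th value
            count = arr.length;
            int factor = nums.length / count;
            for (int i = 0; i < count; i++) arr[i] = nums[i * factor];
        }
        Arrays.sort(arr, 0, count);
        return arr[count / 2];
    }
}
```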
1) How much is a lot of new memory? Does it preclude a sorted copy of the data, or of references to the data?
2) Is your data repetitive (are there only a few distinct values)? If yes, then your answer to (1) is less likely to cause problems, because you may be able to do something with a lookup map and an array: e.g. a Map, an array of short, and a suitably tweaked comparison object.
3) The typical case for your "close to the mean" approximation is more likely to be O(n log n). Most sort algorithms only degrade to O(n^2) with pathological data. Additionally, the exact median is only going to be (typically) O(n log n), assuming you can afford a sorted copy.
4) Random sampling (à la dan04) is more likely to be accurate than choosing values near the mean, unless your distribution is well behaved. For example, the Poisson and log-normal distributions both have medians that differ from their means.