I'm trying to get the most representative frequency (or first harmonic) from an audio file using the Noise FFT library (https://github.com/paramsen/noise). I have an input array of size x, and the output array's size is x + 2. I'm not familiar with the Fourier transform, so maybe I'm missing something, but from my understanding I should get something that represents the frequencies and stores the magnitude (or, in this case, a complex number from which to calculate it) of each one.
The thing is: since each position in the array should correspond to a frequency, how can I know the range of the output frequencies and which frequency each position represents?
Edit: This is part of the code I'm using
float[] mono = new float[size];
// I fill the array with the appropriate values
Noise noise = Noise.real(size);
float[] dst = new float[size + 2];
float[] fft = noise.fft(mono, dst);
// The result array has the pairs of real+imaginary floats in a one dimensional array; even indices
// are real, odd indices are imaginary. DC bin is located at index 0, 1, nyquist at index n-2, n-1
double greatest = 0;
int greatestIdx = 0;
for (int i = 0; i < fft.length / 2; i++) {
    float real = fft[i * 2];
    float imaginary = fft[i * 2 + 1];
    double magnitude = Math.sqrt(real * real + imaginary * imaginary);
    if (magnitude > greatest) {
        greatest = magnitude;
        greatestIdx = i;
    }
    System.out.printf("index: %d, real: %.5f, imaginary: %.5f\n", i, real, imaginary);
}
I just noticed something I had overlooked. The comment just before the for loop (which comes from the sample code provided on GitHub) says that the Nyquist bin is located in the last pair of values of the array. From what I searched, Nyquist is 22050 Hz, so... to know the frequency corresponding to greatestIdx, should I map the range [0, size+2] onto the range [0, 22050] and calculate the new value? It seems like a pretty imprecise measure.
Taking the above into account, maybe I should use another library for more precision? If so, what would be one that lets me specify the output frequency range, or that covers approximately the human hearing range by default?
I believe the answer to your question is here, if I understand it correctly: https://stackoverflow.com/a/4371627/9834835
To determine the frequency of each FFT bin you can use the formula
F = i * sampleRate / nFft
where:
i = the FFT bin index
sampleRate = the sample rate
nFft = your FFT size
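For example, with the loop from the question above, the bin holding the largest magnitude can be converted to Hz like this (a minimal sketch; sampleRate is an assumption standing in for whatever rate your audio was actually decoded at):
int sampleRate = 44100;                     // assumed sample rate in Hz
int nFft = size;                            // the FFT size passed to Noise.real(size)
double peakFrequency = greatestIdx * (double) sampleRate / nFft;
System.out.printf("strongest bin: %d -> %.1f Hz%n", greatestIdx, peakFrequency);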
I'm trying to understand how to implement the Fast Fourier Transform using the separability property.
Image from the book: Gonzalez, R. C., and R. E. Woods. "Digital Image Processing, 4th Global Edition."
According to this, as I understand, we must compute the matrix of complex sinewave values like this:
Complex[] F = new Complex[i.width * i.height];
for (int x = 0; x < i.img.length; x++) {
    double theta = -2 * Math.PI * (x / (double) i.img.length);
    F[x] = new Complex(Math.cos(theta), Math.sin(theta));
}
So our matrix is a 1D array with the size of our image.
Then we have to take the image and multiply it, row by row, with the array F, and then take the processed image and multiply it, column by column, with F.
What I can't understand is how to properly perform this row-by-row and column-by-column multiplication.
Let's say we take F and separate it into column chunks. I did it like this:
int chunk = image.width;
for (int f = 0; f < F.length; f += chunk) {
    Complex[] row = Arrays.copyOfRange(F, f, Math.min(F.length, f + chunk));
    for (int k = 0; k < row.length; k++) {
        row[k] = row[k].mul(image[k]);
        clone[k] = (byte) row[k].r;
    }
}
But this is wrong: the row ends up empty.
Should we actually slice the 1D F array like this in order to multiply it with the pixel values? Or is there another way to compute the row-by-row and column-by-column multiplication? How can it be implemented in Java?
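For what it's worth, here is a minimal sketch of the separability idea as I read it: a 1-D DFT applied to every row of the image, followed by a 1-D DFT applied to every column of that intermediate result. It assumes the same Complex class as in the question plus an add(Complex) method (an assumption), and it is a naive O(N^3) illustration rather than an actual FFT:
// Separable 2-D DFT: transform each row, then each column of the row-transformed result.
static Complex[][] dft2d(double[][] img) {
    int h = img.length, w = img[0].length;
    // pass 1: 1-D DFT of each row
    Complex[][] rows = new Complex[h][w];
    for (int y = 0; y < h; y++) {
        for (int u = 0; u < w; u++) {
            Complex sum = new Complex(0, 0);
            for (int x = 0; x < w; x++) {
                double theta = -2 * Math.PI * u * x / w;
                sum = sum.add(new Complex(img[y][x] * Math.cos(theta),
                                          img[y][x] * Math.sin(theta)));
            }
            rows[y][u] = sum;
        }
    }
    // pass 2: 1-D DFT of each column of the intermediate result
    Complex[][] out = new Complex[h][w];
    for (int u = 0; u < w; u++) {
        for (int v = 0; v < h; v++) {
            Complex sum = new Complex(0, 0);
            for (int y = 0; y < h; y++) {
                double theta = -2 * Math.PI * v * y / h;
                sum = sum.add(rows[y][u].mul(new Complex(Math.cos(theta), Math.sin(theta))));
            }
            out[v][u] = sum;
        }
    }
    return out;
}
The key point is that no chunk-wise multiplication of a flattened F is needed; each pass is just an ordinary 1-D transform along one axis.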
How can I get the magnitudes and corresponding frequencies after performing an FFT on a dataset? I need to plot magnitude vs. frequency for the dataset. Also, why do we make the FFT array twice the size of the actual dataset? And the size of the resulting output array is different again; please help me understand this FFT code. Further, when should complexForward be used and when realForward, and what is the difference between the two? I need to perform an FFT on a dataset and get the magnitude after the FFT and the corresponding frequency for each magnitude.
int length = data.length;
FloatFFT_1D fftDo = new FloatFFT_1D(length);
float[] fft = new float[length * 2];
System.arraycopy(data, 0, fft, 0, length);
fftDo.complexForward(fft);
//for (double d : fft) {
//    System.out.println(d);
//}
float[] outputfft = new float[(fft.length + 1) / 2];
if (fft.length % 2 == 0) {
    for (int i = 0; i < length / 2; i++) {
        outputfft[i] = (float) Math.sqrt(Math.pow(fft[2 * i], 2) + Math.pow(fft[2 * i + 1], 2));
    }
} else {
    for (int i = 0; i < (length / 2) + 1; i++) {
        outputfft[i] = (float) Math.sqrt(Math.pow(fft[2 * i], 2) + Math.pow(fft[2 * i + 1], 2));
    }
}
for (float f : outputfft) {
    System.out.println(f);
}
The FFT of a real-valued data vector is by definition complex and symmetric. If you have a vector of N samples you will get N frequency bins separated in frequency by Fs/N, where Fs is the sampling frequency. Your FFT array is twice that size because the complex data is interleaved [re, im, re, im, ...].
The useful output is half the size: since the spectrum is symmetric, you only need to look at the first half, corresponding to frequencies [0 .. Fs/2]; the upper half corresponds to [-Fs/2 .. 0).
If your input data is real-valued (no imaginary part), you can use the realForward function instead, which computes only the non-redundant half of the spectrum.
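As a hedged sketch of how that might look in practice (the import path depends on your JTransforms version, and sampleRate is an assumption you have to supply), the magnitude and frequency of each bin can be computed like this:
import org.jtransforms.fft.FloatFFT_1D;   // older versions: edu.emory.mathcs.jtransforms.fft.FloatFFT_1D

public class Spectrum {
    // Returns magnitudes for bins 0 .. n/2, where bin k corresponds to k * sampleRate / n Hz.
    // Assumes data.length (n) is even and the signal is real-valued.
    public static float[] magnitudes(float[] data, float sampleRate) {
        int n = data.length;
        float[] a = data.clone();
        new FloatFFT_1D(n).realForward(a);   // packed output: a[0]=Re[0], a[1]=Re[n/2],
                                             // a[2k]=Re[k], a[2k+1]=Im[k] for 0 < k < n/2
        float[] mag = new float[n / 2 + 1];
        mag[0] = Math.abs(a[0]);                                  // DC bin
        mag[n / 2] = Math.abs(a[1]);                              // Nyquist bin
        for (int k = 1; k < n / 2; k++) {
            mag[k] = (float) Math.hypot(a[2 * k], a[2 * k + 1]);  // |Re + j*Im|
        }
        for (int k = 0; k <= n / 2; k++) {
            System.out.printf("%.2f Hz -> %.4f%n", k * sampleRate / n, mag[k]);
        }
        return mag;
    }
}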
I'm trying to implement something like what is described here and here. Specifically, I want to be able to perform the operation shown in the following image:
That is, given N discrete points with a constant time interval, I want to create a function that converges to those points, as in the image...
So far what I did was:
imported JTransforms
used it:
private double[] doDFT(double[] data, int start, int end) {
    DoubleFFT_1D doubleFFT_1D = new DoubleFFT_1D(end - start);
    double[] array = new double[(end - start) * 2];
    for (int i = 0; i < end - start; i++) {
        array[i] = data[i + start];
        array[i + 1] = data[i + start + 1];
    }
    doubleFFT_1D.complexForward(array);
    return array;
}
And now I'm stuck: how do I use the output array to produce the function that converges to the points in the original data array?
Just to clarify what I want: for example, in the image, the data array that is fed into doDFT is the blue line plot, and what I want is to produce a function f whose graph is the red line plot.
You probably want to set the imaginary component of your complex input to zero, not to the next point.
The functions you want are sinusoids. Each sinusoid will have a frequency of an FFT result bin index * Fs/N. The magnitude and phase of each sinusoid will be given by the complex value corresponding to its FFT result bin.
You can sum an increasing number of these sinusoids, starting with 1, to get your converging waveforms.
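As a hedged sketch of that summation (assuming the interleaved output of complexForward on real input whose imaginary slots were zeroed, and a hypothetical cutoff K for how many bins to include), the partial reconstruction could look like this:
// Evaluate the partial Fourier series built from bins 0..K at sample positions 0..n-1.
// fft is the interleaved [re, im, re, im, ...] output of complexForward; n is the number
// of original (real) samples; K < n/2 controls how many sinusoids are summed.
static double[] reconstruct(double[] fft, int n, int K) {
    double[] out = new double[n];
    for (int t = 0; t < n; t++) {
        double sum = fft[0] / n;                                 // DC term (bin 0)
        for (int k = 1; k <= K && k < n / 2; k++) {
            double re = fft[2 * k];
            double im = fft[2 * k + 1];
            double angle = 2 * Math.PI * k * t / n;
            // bin k and its conjugate mirror bin n-k together contribute twice the real part
            sum += (2.0 / n) * (re * Math.cos(angle) - im * Math.sin(angle));
        }
        out[t] = sum;
    }
    return out;
}
Increasing K from 1 towards n/2 gives the successively closer approximations (the "converging" curves) described above; evaluating the same sum at non-integer t interpolates between the original points.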
I have a map of items with some probability distribution:
Map<SingleObjectiveItem, Double> itemsDistribution;
Given a certain m I have to generate a Set of m elements sampled from the above distribution.
So far I have been using the naive way of doing it:
while (mySet.size() < m)
    mySet.add(getNextSample(itemsDistribution));
The getNextSample(...) method fetches an object from the distribution as per its probability. Now, as m increases the performance severely suffers. For m = 500 and itemsDistribution.size() = 1000 elements, there is too much thrashing and the function remains in the while loop for too long. Generate 1000 such sets and you have an application that crawls.
Is there a more efficient way to generate a unique set of random numbers with a "predefined" distribution? Most collection shuffling techniques and the like are uniformly random. What would be a good way to address this?
UPDATE: The loop will call getNextSample(...) "at least" 1 + 2 + 3 + ... + m = m(m+1)/2 times. That is, on the first pass we will definitely get a sample for the set; on the 2nd iteration it may be called at least twice, and so on. If getNextSample is sequential in nature, i.e., it goes through the entire cumulative distribution to find the sample, then the run-time complexity of the loop is at least n*m(m+1)/2, where n is the number of elements in the distribution. If m = cn, 0 < c <= 1, then the loop is at least Ω(n^3). And that is only the lower bound!
If we replace the sequential search by a binary search, the complexity would still be at least Ω(n^2 log n). Efficient, but perhaps not by a large margin.
Also, removing elements from the distribution is not possible, since I call the above loop k times to generate k such sets. These sets are part of a randomized 'schedule' of items, hence a 'set' of items.
Start out by generating a number of random points in two dimensions.
Then apply your distribution.
Now keep all points that fall under the distribution curve and take their x coordinates: those are your random numbers with the requested distribution.
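A minimal Java sketch of that idea (rejection sampling), where density, x0, x1 and maxDensity are assumptions you supply for your own distribution:
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Throw uniform points into the bounding box of the density and return the x coordinate
// of the first point that lands under the curve.
static double sample(DoubleUnaryOperator density, double x0, double x1,
                     double maxDensity, Random rnd) {
    while (true) {
        double x = x0 + rnd.nextDouble() * (x1 - x0);   // uniform x in [x0, x1)
        double y = rnd.nextDouble() * maxDensity;       // uniform y in [0, maxDensity)
        if (y < density.applyAsDouble(x)) {
            return x;                                   // the point fell under the curve
        }
    }
}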
The problem is unlikely to be the loop you show:
Let n be the size of the distribution, and I the number of invocations of getNextSample. We have I = sum_i(C_i), where C_i is the number of invocations of getNextSample while the set has size i. To find E[C_i], observe that C_i is the inter-arrival time of a Poisson process with λ = 1 - i/n, and therefore exponentially distributed with parameter λ. Therefore E[C_i] = 1/λ = 1/(1 - i/n) <= 1/(1 - m/n), and hence E[I] < m/(1 - m/n).
That is, sampling a set of size m = n/2 will take, on average, fewer than 2m = n invocations of getNextSample. If that is "slow" and "crawls", it is likely because getNextSample itself is slow. This is actually unsurprising, given the unsuitable way the distribution is passed to the method (the method will, of necessity, have to iterate over the entire distribution to find a random element).
The following should be faster (if m < 0.8 n):
class Distribution<T> {
    private double[] cummulativeWeight;
    private T[] item;
    private double totalWeight;

    Distribution(Map<T, Double> probabilityMap) {
        int i = 0;
        cummulativeWeight = new double[probabilityMap.size()];
        item = (T[]) new Object[probabilityMap.size()];
        for (Map.Entry<T, Double> entry : probabilityMap.entrySet()) {
            item[i] = entry.getKey();
            totalWeight += entry.getValue();
            cummulativeWeight[i] = totalWeight;
            i++;
        }
    }

    T randomItem() {
        double weight = Math.random() * totalWeight;
        int index = Arrays.binarySearch(cummulativeWeight, weight);
        if (index < 0) {
            index = -index - 1;   // binarySearch returns (-(insertion point) - 1) on a miss
        }
        return item[index];
    }

    Set<T> randomSubset(int size) {
        Set<T> set = new HashSet<>();
        while (set.size() < size) {
            set.add(randomItem());
        }
        return set;
    }
}
public class Test {
    public static void main(String[] args) {
        int max = 1_000_000;
        HashMap<Integer, Double> probabilities = new HashMap<>();
        for (int i = 0; i < max; i++) {
            probabilities.put(i, (double) i);
        }
        Distribution<Integer> d = new Distribution<>(probabilities);
        Set<Integer> set = d.randomSubset(max / 2);
        //System.out.println(set);
    }
}
The expected runtime is O(m / (1 - m / n) * log n). On my computer, a subset of size 500_000 of a set of 1_000_000 is computed in about 3 seconds.
As we can see, the expected runtime approaches infinity as m approaches n. If that is a problem (i.e. m > 0.9 n), the following more complex approach should work better:
Set<T> randomSubset(int size) {
    Set<T> set = new HashSet<>();
    while (set.size() < size) {
        T randomItem = randomItem();
        remove(randomItem);   // removes the item from the distribution
        set.add(randomItem);
    }
    return set;
}
To efficiently implement remove requires a different representation for the distribution, for instance a binary tree where each node stores the total weight of the subtree whose root it is.
But that is rather complicated, so I wouldn't go that route if m is known to be significantly smaller than n.
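For completeness, here is a hedged sketch of one such structure (not the answer's code): a Fenwick (binary indexed) tree over the weights, which supports drawing an item proportionally to its weight and then removing it, both in roughly logarithmic time. All names here are illustrative:
import java.util.Random;

class WeightedSampler {
    private final double[] tree;   // Fenwick tree over the item weights, 1-based
    private final int n;
    private double total;
    private final Random rnd = new Random();

    WeightedSampler(double[] weights) {
        n = weights.length;
        tree = new double[n + 1];
        for (int i = 0; i < n; i++) {
            add(i, weights[i]);
            total += weights[i];
        }
    }

    // add delta to the weight of item i
    private void add(int i, double delta) {
        for (int j = i + 1; j <= n; j += j & -j) tree[j] += delta;
    }

    // cumulative weight of items 0..i
    private double prefix(int i) {
        double s = 0;
        for (int j = i + 1; j > 0; j -= j & -j) s += tree[j];
        return s;
    }

    // draw one index with probability proportional to its weight, then remove it
    int sampleAndRemove() {
        double target = rnd.nextDouble() * total;
        int lo = 0, hi = n - 1;
        while (lo < hi) {                       // smallest i with prefix(i) > target
            int mid = (lo + hi) >>> 1;
            if (prefix(mid) > target) hi = mid; else lo = mid + 1;
        }
        double w = prefix(lo) - (lo > 0 ? prefix(lo - 1) : 0);
        add(lo, -w);                            // zero out the sampled item's weight
        total -= w;
        return lo;
    }
}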
If you are not too concerned with randomness properties, I would do it like this:
create a buffer for the pseudo-random numbers
double buff[MAX]; // [edit1] double pseudo-random numbers
MAX should be big enough ... 1024*128, for example
the type can be anything (float, int, DWORD, ...)
fill the buffer with numbers
you have a range of numbers x in <x0, x1> and a probability function probability(x) defined by your probability distribution, so do this:
for (i=0,x=x0;x<=x1;x+=stepx)
 for (j=0,n=probability(x)*MAX,q=0.1*stepx/n;j<n;j++,i++) // [edit1] unique pseudo-random numbers
  buff[i]=x+(double(i)*q); // [edit1] ...
stepx is your accuracy for the items (for integral types it is 1). The buff[] array now has the distribution you need, but it is not pseudo-random. You should also check that i does not reach MAX, to avoid array overruns; at the end the real size of buff[] is i (it can be less than MAX due to rounding).
shuffle buff[]
do just a few loops swapping buff[i] and buff[j], where i is the loop variable and j is pseudo-random in <0, MAX)
write your pseudo-random function
it just returns numbers from the buffer: the first call returns buff[0], the second buff[1], and so on. With standard generators, when you hit the end of buff[] you would shuffle buff[] again and start over from buff[0]. But since you need unique numbers you must never reach the end of the buffer, so set MAX big enough for your task; otherwise uniqueness will not be assured.
[Notes]
MAX should be big enough to store the whole distribution you want. If it is not big enough, then items with low probability can be missing completely.
[edit1] - tweaked answer a little to match the question needs (pointed out by meriton, thanks)
PS. The complexity of initialization is O(N) and of getting a number is O(1).
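Since the question is about Java, here is a hedged Java sketch of the buffer idea above; probability(x), x0, x1, stepx and MAX are assumptions standing in for your own distribution and accuracy settings, and probability(x) is assumed to be normalized so that probability(x) * MAX is a sensible count:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

class BufferedGenerator {
    private final List<Double> buff = new ArrayList<>();
    private int next = 0;

    BufferedGenerator(DoubleUnaryOperator probability, double x0, double x1,
                      double stepx, int max) {
        // fill the buffer so each value x appears roughly probability(x) * max times
        for (double x = x0; x <= x1 && buff.size() < max; x += stepx) {
            int n = (int) Math.round(probability.applyAsDouble(x) * max);
            double q = n > 0 ? 0.1 * stepx / n : 0;   // small offset keeps entries unique
            for (int j = 0; j < n && buff.size() < max; j++) {
                buff.add(x + j * q);
            }
        }
        Collections.shuffle(buff);   // make the order pseudo-random
    }

    double nextValue() {
        return buff.get(next++);     // unique as long as the buffer is big enough
    }
}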
You could implement your own random number generator (using a Monte Carlo method or any good uniform generator like the Mersenne Twister) and base it on the inversion method (here).
For example, the exponential law: generate a uniform random number u in [0,1]; then your random variable from the exponential law would be ln(1-u)/(-lambda), lambda being the exponential law's parameter and ln the natural logarithm.
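A minimal sketch of that exponential example (java.util.Random standing in for whichever uniform generator you prefer):
import java.util.Random;

// Inversion method for the exponential law: u is uniform in [0, 1),
// lambda is the rate parameter of the exponential distribution.
static double nextExponential(Random rnd, double lambda) {
    double u = rnd.nextDouble();
    return Math.log(1 - u) / (-lambda);
}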
Hope it'll help ;).
I think you have two problems:
Your itemsDistribution doesn't know you need a set, so when the set you are building gets large you will pick a lot of elements that are already in the set. If you start with the set full and remove elements, you will run into the same problem for very small sets. Is there a reason why you don't remove the element from the itemsDistribution after you have picked it? Then you wouldn't pick the same element twice.
The choice of data structure for itemsDistribution also looks suspicious to me. You want the getNextSample operation to be fast, but doesn't the map from values to probabilities force you to iterate through large parts of the map for each getNextSample? I'm no good at statistics, but couldn't you represent itemsDistribution the other way around, as a map from probability, or rather from the cumulative probability (the sum of all smaller probabilities plus the item's own), to an element of the set?
Your performance depends on how your getNextSample function works. If you have to iterate over all probabilities when you pick the next item, it might be slow.
A good way to pick several unique random items from a list is to first shuffle the list and then pop items off it. You can shuffle the list once with the given distribution. From then on, picking your m items is just popping off the list.
Here's an implementation of a probabilistic shuffle:
List<Item> prob_shuffle(Map<Item, int> dist)
{
    int n = dist.length;
    List<Item> a = dist.keys();
    int psum = 0;
    int i, j;
    for (i in dist) psum += dist[i];
    for (i = 0; i < n; i++) {
        int ip = rand(psum); // 0 <= ip < psum
        int jp = 0;
        for (j = i; j < n; j++) {
            jp += dist[a[j]];
            if (ip < jp) break;
        }
        psum -= dist[a[j]];
        Item tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
    return a;
}
This is not Java but pseudocode after an implementation in C, so please take it with a grain of salt. The idea is to append items to the shuffled area by continuously picking items from the unshuffled area.
Here I used integer probabilities. (The probabilities don't have to add up to any special value; it's just "bigger is better".) You can use floating-point numbers, but because of inaccuracies you might end up going beyond the array when picking an item; in that case you should use item n - 1. If you add that safety net, you could even have items with zero probability that always get picked last.
There might be a method to speed up the picking loop, but I don't really see how. The swapping renders any precalculations useless.
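For reference, a hedged Java translation of the pseudocode above (Item stands in for your element type, and the zero-weight guard is the safety net mentioned earlier):
import java.util.*;

// Probabilistic shuffle: repeatedly pick an item from the unshuffled tail with
// probability proportional to its weight and swap it into the shuffled prefix.
static <Item> List<Item> probShuffle(Map<Item, Integer> dist, Random rnd) {
    List<Item> a = new ArrayList<>(dist.keySet());
    int psum = 0;
    for (int w : dist.values()) psum += w;
    for (int i = 0; i < a.size(); i++) {
        if (psum <= 0) break;                 // only zero-weight items remain; leave them last
        int ip = rnd.nextInt(psum);           // 0 <= ip < psum
        int jp = 0;
        int j;
        for (j = i; j < a.size(); j++) {      // walk the unshuffled tail
            jp += dist.get(a.get(j));
            if (ip < jp) break;
        }
        if (j == a.size()) j = a.size() - 1;  // safety net, as described above
        psum -= dist.get(a.get(j));
        Collections.swap(a, i, j);            // move the picked item into the shuffled area
    }
    return a;
}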
Accumulate your probabilities in a table
             Probability
Item      Actual   Accumulated
Item1      0.10       0.10
Item2      0.30       0.40
Item3      0.15       0.55
Item4      0.20       0.75
Item5      0.25       1.00
Make a random number between 0.0 and 1.0 and do a binary search for the first item with a sum that is greater than your generated number. This item would have been chosen with the desired probability.
Ebbe's method is called rejection sampling.
I sometimes use a simple method, using an inverse cumulative distribution function, which is a function that maps a number X between 0 and 1 onto the Y axis.
Then you just generate a uniformly distributed random number between 0 and 1, and apply the function to it.
That function is also called the "quantile function".
For example, suppose you want to generate a normally distributed random number.
Its cumulative distribution function is called Phi.
The inverse of that is called probit.
There are many ways to generate normal variates, and this is just one example.
You can easily construct an approximate cumulative distribution function for any univariate distribution you like, in the form of a table.
Then you can just invert it by table-lookup and interpolation.
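A hedged sketch of that table lookup (the parallel arrays value[] and cdf[] are assumptions you build beforehand, with cdf[] non-decreasing from 0 to 1):
// Approximate inverse CDF: binary search for the bracketing table entries,
// then linearly interpolate between them. u is uniform in [0, 1).
static double inverseCdf(double[] value, double[] cdf, double u) {
    int lo = 0, hi = cdf.length - 1;
    while (lo + 1 < hi) {
        int mid = (lo + hi) >>> 1;
        if (cdf[mid] <= u) lo = mid; else hi = mid;
    }
    double span = cdf[hi] - cdf[lo];
    double t = span > 0 ? (u - cdf[lo]) / span : 0;
    return value[lo] + t * (value[hi] - value[lo]);
}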
Let's say you have an arbitrarily large two-dimensional array with an even number of items in it. Let's also assume, for clarity, that you can only choose between two things to put at any given position in the array. How would you go about putting a random choice at a given index, while ensuring that once the array is filled there is an even split between the two choices?
If there are any answers with code, Java is preferred but other languages are fine as well.
You could basically think about it in the opposite way. Rather than deciding for a given index, which value to put in it, you could select n/2 elements from the array and place the first value in them. Then place the 2nd value in the other n/2.
A 2-D array A[M,N] can be mapped to a vector V[M*N] (you can use row-major or column-major order to do the mapping).
Start with a vector V[M*N]. Fill its first half with the first choice, and the second half of the array with the second choice object. Run a Fisher-Yates shuffle, and convert the shuffled array to a 2-D array. The array is now filled with elements that are evenly split among the two choices, and the choices at each particular index are random.
The below creates a List<T> the size of the area of the matrix, and fills it half with the first choice (spaces[0]) and half with the second (spaces[1]). Afterward, it applies a shuffle (namely Fisher-Yates, via Collections.shuffle) and begins to fill the matrix with these values.
static <T> void fill(final T[][] matrix, final T... space) {
    final int w = matrix.length;
    final int h = matrix[0].length;
    final int area = w * h;

    final List<T> sample = new ArrayList<T>(area);
    final int half = area >> 1;
    sample.addAll(Collections.nCopies(half, space[0]));
    sample.addAll(Collections.nCopies(half, space[1]));
    Collections.shuffle(sample);

    final Iterator<T> cursor = sample.iterator();
    for (int x = w - 1; x >= 0; --x) {
        final T[] column = matrix[x];
        for (int y = h - 1; y >= 0; --y) {
            column[y] = cursor.next();
        }
    }
}
Pseudo-code:
int trues_remaining = size / 2;
int falses_remaining = size / 2;

while (trues_remaining + falses_remaining > 0)
{
    if (trues_remaining > 0 && falses_remaining > 0)
    {
        if (getRandomBool())
        {
            array.push(true);
            trues_remaining--;
        }
        else
        {
            array.push(false);
            falses_remaining--;
        }
    }
    else if (trues_remaining > 0)
    {
        array.push(true);
        trues_remaining--;
    }
    else
    {
        array.push(false);
        falses_remaining--;
    }
}
Doesn't really scale to more than two values, though. How about:
assoc_array = { 1 = 4, 2 = 4, 3 = 4, 4 = 4 };
while (! assoc_array.isEmpty())
{
    int index = rand(assoc_array.getNumberOfKeys());
    int n = assoc_array.getKeyAtIndex(index);
    array.push(n);
    assoc_array[n]--;
    if (assoc_array[n] <= 0) assoc_array.deleteKey(n);
}
EDIT: just noticed you asked for a two-dimensional array. Well it should be easy to adapt this approach to n-dimensional.
EDIT2: from your comment above, "school yard pick" is a great name for this.
It doesn't sound like your requirements for randomness are very strict, but I thought I'd contribute some more thoughts for anyone who may benefit from them.
You're basically asking for a pseudorandom binary sequence, and the most popular one I know of is the maximum length sequence (m-sequence). This uses an n-bit linear feedback shift register to define a periodic series of 1's and 0's that has a perfectly flat frequency spectrum, at least within certain bounds determined by the sequence's period (2^n - 1 bits).
What does that mean? Basically it means that the sequence is guaranteed to be maximally random across all shifts (and therefore frequencies) if its full length is used. When compared to an equal length sequence of numbers generated from a random number generator, it will contain MORE randomness per length than your typical randomly generated sequence.
It is for this reason that it is used to determine impulse functions in white noise analysis of systems, especially when experiment time is valuable and higher order cross effects are less important. Because the sequence is random relative to all shifts of itself, its auto-correlation is a perfect delta function (aside from qualifiers indicated above) so the stimulus does not contaminate the cross correlation between stimulus and response.
I don't really know what your application for this matrix is, but if it simply needs to "appear" random then this would do that very effectively. In terms of balancing 1's vs. 0's, the sequence is guaranteed to have exactly one more 1 than 0. Therefore, if you're trying to fill a grid of 2^n elements, you would be guaranteed to get an even split by tacking a 0 onto the end.
So an m-sequence is more random than anything you'll generate using a random number generator and it has a defined number of 0's and 1's. However, it doesn't allow for unqualified generation of 2d matrices of arbitrary size - only those where the total number of elements in the grid is a power of 2.
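As a small, hedged illustration of the idea (a 4-bit Fibonacci LFSR with one standard maximal-length tap pair; the register width and seed are arbitrary choices here):
// Generates one full period (2^4 - 1 = 15 bits) of a maximum length sequence.
// Per the discussion above, it contains exactly eight 1's and seven 0's.
static int[] mSequence() {
    int lfsr = 0b1111;                             // any nonzero seed works
    int[] bits = new int[15];
    for (int i = 0; i < bits.length; i++) {
        bits[i] = lfsr & 1;                        // output the low bit
        int feedback = (lfsr ^ (lfsr >> 1)) & 1;   // XOR of the two tap bits
        lfsr = (lfsr >> 1) | (feedback << 3);      // shift and reinsert the feedback at the top
    }
    return bits;
}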