I am trying to extract the frequency from a WAV file, but it looks like something is going wrong.
First I extract the bytes from the file, then apply an FFT to them, and finally compute the magnitudes.
It seems I am doing something wrong, as the output is not close to the real value.
Below is the code.
try {
    File log = new File("files/log.txt");
    if (!log.exists()) log.createNewFile();
    PrintStream ps = new PrintStream(log);

    File f = new File("files/5000.wav");
    FileInputStream fis = new FileInputStream(f);
    int length = (int) f.length();
    length = (int) nearestPow2(length);
    double[] ibr = new double[length]; // real part
    double[] ibi = new double[length]; // imaginary part
    int i = 0;
    int l = 0;
    //fis.skip(44);
    byte[] b = new byte[1024];
    while ((l = fis.read(b)) != -1) {
        // copy only the bytes actually read, and stop at the array bound
        for (int j = 0; j < l && i < length; j++) {
            ibr[i] = b[j];
            ibi[i] = 0;
            i++;
        }
    }
    double[] ftb = FFTBase.fft(ibr, ibi, true);
    double[] mag = new double[ftb.length / 2];
    double mxMag = 0;
    long avgMg = 0;
    int reqIndex = 512; // no need to go till the end
    for (i = 1; i < ibi.length; i++) {
        ibr[i] = ftb[i * 2];
        ibi[i] = ftb[i * 2 + 1];
        mag[i] = Math.sqrt(ibr[i] * ibr[i] + ibi[i] * ibi[i]);
        avgMg += mag[i];
        if (mag[i] > mxMag) mxMag = mag[i];
        ps.println(mag[i]);
    }
    avgMg = avgMg / ibi.length;
    ps.println("Max====" + mxMag);
    ps.println("Average====" + avgMg);
} catch (Exception e) {
    e.printStackTrace();
}
When I run this code on a 5 kHz file, these are the values I get:
https://pastebin.com/R3V0QU4G
This is not the complete output, but it is representative.
Thanks
Extracting a frequency, or a "pitch", is unfortunately hardly possible by just doing an FFT and searching for the "loudest" frequency or something like that, at least if you are trying to extract it from a musical signal.
There are also different kinds of tones. A large portion of musical instruments (e.g. a guitar or the human voice) create harmonic sounds, which consist of several frequencies that follow a certain pattern.
But there are also tones that have only one peak/frequency (e.g. whistling).
Additionally, you usually have to deal with noise in the signal that is not tonal at all. This could be background noise, or it could be produced by the instrument itself. Guitars, for instance, have a very large noise portion during the attack phase.
You can use different approaches, meaning different algorithms, to find the pitch of these signals, depending on their type.
If we stay in the frequency domain (FFT) and assume we want to analyze a harmonic sound, there is for example the two-way mismatch algorithm, which uses statistical pattern matching to find harmonics and to guess the fundamental frequency, i.e. the frequency that our ears perceive as the tone.
An example implementation can be found here: https://github.com/ausmauricio/audio_dsp This repo is part of a complete course on audio signal processing at Coursera, so it may be helpful.
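That said, for a synthetic single-tone test file like the 5 kHz one above, picking the loudest bin is usually enough as a sanity check. A minimal sketch (the class name and constants are my own; it assumes a canonical 44-byte header, 16-bit little-endian mono PCM, and a known sample rate, none of which is parsed from the file), reusing the FFTBase helper from the question, which returns interleaved real/imaginary values as used above:

import java.io.DataInputStream;
import java.io.FileInputStream;

public class PeakFrequency {
    public static void main(String[] args) throws Exception {
        int sampleRate = 44100;                // assumed, not read from the header
        int n = 16384;                         // FFT size, a power of two
        double[] re = new double[n];
        double[] im = new double[n];
        try (DataInputStream in = new DataInputStream(new FileInputStream("files/5000.wav"))) {
            in.readFully(new byte[44]);        // skip the (assumed) canonical header
            byte[] raw = new byte[n * 2];
            in.readFully(raw);
            // combine byte pairs into signed 16-bit samples (little-endian)
            for (int i = 0; i < n; i++) {
                re[i] = (short) ((raw[2 * i + 1] << 8) | (raw[2 * i] & 0xFF));
            }
        }
        double[] ftb = FFTBase.fft(re, im, true);
        int peak = 1;
        double peakMag = 0;
        for (int k = 1; k < n / 2; k++) {      // only the first half holds unique bins
            double mag = Math.hypot(ftb[2 * k], ftb[2 * k + 1]);
            if (mag > peakMag) { peakMag = mag; peak = k; }
        }
        // bin index -> frequency: each bin is sampleRate / n Hz wide
        System.out.println("peak at ~" + (peak * (double) sampleRate / n) + " Hz");
    }
}

Note that reading each byte as its own sample, as the code in the question does, mangles 16-bit data, which alone can explain an implausible peak.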
Related
I'm new to the deeplearning4j library, but I've got some experience with neural networks in general.
I'm trying to train a recurrent neural network (an LSTM in particular) which is supposed to detect beats in music in real time. All the examples for using recurrent neural nets with deeplearning4j that I've found so far use a reader which reads the training data from a file. Since I want to record music in real time via a microphone, I can't read some pregenerated file, so the data which is fed into the neural network is generated in real time by my application.
This is the code that I'm using to generate my network:
NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
        .learningRate(0.1)
        .rmsDecay(0.95)
        .regularization(true)
        .l2(0.001)
        .weightInit(WeightInit.XAVIER)
        .updater(Updater.RMSPROP)
        .list();
int nextIn = hiddenLayers.length > 0 ? hiddenLayers[0] : numOutputs;
builder = builder.layer(0, new GravesLSTM.Builder().nIn(numInputs).nOut(nextIn).activation("softsign").build());
for (int i = 0; i < hiddenLayers.length - 1; i++) {
    nextIn = hiddenLayers[i + 1];
    builder = builder.layer(i + 1, new GravesLSTM.Builder().nIn(hiddenLayers[i]).nOut(nextIn).activation("softsign").build());
}
builder = builder.layer(hiddenLayers.length, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT).nIn(nextIn).nOut(numOutputs).activation("softsign").build());
MultiLayerConfiguration conf = builder.backpropType(BackpropType.TruncatedBPTT)
        .tBPTTForwardLength(DEFAULT_RECURRENCE_DEPTH).tBPTTBackwardLength(DEFAULT_RECURRENCE_DEPTH)
        .pretrain(false).backprop(true)
        .build();
net = new MultiLayerNetwork(conf);
net.init();
In this case I'm using about 700 inputs (which are mostly FFT data of the recorded audio), 1 output (which is supposed to output a number between 0 [no beat] and 1 [beat]), and my hiddenLayers array consists of the ints {50, 25, 10}.
For getting the output of the network I'm using this code:
double[] output = new double[]{net.rnnTimeStep(Nd4j.create(netInputData)).getDouble(0)};
where netInputData is the data I want to input into the network as a one-dimensional double array.
I'm relatively sure that this code is working fine, since I get some output for an untrained network which looks something like this when I plot it.
However, once I try to train a network (even if I train it just for a short time, which should alter the weights of the network just a little bit, so that the output should be very similar to the untrained network), I get an output which looks like a constant.
This is the code which I'm using to train the network:
for (int timestep = 0; timestep < trainingData.length - DEFAULT_RECURRENCE_DEPTH; timestep++) {
    INDArray inputDataArray = Nd4j.create(new int[]{1, numInputs, DEFAULT_RECURRENCE_DEPTH}, 'f');
    for (int inputPos = 0; inputPos < trainingData[timestep].length; inputPos++)
        for (int inputTimeWindowPos = 0; inputTimeWindowPos < DEFAULT_RECURRENCE_DEPTH; inputTimeWindowPos++)
            inputDataArray.putScalar(new int[]{0, inputPos, inputTimeWindowPos}, trainingData[timestep + inputTimeWindowPos][inputPos]);

    INDArray desiredOutputDataArray = Nd4j.create(new int[]{1, numOutputs, DEFAULT_RECURRENCE_DEPTH}, 'f');
    for (int outputPos = 0; outputPos < desiredOutputData[timestep].length; outputPos++)
        for (int inputTimeWindowPos = 0; inputTimeWindowPos < DEFAULT_RECURRENCE_DEPTH; inputTimeWindowPos++)
            desiredOutputDataArray.putScalar(new int[]{0, outputPos, inputTimeWindowPos}, desiredOutputData[timestep + inputTimeWindowPos][outputPos]);

    net.fit(new DataSet(inputDataArray, desiredOutputDataArray));
}
Once again, I've got my data for the input and for the desired output as double arrays. This time the two arrays are two-dimensional: the first index represents the time (where index 0 is the first audio data of the recorded audio) and the second index represents the input (or, respectively, the desired output) for that time step.
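For reference, DL4J's recurrent layers expect time-series data in the shape [miniBatchSize, featureCount, timeSeriesLength], which is what the code above builds; a tiny standalone sketch of that layout, with made-up sizes and placeholder values:

int features = 3, timeSteps = 4;
INDArray seq = Nd4j.create(new int[]{1, features, timeSteps}, 'f');
for (int t = 0; t < timeSteps; t++)
    for (int f = 0; f < features; f++)
        // the index order must match the shape: (batch, feature, time)
        seq.putScalar(new int[]{0, f, t}, f + 0.1 * t); // placeholder values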
Given the shown output after training a network, I tend to think that there must be something wrong with my code used for creating the INDArrays from my data. Am I missing some important step for initializing these arrays or did I mess up the order I need to put my data into these arrays?
Thank you for any help in advance.
I'm not sure, but perhaps 99.99% of your training examples are 0, with only an occasional 1 exactly where the beat occurs. This might be too imbalanced to learn. Good luck.
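A quick way to check whether that is the case is to count the label distribution before training. A small sketch in plain Java, assuming the desiredOutputData layout from the question (one output per timestep):

// count how many timesteps are labelled "beat" vs "no beat"
int beats = 0;
for (double[] step : desiredOutputData)
    if (step[0] > 0.5) beats++;
double ratio = (double) beats / desiredOutputData.length;
System.out.printf("beat frames: %d of %d (%.4f%%)%n",
        beats, desiredOutputData.length, 100.0 * ratio);
// if the ratio is tiny, consider oversampling the beat frames or
// widening the target (e.g. marking a few frames around each beat as 1)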
I have a list of sampled data from a WAV file. I would like to pass these values into a library and get the frequency of the music played in the WAV file. For now, there will be one frequency in the WAV file, and I would like to find a library that is compatible with Android. I understand that I need to use an FFT to get into the frequency domain. Are there any good libraries for that? I found that KissFFT is quite popular, but I am not very sure how compatible it is with Android. Is there an easier, solid library that can perform the task I want?
EDIT:
I tried to use JTransforms to get the FFT of the WAV file, but I always failed to get the correct frequency of the file. Currently, the WAV file contains a sine wave at 440 Hz, the musical note A4, yet I get 441 as the result. Then I tried to get the frequency of G4 and got 882 Hz, which is incorrect. The frequency of G4 is supposed to be 783 Hz. Could it be due to not enough samples? If yes, how many samples should I take?
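One thing worth checking before the code below: the spacing between FFT bins is sampleRate / numOfFrames Hz, so a short window quantizes the result. A quick check using the variables from the snippet:

// width of one DFT bin in Hz; with 44100 Hz and 44100 frames each bin
// is exactly 1 Hz wide, so 441 for a 440 Hz tone is only one bin off
double binWidthHz = (double) sampleRate / numOfFrames;
System.out.println("bin width: " + binWidthHz + " Hz");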
// DFT
DoubleFFT_1D fft = new DoubleFFT_1D(numOfFrames);
double max_fftval = -1;
int max_i = -1;
double[] fftData = new double[numOfFrames * 2];
for (int i = 0; i < numOfFrames; i++) {
    // copy audio data into the fft buffer; the imaginary part is 0
    fftData[2 * i] = buffer[i];
    fftData[2 * i + 1] = 0;
}
fft.complexForward(fftData);
// only scan the first half: for real input the second half mirrors it
for (int i = 0; i < fftData.length / 2; i += 2) {
    // complex numbers -> vectors: the magnitude is sqrt(re^2 + im^2)
    double vlen = Math.sqrt((fftData[i] * fftData[i]) + (fftData[i + 1] * fftData[i + 1]));
    //fd.append(Double.toString(vlen));
    //fd.append(",");
    if (max_fftval < vlen) {
        // this magnitude is bigger than the biggest stored so far
        max_fftval = vlen;
        max_i = i;
    }
}
//double dominantFreq = ((double)max_i / fftData.length) * sampleRate;
double dominantFreq = (max_i / 2.0) * sampleRate / numOfFrames;
fd.append(Double.toString(dominantFreq));
Can someone help me out?
EDIT2: I managed to fix the problem mentioned above by increasing the number of samples to 100000; however, sometimes I get an overtone as the frequency instead of the fundamental. Any idea how to fix that? Should I use the Harmonic Product Spectrum or an autocorrelation algorithm?
I realised my mistake: if I take more samples, the frequency resolution increases and the accuracy improves. However, this method is still not complete, as I still have problems obtaining accurate results for piano/voice sounds.
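For the overtone problem, the Harmonic Product Spectrum mentioned above is a common fix: the spectrum is multiplied with downsampled copies of itself, so a bin only scores highly if its harmonics (2k, 3k, ...) are also strong. A rough sketch over a magnitude array mag[] (one entry per bin, as computed above; the harmonic count is a tunable assumption):

// Harmonic Product Spectrum: pick the bin whose harmonic series is
// collectively strongest, rather than the single loudest bin.
static int hpsPeakBin(double[] mag, int harmonics) {
    int best = 1;
    double bestScore = 0;
    // candidate fundamentals must keep all harmonics inside the array
    for (int k = 1; k < mag.length / harmonics; k++) {
        double score = 1.0;
        for (int h = 1; h <= harmonics; h++)
            score *= mag[k * h]; // product across the downsampled spectra
        if (score > bestScore) {
            bestScore = score;
            best = k;
        }
    }
    return best; // frequency ~ best * sampleRate / numOfFrames
}

With harmonics around 3 to 5 this tends to suppress octave errors on harmonic sources such as piano and voice, though it can misbehave on nearly pure sine tones.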
I am using a queue service that only takes messages as bytes, so I need to convert my data quickly to that format and then back to the original when I receive work from the queue. My data is an int, a double, and an int[], and here is how I did it at first:
//to convert to a string
String message = Integer.toString(number) + "," + Double.toString(value) + "," + Arrays.toString(my_list);

//to convert back
String message_without_brackets = message.replace("[", "").replace("]", "");
String[] temp_message = message_without_brackets.split(",");
int number = Integer.valueOf(temp_message[0]);
double value = Double.valueOf(temp_message[1]);
int[] my_list = new int[temp_message.length - 2]; //-2 because the first two entries are other data
for (int i = 2; i < temp_message.length; i++) {
    my_list[i - 2] = Integer.parseInt(temp_message[i].replace(" ", ""));
}
This is super ugly, and it annoyed me that after a few weeks (or a single night of heavy drinking) I would probably not be able to figure it out quickly. Performance-wise the code wasn't too bad; I think replace was the heaviest part (if I remember correctly, it was around 15% of overall execution time).
I asked around and found that Gson can do this more cleanly, but it now accounts for over 40% of my loop (it's Gson itself that's doing it):
Gson gson = new Gson();
int[] sub = { 0, 59, 16 };
Object[] values = { 0, 43.0, sub };
String output = gson.toJson(values); // => [0, 43.0,[0,59,16]]
Object[] deserialized = gson.fromJson(output, Object[].class);
System.out.println(deserialized[0]);
System.out.println(deserialized[1]);
System.out.println(deserialized[2]);
So I'm wondering: is there a faster way to get the same result? I am trying out a few of the suggestions in this question, but is there a faster way to do this without depending on any external libraries, since my needs are quite simple (and if not, is there a fast library)? Because someone suggested Gson I looked at JSON parsers, but is that what I should be looking for, or are there other types of libraries that do this?
EDIT: I am converting to a String because I thought I needed to in order to call getBytes(); is there any other format that would be faster and that I can still turn into bytes?
You can use a DataOutputStream, like:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
int i = 0;
double d = 43.0;
int[] sub = { 0, 59, 16 };
dos.writeInt(i);
dos.writeDouble(d);
dos.writeInt(sub.length); // length prefix, so the reader knows how many ints follow
for (int j : sub)
    dos.writeInt(j);
byte[] bytes = baos.toByteArray();
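To get the values back, read them with a DataInputStream in exactly the same order (a sketch, assuming the layout written above):

// read the fields back in the order they were written
DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes));
int i2 = dis.readInt();
double d2 = dis.readDouble();
int len = dis.readInt(); // the length prefix written above
int[] sub2 = new int[len];
for (int j = 0; j < len; j++)
    sub2[j] = dis.readInt();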
If you want to get more extreme, you can use recycled ByteBuffers, or even direct ByteBuffers with native byte ordering, etc.
Have you taken a look at Guava? I use it all the time for handling byte streams:
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/ByteStreams.html#toByteArray(java.io.InputStream)
The library is worth it just for toByteArray().
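For example, slurping an entire stream into a byte array is a one-liner (assuming Guava is on the classpath; inputStream is whatever stream you are reading):

import com.google.common.io.ByteStreams;

// reads the entire InputStream into memory; fine for small messages
byte[] data = ByteStreams.toByteArray(inputStream);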
I have an array of audio data I am passing to a reader:
recorder.read(audioData,0,bufferSize);
The instantiation is as follows:
AudioRecord recorder;
short[] audioData;
int bufferSize;
int samplerate = 8000;

// get the buffer size to use with this audio record
bufferSize = AudioRecord.getMinBufferSize(samplerate, AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT) * 3;

// instantiate the AudioRecord
recorder = new AudioRecord(AudioSource.MIC, samplerate, AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);

recording = true; // variable used to start or stop recording
audioData = new short[bufferSize]; // short array that PCM data is put into
I have an FFT class I found online and a Complex class to go with it.
I have tried for two days, looking everywhere online, but I can't work out how to loop through the values stored in audioData and pass them to the FFT.
This is the FFT class I am using: http://www.cs.princeton.edu/introcs/97data/FFT.java
and this is the complex class to go with it: http://introcs.cs.princeton.edu/java/97data/Complex.java.html
Assuming the audioData array contains the raw audio data, you need to create a Complex[] object from the audioData array, like so:
Complex[] complexData = new Complex[audioData.length];
for (int i = 0; i < complexData.length; i++) {
    complexData[i] = new Complex(audioData[i], 0);
}
Now you can pass your complexData object as a parameter to your FFT function:
Complex[] fftResult = FFT.fft(complexData);
Some of the details will depend on the purpose of your FFT.
The FFT length you need depends on the frequency resolution and the time accuracy you want in your analysis (the two are inversely related), and that length may or may not be anywhere near the length of an audio input buffer. Given those differences in length, you may have to combine multiple buffers, segment a single buffer, or some combination of the two to get the FFT window length that meets your analysis requirements.
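Note also that the Princeton FFT linked in the question requires a power-of-two input length, while getMinBufferSize() rarely returns one. A common workaround is to zero-pad (or truncate) the buffer first; a sketch, with the helper name being my own invention:

// zero-pad the recorded samples to the next power of two so the
// radix-2 FFT accepts them; padding smears the spectrum slightly
// but keeps every sample
static Complex[] toPaddedComplex(short[] audioData) {
    int n = Integer.highestOneBit(audioData.length);
    if (n < audioData.length) n <<= 1; // round up to the next power of two
    Complex[] out = new Complex[n];
    for (int i = 0; i < n; i++)
        out[i] = new Complex(i < audioData.length ? audioData[i] : 0, 0);
    return out;
}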
PCM is just the technique used to encode the data; it is not an obstacle to frequency analysis with an FFT. If you use Java to decode the PCM-encoded data, you get raw audio samples, which can be passed directly to your FFT library.
I need the steps to perform document clustering using the k-means algorithm in Java.
It would be very useful for me to have the steps laid out simply.
Thanks in advance.
You need to count the words in each document and build a feature vector, generally called a bag of words. Before that, you need to remove stop words (very common words that carry little information, like "the", "a", etc.). You can generally take the top n most common words in your documents, count the frequency of those words per document, and store the counts in an n-dimensional vector.
For the distance measure you can use cosine similarity.
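For instance, cosine similarity between two term-frequency vectors is just the normalized dot product; a minimal sketch:

// cosine similarity of two bag-of-words vectors:
// dot(a, b) / (|a| * |b|); 1.0 means the same direction
static double cosine(double[] a, double[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}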
Here is a simple algorithm for 2-means on one-dimensional data points. You can extend it to k-means and n-dimensional data points easily; let me know if you want an n-dimensional implementation.
double[] x = {1, 2, 2.5, 3, 3.5, 4, 4.5, 5, 7, 8, 8.5, 9, 9.5, 10};
double[] center = new double[2];
double[] precenter = new double[2];
List<List<Double>> cluster = new ArrayList<>();
cluster.add(new ArrayList<>());
cluster.add(new ArrayList<>());

// pick 2 distinct random indices from 0 to x.length - 1 (without replacement)
int[] start = new int[2];
Random random = new Random();
start[0] = random.nextInt(x.length);
start[1] = random.nextInt(x.length);
while (start[0] == start[1]) {
    start[1] = random.nextInt(x.length);
}
center[0] = x[start[0]];
center[1] = x[start[1]];
// there is a better way to generate k random numbers (w/o replacement), just search

do {
    cluster.get(0).clear();
    cluster.get(1).clear();
    // assignment step: attach each point to its nearest center
    for (int i = 0; i < x.length; ++i) {
        if (Math.abs(x[i] - center[0]) <= Math.abs(x[i] - center[1])) {
            cluster.get(0).add(x[i]);
        } else {
            cluster.get(1).add(x[i]);
        }
    }
    // update step: move each center to the mean of its cluster
    precenter[0] = center[0];
    precenter[1] = center[1];
    center[0] = mean(cluster.get(0));
    center[1] = mean(cluster.get(1));
} while (precenter[0] != center[0] || precenter[1] != center[1]);

static double mean(List<Double> list) {
    double sum = 0;
    for (double value : list) {
        sum += value;
    }
    return list.isEmpty() ? 0 : sum / list.size();
}
The cluster.get(0) and cluster.get(1) lists contain the points in the two clusters, and center[0], center[1] are the two means.
You may still need to do some debugging, because I originally wrote the code in R and just converted it into Java for you :)
Does this help you? The Wikipedia article on k-means also has some links to implementations in other languages, ready to be ported to Java.
Steps of the algorithm:
Define the number of clusters you want to have.
Distribute the points randomly in your problem space.
Link every observation to the nearest point.
Calculate the center of mass for each cluster and move the point into the middle.
Link the points again to the center points and repeat until the points don't move any more.
What do you want to cluster the documents based on? If it's by similarity, you'll need to do some natural language processing first, and then you'll need a metric (some kind of assignment algorithm) to place the documents into clusters (CRP works and is relatively straightforward).
The hardest part will be the NLP (language processing) if you're not clustering them based on something like length. I can provide more info on all of these, but I won't dive down the rabbit hole if you don't need it.