I am working on extracting MFCC features from some audio files. The program I currently have extracts a series of MFCCs for each file, using a buffer size of 1024. I saw the following in a paper:
The feature vectors extracted within a second of audio data are combined by computing the mean and the variance of each feature vector element (merging).
My current code uses TarsosDSP to extract the MFCCs, but I'm not sure how to split the data into "a second of audio data" in order to merge the MFCCs.
My MFCC extraction code
int sampleRate = 44100;
int bufferSize = 1024;
int bufferOverlap = 512;
inStream = new FileInputStream(path);
AudioDispatcher dispatcher = new AudioDispatcher(new UniversalAudioInputStream(inStream, new TarsosDSPAudioFormat(sampleRate, 16, 1, true, true)), bufferSize, bufferOverlap);
final MFCC mfcc = new MFCC(bufferSize, sampleRate, 13, 40, 300, 3000);
dispatcher.addAudioProcessor(mfcc);
dispatcher.addAudioProcessor(new AudioProcessor() {
    @Override
    public void processingFinished() {
        System.out.println("DONE");
    }

    @Override
    public boolean process(AudioEvent audioEvent) {
        return true; // breakpoint here reveals MFCC data
    }
});
dispatcher.run();
What exactly is the buffer size, and could it be used to segment the audio into windows of one second? Is there a method to divide the series of MFCCs into certain amounts of time?
Any help would be greatly appreciated.
After more research, I came across this website that clearly showed the steps for using MFCCs with Weka. It showed some data files with various statistics, each listed as a separate attribute in Weka. I believe that when the paper said
computing the mean and variance
they meant the mean and variance of each MFCC coefficient were used as attributes in the combined data file. When I followed the example on the website to merge the MFCCs, I used max, min, range, max position, min position, mean, standard deviation, skewness, kurtosis, quartile, and interquartile range.
To split the audio input into seconds, I believe sets of MFCCs are extracted at the sample rate passed in as the parameter, so if I set it to 100, I would wait for 100 cycles before merging the MFCCs. Please correct me if I'm wrong.
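For what it's worth, here is a minimal sketch of how the per-second merging could be wired into the dispatcher from my snippet above. It assumes the TarsosDSP MFCC processor exposes the latest coefficients via getMFCC() (check the version you use), and that one MFCC frame is produced per hop, i.e. sampleRate / (bufferSize - bufferOverlap) ≈ 86 frames per second with the settings above (not one per sample). Variable names are just illustrative:

import java.util.ArrayList;
import java.util.List;

// ... in the same method as the dispatcher setup above ...
final int hopSize = bufferSize - bufferOverlap;       // 512 samples per MFCC frame
final int framesPerSecond = sampleRate / hopSize;     // ~86 frames ≈ one second of audio
final List<float[]> secondBuffer = new ArrayList<>();

dispatcher.addAudioProcessor(new AudioProcessor() {
    @Override
    public boolean process(AudioEvent audioEvent) {
        secondBuffer.add(mfcc.getMFCC().clone());      // copy this frame's 13 coefficients
        if (secondBuffer.size() >= framesPerSecond) {
            double[] mean = new double[13];
            double[] variance = new double[13];
            for (float[] frame : secondBuffer)
                for (int i = 0; i < 13; i++) mean[i] += frame[i] / secondBuffer.size();
            for (float[] frame : secondBuffer)
                for (int i = 0; i < 13; i++) variance[i] += Math.pow(frame[i] - mean[i], 2) / secondBuffer.size();
            // mean[] and variance[] form one 26-element merged feature vector for this second
            secondBuffer.clear();
        }
        return true;
    }

    @Override
    public void processingFinished() { }
});

Each merged mean/variance pair then becomes one instance for Weka, and the extra statistics listed above (min, max, skewness, etc.) can be appended the same way.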
I'm likely dense, but I cannot seem to find a solution to my issue.
(NOTE: I CAN find lots of people reporting this issue; it seems to have started with a newer Java release (possibly 1.5?). Perhaps SAMPLE_RATE is no longer supported? I am unable to find any solution.)
I'm trying to adjust the SAMPLE_RATE to speed up/slow down song. I can successfully play a .wav file without issue, so I looked into FloatControl which worked for adjusting volume:
public void adjustVolume(String audioType, float gain) {
    FloatControl gainControl = null;
    gainControl = (FloatControl) clipSFX.getControl(FloatControl.Type.MASTER_GAIN);

    if (gain > MAX_VOLUME)
        gain = MAX_VOLUME;
    if (gain < MIN_VOLUME)
        gain = MIN_VOLUME;

    // set volume
    gainControl.setValue(gain);
}
But when trying to translate this principle to SAMPLE_RATE, I get an error very early on at this stage:
public void adjustVolume(String audioType, float gain) {
    FloatControl gainControl = null;
    gainControl = (FloatControl) clipSFX.getControl(FloatControl.Type.SAMPLE_RATE);
    // ERROR: Exception in thread "Thread-3" java.lang.IllegalArgumentException: Unsupported control type: Sample Rate

    // I haven't gotten this far yet since the above breaks, but in theory will then set the value?
    gainControl.setValue(gain);
}
Everything I've found online seems to be related to taking input from a mic or some external line and doesn't seem to translate to using an audio file, so I'm unsure what I'm missing. Any help would be appreciated! Thanks!
Here we have a method that changes the speed by doubling the sample rate. Basically, the steps are as follows:
open the audio stream of the file
get the format
create a new format with the sample rate changed
open a data line with that format
read from the file/audio stream and play onto the line
The concepts here are SourceDataLine, AudioFormat and AudioInputStream. If you look at the javax.sound tutorial you will find them, or even the pages of the classes. You can now create your own method (like adjust(factor)) that just builds the new format while all else stays the same; a small sketch of that follows the code below.
public void play() {
    try {
        File fileIn = new File(" ....");
        AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(fileIn);
        AudioFormat formatIn = audioInputStream.getFormat();
        AudioFormat format = new AudioFormat(formatIn.getSampleRate() * 2, formatIn.getSampleSizeInBits(), formatIn.getChannels(), true, formatIn.isBigEndian());

        System.out.println(formatIn.toString());
        System.out.println(format.toString());

        byte[] data = new byte[1024];
        DataLine.Info dinfo = new DataLine.Info(SourceDataLine.class, format);
        SourceDataLine line = (SourceDataLine) AudioSystem.getLine(dinfo);
        if (line != null) {
            line.open(format);
            line.start();
            while (true) {
                int k = audioInputStream.read(data, 0, data.length);
                if (k < 0) break;
                line.write(data, 0, k);
            }
            line.stop();
            line.close();
        }
    }
    catch (Exception ex) { ex.printStackTrace(); }
}
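As a sketch of the adjust(factor) idea mentioned before the code (the method name and exact shape here are mine, not from the original), the only part that changes is the format line:

// Same as the format line in play(), but parameterized; factor > 1 speeds up, factor < 1 slows down
AudioFormat adjusted(AudioFormat formatIn, float factor) {
    return new AudioFormat(formatIn.getSampleRate() * factor,
            formatIn.getSampleSizeInBits(), formatIn.getChannels(),
            true, formatIn.isBigEndian());
}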
It is also possible to vary the speed by using linear interpolation when progressing through the audio data.
Audio values are laid out in an array and the cursor normally goes from value to value. But you can set things up to progress an arbitrary amount, for example 1.5 frames, and create a weighted value where needed.
Suppose data is as follows:
0.5
0.8
0.2
-0.1
-0.5
-0.7
Your playback data (for 1.5 rate) would be
0.5
(0.8 + 0.2)/2
-0.1
(-0.5 + -0.7)/2
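In code, the weighted value for a fractional cursor position works out to something like the following (a hypothetical mono helper, not the AudioCue code; pos = 1.5 reproduces the (0.8 + 0.2)/2 value above):

// Linear interpolation between the two frames surrounding a fractional position
float readAt(float[] samples, float pos) {
    int lower = (int) pos;       // frame just before the cursor
    float frac = pos - lower;    // how far we are towards the next frame
    return samples[lower] * (1 - frac) + samples[lower + 1] * frac;
}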
I know there have been posts that more fully explain this algorithm before on Stack Overflow. Forgive me for not tracking them down.
I use this method to allow real-time speed changes in .wav playback in the following open-source library: AudioCue. Feel free to check out the code and make use of the ideas in it.
Following is the method that creates a stereo pair of audio values from a spot that lies in between two audio frames (data is signed floats, ranging from -1 to 1). It's from an inner class AudioCuePlayer in AudioCue.java. Probably not the easiest to read. The sound data being read is in the array cue, and idx is the current "play head" location that is progressing through this array. 'intIndex' is the audio frame, and 'flatIndex' is the actual location of the frame in the array. I use frames to track the playhead's location and calculate the interpolation weights, and then use the flatIndex for getting the corresponding values from the array.
private float[] readFractionalFrame(float[] audioVals, float idx)
{
    final int intIndex = (int) idx;
    final int flatIndex = intIndex * 2;

    audioVals[0] = cue[flatIndex + 2] * (idx - intIndex)
            + cue[flatIndex] * ((intIndex + 1) - idx);

    audioVals[1] = cue[flatIndex + 3] * (idx - intIndex)
            + cue[flatIndex + 1] * ((intIndex + 1) - idx);

    return audioVals;
}
I'd be happy to clarify if there are questions.
I am trying to extract the frequency from a wav file, but it looks like something is going wrong.
First I extract the bytes from the file, then apply an FFT to them, and finally compute the magnitudes.
It seems like I am doing something wrong, as the output is not close to the real value.
Below is the code.
try {
    File log = new File("files/log.txt");
    if (!log.exists()) log.createNewFile();
    PrintStream ps = new PrintStream(log);

    File f = new File("files/5000.wav");
    FileInputStream fis = new FileInputStream(f);

    int length = (int) f.length();
    length = (int) nearestPow2(length);

    double[] ibr = new double[length]; //== real
    double[] ibi = new double[length]; //== imaginary
    int i = 0;
    int l = 0;
    //fis.skip(44);
    byte[] b = new byte[1024];
    while ((l = fis.read(b)) != -1) {
        try {
            for (int j = 0; j < 1024; j++) {
                ibr[i] = b[j];
                ibi[i] = 0;
                i++;
            }
        } catch (Exception e) {}
    }

    double[] ftb = FFTBase.fft(ibr, ibi, true);

    double[] mag = new double[ftb.length / 2];
    double mxMag = 0;
    long avgMg = 0;
    int reqIndex = 512; //== no need to go till end

    for (i = 1; i < ibi.length; i++) {
        ibr[i] = ftb[i * 2];
        ibi[i] = ftb[i * 2 + 1];
        mag[i] = Math.sqrt(ibr[i] * ibr[i] + ibi[i] * ibi[i]);
        avgMg += mag[i];
        if (mag[i] > mxMag) mxMag = mag[i];
        ps.println(mag[i]);
    }

    avgMg = avgMg / ibi.length;
    ps.println("MAx====" + mxMag);
    ps.println("Average====" + avgMg);
} catch (Exception e) { e.printStackTrace(); }
When I run this code on a 5 kHz file, these are the values I get.
https://pastebin.com/R3V0QU4G
This is not the complete output, but the rest looks similar.
Thanks
Extracting a frequency, or a "pitch", is unfortunately hardly possible by just doing an FFT and looking for the "loudest" frequency or something like that, at least if you are trying to extract it from a musical signal.
Also, there are different kinds of tones. A large portion of musical instruments (e.g. a guitar or our voice) create harmonic sounds, which consist of several frequencies that follow a certain pattern.
But there are also tones that have only one peak/frequency (e.g. whistling).
Additionally, you usually have to deal with noise in the signal that is not tonal at all. This could be background noise, or it could be produced by the instrument itself. Guitars, for instance, have a very large noise portion during the attack phase.
You can use different approaches, meaning different algorithms, to find the pitch of these signals, depending on their type.
If we stay in the frequency domain (FFT) and assume we want to analyze a harmonic sound, there is, for example, the two-way mismatch algorithm, which uses statistical pattern matching to find harmonics and to guess the fundamental frequency, i.e. the frequency that is perceived as the tone by our ears.
An example implementation can be found here: https://github.com/ausmauricio/audio_dsp This repo is part of a complete course on audio signal processing on Coursera, maybe it is helpful.
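That said, if the input really is a single pure test tone (like the 5 kHz file in the question), picking the peak-magnitude bin and converting it to Hz is often enough: bin k of an N-point FFT sits at k * sampleRate / N Hz. A rough sketch, assuming mag[] holds the bin magnitudes as in the question's code and fftLength is the N actually used for the transform (both names are placeholders here):

// Find the strongest bin in the computed spectrum and convert it to Hz
int peakBin = 1;
for (int k = 1; k < mag.length; k++) {   // skip bin 0 (DC offset)
    if (mag[k] > mag[peakBin]) peakBin = k;
}
double frequencyHz = peakBin * (double) sampleRate / fftLength;
System.out.println("Peak at ~" + frequencyHz + " Hz");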
In java you can create a SourceDataLine like so:
AudioFormat af = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100.0f, 16, 1, 2, 44100.0f, false);
SourceDataLine sdl = AudioSystem.getSourceDataLine(af);
After which you can open and then write data to it:
byte[] data = new byte[1024];
fillwithsounds(data);

sdl.open();
sdl.start();
sdl.write(data, 0, 1024);
This all works fine for mono data.
What I'd like to do is to be able to write stereo data, and I can't find any documentation online on how I need to change my byte array to be able to write stereo data.
It seems like I need to increase the number of channels when I create the AudioFormat (to make it stereo), and then I need to halve the frame rate (otherwise Java throws an IllegalArgumentException).
I don't understand why this is, though, or what the new format should be for the data that I feed to the SourceDataLine.
Perhaps somebody with a little more experience in audio formats than I could shed some light on this problem. Thanks in advance!
The format I use for stereo is as follows:
audioFmt = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
44100, 16, 2, 4, 44100, false);
You probably want to double the bytes per frame instead of halving your bits-encoding. I'm not sure what 8-bit encoding sounds like, but it is definitely going to be noisier than 16-bit encoding!
The resulting file is twice as long. You can then take the two-byte pairs that make the 16-bit sample and copy them into the next two positions, for "mono" playback (both stereo channels identical).
Given:
frame = F
little end byte = A
big end byte = B
AB = 16-bit little-endian encoding
left channel = L
right channel = R
Your original mono:
F1A, F1B, F2A, F2B, F3A, F3B ...
Stereo using the above format:
F1AL, F1BL, F1AR, F1BR, F2AL, F2BL, F2AR, F2BR, F3AL, F3BL, F3AR, F3BR ...
I could very well have the order of left and right mixed up. But I hope you get the idea!
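Here is a minimal sketch of that duplication in code (assuming 16-bit little-endian mono input whose length is a multiple of two; the helper name is illustrative):

// Duplicate each 16-bit mono frame into left and right channels
byte[] monoToStereo(byte[] mono) {
    byte[] stereo = new byte[mono.length * 2];
    for (int i = 0; i < mono.length; i += 2) {
        stereo[2 * i]     = mono[i];       // left channel, low byte
        stereo[2 * i + 1] = mono[i + 1];   // left channel, high byte
        stereo[2 * i + 2] = mono[i];       // right channel, low byte
        stereo[2 * i + 3] = mono[i + 1];   // right channel, high byte
    }
    return stereo;
}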
I found out the solution just now, and found Andrew Thompson's comment to explain exactly what I needed.
I figured that I'd have to write each frame twice; what caught me up was the fact that Java wouldn't let me just use the frame size I had for my mono format (it threw an IllegalArgumentException).
So I halved the frame rate to satisfy Java, but I didn't remember to modify the byte array.
I've implemented code that takes the "2 bytes per frame, 1 channel" byte[] and converts it into a "1 byte per frame, 2 channel" byte[].
private static final float LEFT = -1;
private static final float CENTER = 0;
private static final float RIGHT = 1;
private byte[] monoToStereo(byte[] data, float pan){
byte[] output = new byte[data.length];
for (int i = 0; i < (data.length - 2); i+=2){
int currentvalue = (data[i+1]*256 + data[i])/(256*2);
if (pan == LEFT || pan == CENTER){
output[i] = (byte) currentvalue;
} else {
output[i] = 0;
}
if (pan == RIGHT || pan == CENTER){
output[i+1] = (byte) currentvalue;
} else {
output[i+1] = 0;
}
}
return output;
}
Using this, I can get stereo audio to play back (although there is soft static, I can clearly hear the original track).
I am currently using the GraphView from the developer jjoe64 on GitHub, and I was wondering how I would get the double I created in my Bluetooth ConnectedThread class into the GraphView class. This is the original function that generates random data, but I want the serial data from my Bluetooth class instead.
The current function in this real-time graph is:
private double getRandom() {
    double high = 3;
    double low = 0.5;
    return Math.random() * (high - low) + low;
}
In my Bluetooth class, I have the method ConnectedThread.read(), but it's not really working. Here it is:
public static double read() {
    try {
        byte[] buffer = new byte[1024];
        double bytes = mmInStream.read(buffer);
        return bytes;
    } catch (IOException e) {
        return 5;
    }
}
I am not sure if it's just my phone that's too slow (it's running Android 2.3, Desire HD), but my professor at school said it should work fine if I just call ConnectedThread.read() and assign it to a double. Any advice?
You haven't provided enough information for an out-of-the-box solution, but I'll give it a shot anyway.
First of all, I presume that mmInStream is an InputStream or its subclass. Look at the API of int InputStream.read(byte[] b):
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
This means that what you're returning from your read() method is just the number of bytes that have been written to the buffer from mmInStream. That is probably not what you want to do. What you probably want to do is read just the value from this stream. To do that you should:
wrap your mmInStream in a DataInputStream just after the mmInStream is created:
mmInStream = yourMethodCreatingInputStream();
dataInStream = new DataInputStream(mmInStream);
read the double value from the dataInStream. But as in all computer systems you must be aware of the exact format that your input value comes in. You must refer to the specification of the device you're using to fetch the input data.
Now the dataInStream comes in handy because it abstracts the necessary low-level IO operations and lets you focus on the data. It will automatically translate your queries for the data to the IO operations. For example:
If your data is in double format (and I believe that is the case according to the words of your professor), your read() method is as simple as:
public static double read() throws IOException {
    return dataInStream.readDouble();
}
And in case the data is coming in the float format:
public static double read() throws IOException {
    return (double) dataInStream.readFloat();
}
But again, be sure to consult the specification of the device you're using for the exact format. Some devices may pass you data in exotic formats like for example: "first 2 bytes are the integer part of the resulting value, second 2 bytes are the fractional part". It is up to you as a consumer of the data to follow its format.
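For instance, that exotic layout could be decoded with something like this (purely illustrative; not a real device format, and big-endian byte order is assumed because that is what DataInputStream reads):

// "First 2 bytes are the integer part, second 2 bytes are the fractional part"
public static double readFixedPoint(DataInputStream in) throws IOException {
    int integerPart = in.readUnsignedShort();    // first two bytes
    int fractionalPart = in.readUnsignedShort(); // next two bytes
    return integerPart + fractionalPart / 65536.0;
}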
I have an array of audio data I am passing to a reader:
recorder.read(audioData,0,bufferSize);
The instantiation is as follows:
AudioRecord recorder;
short[] audioData;
int bufferSize;
int samplerate = 8000;
//get the buffer size to use with this audio record
bufferSize = AudioRecord.getMinBufferSize(samplerate, AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT)*3;
//instantiate the AudioRecorder
recorder = new AudioRecord(AudioSource.MIC,samplerate, AudioFormat.CHANNEL_CONFIGURATION_MONO, AudioFormat.ENCODING_PCM_16BIT,bufferSize);
recording = true; //variable to use start or stop recording
audioData = new short [bufferSize]; //short array that pcm data is put into.
I have an FFT class I found online and a Complex class to go with it.
I have spent two days looking everywhere online but can't work out how to loop through the values stored in audioData and pass them to the FFT.
This is the FFT class I am using: http://www.cs.princeton.edu/introcs/97data/FFT.java
and this is the complex class to go with it: http://introcs.cs.princeton.edu/java/97data/Complex.java.html
Assuming the audioData array contains the raw audio data, you need to create a Complex[] object from the audioData array as such:
Complex[] complexData = new Complex[audioData.length];
for (int i = 0; i < complexData.length; i++) {
    complexData[i] = new Complex(audioData[i], 0);
}
Now you can pass your complexData object as a parameter to your FFT function:
Complex[] fftResult = FFT.fft(complexData);
Some of the details will depend on the purpose of your FFT.
The length of the FFT you need depends on the frequency resolution and time accuracy (which are inversely related) that you want from your analysis, and that length may or may not be anywhere near the length of an audio input buffer. Given those differences in length, you may have to combine multiple buffers, segment a single buffer, or some combination of the two, to get the FFT window length that meets your analysis requirements.
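As a rough sizing sketch (the numbers are illustrative, not taken from the question): the bin spacing of an N-point FFT is sampleRate / N, so you can work backwards from the resolution you want:

// e.g. 8000 Hz sample rate and ~10 Hz bin spacing wanted -> N >= 800 -> next power of two is 1024
int sampleRate = 8000;
double wantedResolutionHz = 10.0;
int n = Integer.highestOneBit((int) Math.ceil(sampleRate / wantedResolutionHz));
if (n < sampleRate / wantedResolutionHz) n *= 2;   // round up to a power of two
// each FFT window then spans n / (double) sampleRate seconds (~0.128 s here),
// so several AudioRecord buffers may need to be combined, or one buffer split, to fill it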
PCM is just the technique used to encode the data; it doesn't change how you do frequency analysis with an FFT. If you use Java to decode the PCM-encoded data, you get raw audio samples that can then be passed to your FFT library.