Java - Adjust playback speed of a WAV file - java

I'm likely dense, but I cannot seem to find a solution to my issue.
(NOTE: I CAN find lots of people reporting this issue; it seems to have appeared with newer Java versions (possibly 1.5?). Perhaps SAMPLE_RATE is no longer supported? I am unable to find any solution.)
I'm trying to adjust the SAMPLE_RATE to speed up or slow down a song. I can successfully play a .wav file without issue, so I looked into FloatControl, which worked for adjusting volume:
public void adjustVolume(String audioType, float gain) {
FloatControl gainControl = null;
gainControl = (FloatControl) clipSFX.getControl(FloatControl.Type.MASTER_GAIN);
if(gain > MAX_VOLUME)
gain = MAX_VOLUME;
if(gain < MIN_VOLUME)
gain = MIN_VOLUME;
//set volume
gainControl.setValue(gain);
}
But when trying to translate this principle to SAMPLE_RATE, I get an error very early on at this stage:
public void adjustVolume(String audioType, float gain) {
FloatControl gainControl = null;
gainControl = (FloatControl) clipSFX.getControl(FloatControl.Type.SAMPLE_RATE);
//ERROR: Exception in thread "Thread-3" java.lang.IllegalArgumentException: Unsupported control type: Sample Rate
//I haven't gotten this far yet since the above breaks, but in theory will then set value?
gainControl.setValue(gain);
}
Everything I've found online seems to be related to taking input from a mic or some external line and doesn't seem to translate to using an audio file, so I'm unsure what I'm missing. Any help would be appreciated! Thanks!

Here we have a method that changes the speed - by doubling the sample rate. Basically the steps are as follows:
open the audio stream of the file
get the format
create a new format with the sample rate changed
open a data line with that format
read from the file/audio stream and play onto the line
The concepts here are SourceDataLine, AudioFormat and AudioInputStream. You will find them in the javax.sound tutorial, or on the documentation pages of the classes themselves. You can now create your own method (like adjust(factor)) that just derives the new format; everything else stays the same.
public void play() {
try {
File fileIn = new File(" ....);
AudioInputStream audioInputStream=AudioSystem.getAudioInputStream(fileIn);
AudioFormat formatIn=audioInputStream.getFormat();
AudioFormat format=new AudioFormat(formatIn.getSampleRate()*2, formatIn.getSampleSizeInBits(), formatIn.getChannels(), true, formatIn.isBigEndian());
System.out.println(formatIn.toString());
System.out.println(format.toString());
byte[] data=new byte[1024];
DataLine.Info dinfo=new DataLine.Info(SourceDataLine.class, format);
SourceDataLine line=(SourceDataLine)AudioSystem.getLine(dinfo);
if(line!=null) {
line.open(format);
line.start();
while(true) {
int k=audioInputStream.read(data, 0, data.length);
if(k<0) break;
line.write(data, 0, k);
}
line.stop();
line.close();
}
}
catch(Exception ex) { ex.printStackTrace(); }
}
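The adjust(factor) idea mentioned above could look roughly like the following. This is only a sketch: SpeedFormat and withSpeed are made-up names, and whether the mixer accepts the scaled sample rate depends on the audio hardware, so opening the line can still fail for some factors.

```java
import javax.sound.sampled.AudioFormat;

final class SpeedFormat {
    /**
     * Returns a copy of the given format with sample rate and frame rate
     * multiplied by factor (2.0f plays twice as fast, 0.5f half as fast).
     * Everything else (encoding, bit depth, channels, endianness) is kept.
     */
    static AudioFormat withSpeed(AudioFormat in, float factor) {
        if (factor <= 0f) {
            throw new IllegalArgumentException("factor must be positive");
        }
        return new AudioFormat(
                in.getEncoding(),
                in.getSampleRate() * factor,
                in.getSampleSizeInBits(),
                in.getChannels(),
                in.getFrameSize(),
                in.getFrameRate() * factor,
                in.isBigEndian());
    }
}
```

In play(), the hard-coded format line would then become something like AudioFormat format = SpeedFormat.withSpeed(formatIn, 2.0f); with the rest of the method unchanged.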

It is also possible to vary the speed by using linear interpolation when progressing through the audio data.
Audio values are laid out in an array and the cursor normally goes from value to value. But you can set things up to progress an arbitrary amount, for example 1.5 frames, and create a weighted value where needed.
Suppose data is as follows:
0.5
0.8
0.2
-0.1
-0.5
-0.7
Your playback data (for 1.5 rate) would be
0.5
(0.8 + 0.2)/2
-0.1
(-0.5 + -0.7)/2
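The arithmetic above can be sketched in a few lines of Java for mono float data. LinearResampler and resample are hypothetical names for illustration, not code taken from AudioCue.

```java
final class LinearResampler {
    /**
     * Steps through the input at the given rate (1.5 means 1.5 frames per
     * output value) and linearly interpolates between neighbouring samples.
     */
    static float[] resample(float[] in, double rate) {
        if (rate <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        if (in.length == 0) {
            return new float[0];
        }
        // Number of positions 0, rate, 2*rate, ... that fit in [0, in.length - 1].
        int n = (int) Math.floor((in.length - 1) / rate) + 1;
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            double pos = i * rate;   // fractional play head position
            int idx = (int) pos;     // sample at or before the play head
            double frac = pos - idx; // weight given to the next sample
            float next = idx + 1 < in.length ? in[idx + 1] : in[idx];
            out[i] = (float) (in[idx] * (1.0 - frac) + next * frac);
        }
        return out;
    }
}
```

For the six values listed above and a rate of 1.5, the play head lands on positions 0, 1.5, 3 and 4.5, which gives the four weighted values shown.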
I know there have been posts that more fully explain this algorithm before on Stack Overflow. Forgive me for not tracking them down.
I use this method to allow real-time speed changes in .wav playback in the following open-source library: AudioCue. Feel free to check out the code and make use of the ideas in it.
Following is the method that creates a stereo pair of audio values from a spot that lies in between two audio frames (data is signed floats, ranging from -1 to 1). It's from an inner class AudioCuePlayer in AudioCue.java. Probably not the easiest to read. The sound data being read is in the array cue, and idx is the current "play head" location that is progressing through this array. 'intIndex' is the audio frame, and 'flatIndex' is the actual location of the frame in the array. I use frames to track the playhead's location and calculate the interpolation weights, and then use the flatIndex for getting the corresponding values from the array.
private float[] readFractionalFrame(float[] audioVals, float idx)
{
final int intIndex = (int) idx;
final int flatIndex = intIndex * 2;
audioVals[0] = cue[flatIndex + 2] * (idx - intIndex)
+ cue[flatIndex] * ((intIndex + 1) - idx);
audioVals[1] = cue[flatIndex + 3] * (idx - intIndex)
+ cue[flatIndex + 1] * ((intIndex + 1) - idx);
return audioVals;
}
I'd be happy to clarify if there are questions.

Related

What is MediaCodec, MediaExtractor and MediaMuxer in android?

What do MediaCodec, MediaExtractor and MediaMuxer mean in Android? I am not a video person, but I do know what encoding and decoding mean at a basic level. I need to know what each class does and in which use cases it is used. I would also like to know:
If I want to extract frames from a camera preview and create a video file along with some editing (like speed), which classes should I use and how do they work together?
If I want to create a video player like ExoPlayer (not all the functions, just a simple DASH adaptive streaming player), which classes should I use and how do they work together?
Hope you will answer. Thank You.
Let me start off by saying that it is hard to understand these APIs if you don't understand how video encoding/decoding works. I would suggest doing some research on how encoders/decoders work.
I will provide an oversimplified explanation of each.
MediaCodec:
The MediaCodec class can be used to access low-level media codecs, i.e. encoder/decoder components. It is part of the Android low-level multimedia support infrastructure.
So MediaCodec handles the decoding or encoding of the video packets/buffers and is responsible for the interaction with the codec.
Here is an example of how to initialize MediaCodec:
// Create Mediacodec instance by passing a mime type. It will select the best codec for this mime type
MediaCodec mDecoder = MediaCodec.createDecoderByType(mimeType);
// Pass an instance of MediaFormat and the output/rendering Surface
mDecoder.configure(format, surface, null, 0);
mDecoder.start();
You would then start passing buffers to MediaCodec, like this:
ByteBuffer[] inputBuffers = mDecoder.getInputBuffers();
int index = mDecoder.dequeueInputBuffer(timeout);
// Check if buffers are available
if (index >= 0) {
    // Get dequeued buffer
    ByteBuffer buffer = inputBuffers[index];
    // Get sample data size to determine if we should keep queuing more buffers or signal end of stream
    int sampleSize = mExtractor.readSampleData(buffer, 0);
    if (sampleSize < 0) {
        // Signal EOS, this happens when you reach the end of the video, for example.
        mDecoder.queueInputBuffer(index, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
    } else {
        // Queue the dequeued buffer and pass the extractor's sample time
        mDecoder.queueInputBuffer(index, 0, sampleSize, mExtractor.getSampleTime(), 0);
        mExtractor.advance();
    }
}
You then dequeue the output buffer and release it to your surface:
BufferInfo frameInfo = new BufferInfo();
int index = mDecoder.dequeueOutputBuffer(frameInfo, timeout);
switch (index) {
    case MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED:
        break;
    case MediaCodec.INFO_OUTPUT_FORMAT_CHANGED:
        MediaFormat newFormat = mDecoder.getOutputFormat();
        break;
    case MediaCodec.INFO_TRY_AGAIN_LATER:
        break;
    default:
        break;
}
// You can now push the frames to the surface
// This is where you can control the playback speed, you can do this by letting your thread sleep momentarily
// A valid output buffer index is zero or greater; the INFO_* codes are negative
if (index >= 0) {
    mDecoder.releaseOutputBuffer(index, true);
}
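One way to work out how long the thread should sleep before releasing a frame is sketched below in plain Java, with no Android classes involved. FramePacer and sleepMillis are hypothetical names; presentationTimeUs would come from the BufferInfo filled in by dequeueOutputBuffer, and elapsedUs is assumed to be the wall-clock time since playback started.

```java
final class FramePacer {
    /**
     * Returns how many milliseconds to wait before showing a frame.
     * A speed of 2.0 makes every frame due in half its original time.
     */
    static long sleepMillis(long presentationTimeUs, long elapsedUs, double speed) {
        if (speed <= 0) {
            throw new IllegalArgumentException("speed must be positive");
        }
        long dueUs = (long) (presentationTimeUs / speed); // when the frame should appear
        long waitUs = dueUs - elapsedUs;                  // negative if we are already late
        return Math.max(0L, waitUs / 1000L);
    }
}
```

The decoder thread would then call Thread.sleep(FramePacer.sleepMillis(...)) just before releaseOutputBuffer. Late frames get a wait of zero and are shown immediately rather than dropped, which is a simplification.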
MediaExtractor:
MediaExtractor facilitates extraction of demuxed, typically encoded, media data from a data source.
The documentation description is self-explanatory.
Have a look below, I've added comments to make it more understandable:
// Initialize the extractor
MediaExtractor mExtractor = new MediaExtractor();
mExtractor.setDataSource(mSource);
// Select/set the video track (if available); selectVideoTrack is a helper that is not shown here
int trackIndex = selectVideoTrack(mExtractor);
if (trackIndex < 0)
    throw new IOException("Can't find track");
mExtractor.selectTrack(trackIndex);
// The extractor is now ready to be used
// Get the track format
mFormat = mExtractor.getTrackFormat(trackIndex);
// Read the next sample into the buffer; the returned size is negative once no samples are left
// The MediaCodec input code uses this to decide whether to queue data or signal end of stream
sampleSize = mExtractor.readSampleData(buffer, 0);
MediaMuxer:
MediaMuxer facilitates muxing elementary streams. Currently MediaMuxer supports MP4, Webm and 3GP file as the output. It also supports muxing B-frames in MP4 since Android Nougat.
This is self-explanatory once again. It's used to create a video/audio file. For example, merging two video files together.

How to Merge MFCCs

I am working on extracting MFCC features from some audio files. The program I have currently extracts a series of MFCCs for each file and has a parameter of a buffer size of 1024. I saw the following in a paper:
The feature vectors extracted within a second of audio data are combined by computing the mean and the variance of each feature vector element (merging).
My current code uses TarsosDSP to extract the MFCCs, but I'm not sure how to split the data into "a second of audio data" in order to merge the MFCCs.
My MFCC extraction code
int sampleRate = 44100;
int bufferSize = 1024;
int bufferOverlap = 512;
inStream = new FileInputStream(path);
AudioDispatcher dispatcher = new AudioDispatcher(new UniversalAudioInputStream(inStream, new TarsosDSPAudioFormat(sampleRate, 16, 1, true, true)), bufferSize, bufferOverlap);
final MFCC mfcc = new MFCC(bufferSize, sampleRate, 13, 40, 300, 3000);
dispatcher.addAudioProcessor(mfcc);
dispatcher.addAudioProcessor(new AudioProcessor() {
    @Override
    public void processingFinished() {
        System.out.println("DONE");
    }
    @Override
    public boolean process(AudioEvent audioEvent) {
        return true; // breakpoint here reveals MFCC data
    }
});
dispatcher.run();
What exactly is buffer size and could it be used to segment the audio into windows of 1 second? Is there a method to divide the series of MFCCs into certain amounts of time?
Any help would be greatly appreciated.
After more research, I came across this website that clearly showed steps in using MFCCs for Weka. It showed some data files with various statistics each listed as separate attributes in Weka. I believe when the paper said
computing the mean and variance
they meant the mean and variance of each MFCC coefficient were used as attributes in the combined data file. When I followed the example on the website to merge the MFCCs, I used max, min, range, max position, min position, mean, standard deviation, skewness, kurtosis, quartile, and interquartile range.
To split the audio input into seconds, I believe sets of MFCCs are extracted at the sample rate passed in as the parameter, so if I set it to 100, I would wait for 100 cycles before merging the MFCCs. Please correct me if I'm wrong.
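For what it's worth, here is a rough sketch of the merging step in plain Java. It assumes the dispatcher produces one MFCC vector per hop of bufferSize - bufferOverlap samples, which is my reading of how the buffer and overlap parameters behave and not something confirmed in the post. MfccMerger and its methods are hypothetical names; the frames would be collected inside process, for example from mfcc.getMFCC() if I recall the TarsosDSP API correctly.

```java
import java.util.List;

final class MfccMerger {
    /** Number of MFCC vectors per second, assuming one vector per hop. */
    static int framesPerSecond(int sampleRate, int bufferSize, int bufferOverlap) {
        int hop = bufferSize - bufferOverlap;
        return Math.max(1, Math.round((float) sampleRate / hop));
    }

    /**
     * Merges the given frames into one vector laid out as
     * [mean_0 .. mean_(d-1), variance_0 .. variance_(d-1)].
     * Uses the population variance (divides by the frame count).
     */
    static double[] meanAndVariance(List<float[]> frames) {
        if (frames.isEmpty()) {
            throw new IllegalArgumentException("no frames to merge");
        }
        int dims = frames.get(0).length;
        double[] merged = new double[2 * dims];
        for (float[] frame : frames) {
            for (int d = 0; d < dims; d++) {
                merged[d] += frame[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            merged[d] /= frames.size();
        }
        for (float[] frame : frames) {
            for (int d = 0; d < dims; d++) {
                double diff = frame[d] - merged[d];
                merged[dims + d] += diff * diff;
            }
        }
        for (int d = 0; d < dims; d++) {
            merged[dims + d] /= frames.size();
        }
        return merged;
    }
}
```

With the settings from the question (44100 Hz, buffer 1024, overlap 512) the hop is 512 samples, so under that assumption about 86 consecutive vectors would make up one second and be merged into one mean/variance vector.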

Why do I always get a NullPointerException when getting KEY_FRAME_RATE from MediaCodec?

I want to retrieve the frame rate from the MediaCodec, but I always get a NullPointerException. The code is as follows:
public void handleWriteSampleData(MediaCodec encoder, int trackIndex, int bufferIndex, ByteBuffer encodedData, MediaCodec.BufferInfo bufferInfo) {
super.writeSampleData(encoder, trackIndex, bufferIndex, encodedData, bufferInfo);
int rc = -1;
if (((bufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) != 0)) {
if (VERBOSE) Log.i(TAG, "handling BUFFER_FLAG_CODEC_CONFIG for track " + trackIndex);
if (trackIndex == VIDEO_TRACK_INDEX) {
// Capture H.264 SPS + PPS Data
Log.d(TAG, "Capture SPS + PPS");
captureH264MetaData(encodedData, bufferInfo);
mFFmpeg.setVideoCodecExtraData(videoConfig, videoConfig.length);
fps = encoder.getOutputFormat().getInteger(MediaFormat.KEY_FRAME_RATE);
Log.i(TAG, "fps:" + fps);
}
....
The exception is as follows:
E/AndroidRuntime( 1419): java.lang.NullPointerException
E/AndroidRuntime( 1419): at android.media.MediaFormat.getInteger(MediaFormat.java:282)
I have gone through the source code:
/**
* Returns the value of an integer key.
*/
public final int getInteger(String name) {
return ((Integer)mMap.get(name)).intValue();
}
How can I get the right fps value from MediaCodec?
The information is not available from MediaCodec, because timing information is not necessarily present in H.264 (see e.g. this link). KEY_FRAME_RATE is an argument for the encoder.
A nominal frame rate value may be present in the wrapper (e.g. .mp4) that MediaExtractor handles, but I don't know if there's a consistent way to access that.
You can dig through SPS, or calculate an fps value by looking at the presentation time stamps on a series of frames. If the video uses a variable frame rate, like "screenrecord" output does, then you may get an incorrect number from this... but I would argue that, for VFR video, there is no correct number for The Frame Rate, just maximum and average values.
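A minimal sketch of the timestamp approach, in plain Java. FrameRateEstimator and averageFps are hypothetical names, and the timestamps are assumed to be the presentationTimeUs values (in microseconds) collected from a run of frames. It only gives an average, so for VFR video it is subject to the caveat above.

```java
final class FrameRateEstimator {
    /**
     * Estimates the average frame rate from presentation timestamps in
     * microseconds. Uses the smallest and largest timestamp, so the
     * frames do not have to arrive in presentation order.
     */
    static double averageFps(long[] presentationTimesUs) {
        if (presentationTimesUs.length < 2) {
            throw new IllegalArgumentException("need at least two timestamps");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long t : presentationTimesUs) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        long spanUs = max - min;
        if (spanUs <= 0) {
            throw new IllegalArgumentException("timestamps must not all be equal");
        }
        // n frames span n - 1 frame intervals
        return (presentationTimesUs.length - 1) * 1_000_000.0 / spanUs;
    }
}
```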
You need to wait for the MediaCodec.INFO_OUTPUT_FORMAT_CHANGED flag if you want to read the MediaFormat from the Codec.
int bufferIndex = mMediaCodec.dequeueOutputBuffer(this.mBufferInfo, QUEUEING_TIMEOUT);
if (bufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
MediaFormat mMediaFormat = mMediaCodec.getOutputFormat();
}

What is the format of the data you can write to a SourceDataLine?

In Java you can create a SourceDataLine like so:
AudioFormat af = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100.0f, 16, 1, 2, 44100.0f, false);
SourceDataLine sdl = AudioSystem.getSourceDataLine(af);
After which you can open and then write data to it:
byte[] data = new byte[1024];
fillwithsounds(data);
sdl.open();
sdl.start();
sdl.write(data, 0, 1024);
This all works fine for mono data.
What I'd like to do is to be able to write stereo data, and I can't find any documentation online on how I need to change my byte array to be able to write stereo data.
It seems like I need to increase the number of channels when I create the AudioFormat, to make it stereo, and then I need to halve the frame rate (otherwise Java throws an IllegalArgumentException).
I don't understand why this is, though, or what the new format should be for the data that I feed to the SourceDataLine.
Perhaps somebody with a little more experience in audio formats than I could shed some light on this problem. Thanks in advance!
The format I use for stereo is as follows:
audioFmt = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
44100, 16, 2, 4, 44100, false);
You probably want to double the bytes per frame instead of halving your bits-encoding. I'm not sure what 8-bit encoding sounds like, but it is definitely going to be noisier than 16-bit encoding!
The resulting file is twice as long. You can then take the two-byte pairs that make the 16-bit sample and copy them into the next two positions, for "mono" playback (both stereo channels identical).
Given:
frame = F
little end byte = A
big end byte = B
AB = 16-bit little-endian encoding
left channel = L
right channel = R
Your original mono:
F1A, F1B, F2A, F2B, F3A, F3B ...
Stereo using the above format:
F1AL, F1BL, F1AR, F1BR, F2AL, F2BL, F2AR, F2BR, F3AL, F3BL, F3AR, F3BR ...
I could very well have the order of left and right mixed up. But I hope you get the idea!
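To make the layout concrete, here is a small sketch that turns 16-bit mono data into 16-bit stereo by writing every two-byte sample twice. MonoToStereo16 and duplicateChannels are hypothetical names. The frame size in the AudioFormat goes from 2 to 4 because a frame holds one sample per channel (2 channels times 2 bytes). In interleaved PCM the left channel comes first, as far as I know.

```java
final class MonoToStereo16 {
    /**
     * Converts 16-bit mono PCM into 16-bit stereo PCM with both channels
     * identical. Byte order inside each sample is left untouched, so this
     * works for little-endian and big-endian data alike.
     */
    static byte[] duplicateChannels(byte[] mono) {
        if (mono.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit data needs an even number of bytes");
        }
        byte[] stereo = new byte[mono.length * 2];
        for (int in = 0, out = 0; in < mono.length; in += 2, out += 4) {
            stereo[out] = mono[in];         // left channel, first byte
            stereo[out + 1] = mono[in + 1]; // left channel, second byte
            stereo[out + 2] = mono[in];     // right channel, first byte
            stereo[out + 3] = mono[in + 1]; // right channel, second byte
        }
        return stereo;
    }
}
```

The result can be written to a line opened with the stereo format shown earlier (16 bits, 2 channels, 4 bytes per frame).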
I found the solution just now, and Andrew Thompson's comment explained exactly what I needed.
I figured that I'd have to write each frame twice; what tripped me up was the fact that Java wouldn't let me just use the frame size I had for my mono channel. (It threw an IllegalArgumentException.)
So I halved the frame rate to satisfy Java, but I didn't remember to modify the byte array.
I've implemented code that takes the "2 bytes per frame, 1 channel" byte[] and converts it into a "1 byte per frame, 2 channel" byte[].
private static final float LEFT = -1;
private static final float CENTER = 0;
private static final float RIGHT = 1;
private byte[] monoToStereo(byte[] data, float pan){
byte[] output = new byte[data.length];
for (int i = 0; i < (data.length - 2); i+=2){
int currentvalue = (data[i+1]*256 + data[i])/(256*2);
if (pan == LEFT || pan == CENTER){
output[i] = (byte) currentvalue;
} else {
output[i] = 0;
}
if (pan == RIGHT || pan == CENTER){
output[i+1] = (byte) currentvalue;
} else {
output[i+1] = 0;
}
}
return output;
}
Using this, I can get stereo audio to play back (there is some soft static, but I can clearly hear the original track).

Android library to get pitch from WAV file

I have a list of sampled data from the WAV file. I would like to pass these values into a library and get the frequency of the music played in the WAV file. For now, I will have one frequency in the WAV file, and I would like to find a library that is compatible with Android. I understand that I need to use an FFT to get to the frequency domain. Are there any good libraries for that? I found that KissFFT is quite popular, but I am not sure how compatible it is with Android. Is there an easier, good library that can perform the task I want?
EDIT:
I tried to use JTransforms to get the FFT of the WAV file, but I never got the correct frequency. Currently, the WAV file contains a 440 Hz sine wave, the note A4. However, I got 441 Hz as the result. Then I tried to get the frequency of G4 and got 882 Hz, which is incorrect. The frequency of G4 is supposed to be 783 Hz. Could it be due to not having enough samples? If so, how many samples should I take?
//DFT
DoubleFFT_1D fft = new DoubleFFT_1D(numOfFrames);
double max_fftval = -1;
int max_i = -1;
double[] fftData = new double[numOfFrames * 2];
for (int i = 0; i < numOfFrames; i++) {
// copying audio data to the fft data buffer, imaginary part is 0
fftData[2 * i] = buffer[i];
fftData[2 * i + 1] = 0;
}
fft.complexForward(fftData);
for (int i = 0; i < fftData.length; i += 2) {
// complex numbers -> vectors, so we compute the length of the vector, which is sqrt(realpart^2+imaginarypart^2)
double vlen = Math.sqrt((fftData[i] * fftData[i]) + (fftData[i + 1] * fftData[i + 1]));
//fd.append(Double.toString(vlen));
// fd.append(",");
if (max_fftval < vlen) {
// if this length is bigger than our stored biggest length
max_fftval = vlen;
max_i = i;
}
}
//double dominantFreq = ((double)max_i / fftData.length) * sampleRate;
double dominantFreq = (max_i/2.0) * sampleRate / numOfFrames;
fd.append(Double.toString(dominantFreq));
Can someone help me out?
EDIT2: I managed to fix the problem mentioned above by increasing the number of samples to 100000; however, I sometimes get an overtone as the frequency. Any idea how to fix it? Should I use the Harmonic Product Spectrum or autocorrelation algorithms?
I realised my mistake: taking more samples increases the accuracy. However, this method is still not complete, as I still have problems obtaining accurate results for piano/voice sounds.
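The reason more samples help is that an N-point FFT can only report multiples of sampleRate / N. A small sketch of that arithmetic follows; FftResolution and its methods are hypothetical names, and the 44100 Hz sample rate and 100-sample window are assumptions on my part, chosen because they reproduce the 441 Hz and 882 Hz readings (the post does not state either value).

```java
final class FftResolution {
    /** Spacing between neighbouring FFT bins in Hz. */
    static double binWidthHz(double sampleRate, int numSamples) {
        return sampleRate / numSamples;
    }

    /** Frequency of the FFT bin closest to the given true frequency. */
    static double nearestBinHz(double freqHz, double sampleRate, int numSamples) {
        double width = binWidthHz(sampleRate, numSamples);
        return Math.round(freqHz / width) * width;
    }
}
```

Under those assumptions the bins are 441 Hz apart, so a 440 Hz tone lands in the 441 Hz bin and a 783 Hz tone in the 882 Hz bin. With 100000 samples the spacing shrinks to about 0.44 Hz.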
