I'm developing a sound recognition system. I'm using a TensorFlow model developed in Python to convert MFCC values to labels. I'm using the MediaRecorder class to record the audio, and I'm doing it in a loop so I can continuously capture microphone audio and then get the label from the model. Here is the recording loop:
temp = 0;
while (true) {
audioPath = getApplicationContext().getFilesDir().getAbsolutePath();
audioPath += "/Recording" + temp + ".3gp";
audioFile = new File(audioPath);
mediaRecorder = new MediaRecorder();
mediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC);
mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
mediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
mediaRecorder.setOutputFile(audioPath);
try {
mediaRecorder.prepare();
} catch (IOException e) {
e.printStackTrace();
}
mediaRecorder.start();
sleep(2000);
if (!isRunning) {
mediaRecorder.stop();
return;
}
try {
int amplitude = mediaRecorder.getMaxAmplitude();
Log.d("volume", Integer.toString(amplitude));
//finished = false;
avgVolumeTask task = new avgVolumeTask();
task.execute(amplitude);
} catch (Exception e) {
Log.d("Exception in startMediaRecorder()", e.toString());
}
mediaRecorder.stop();
mediaRecorder.release();
soundRecognition task2 = new soundRecognition();
task2.execute();
audioFile.delete();
temp++;
}
This is the soundRecognition method:
private class soundRecognition extends AsyncTask<Integer, Integer, Long> {
@Override
protected Long doInBackground(Integer... level) {
float[] mfccValues = null;
Interpreter tflite = null;
float[][] labelProbArray = null;
try {
mfccValues = computeMFCC();
labelList = loadLabelList();
labelProbArray = new float[1][labelList.size()];
tflite = new Interpreter(loadModel());
} catch (IOException e) {
e.printStackTrace();
} catch (UnsupportedAudioFileException e) {
e.printStackTrace();
}
tflite.run(mfccValues, labelProbArray);
for (int i = 0; i < labelProbArray[0].length; i++) {
float value = labelProbArray[0][i];
//if (i == 1f){
//Log.d("Output at " + Integer.toString(i) + ": ", Float.toString(value));
//doAlert(i);
//}
}
return null;
}
}
The computeMFCC method is this:
public float[] computeMFCC() throws IOException, UnsupportedAudioFileException {
FileInputStream in2 = new FileInputStream(audioPath);
int i;
// InputStream to byte array
byte[] buf = IOUtils.toByteArray(in2);
in2.close();
i = Integer.MAX_VALUE;
// byte array to short array
short[] shortArr = new short[buf.length / 2];
ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shortArr);
int count = 0;
while (count <= shortArr.length) { // Still have data to process.
for (int n = 0; n < nSubframePerBuf; n++) { // Process audio signal in ArrayList and shift by one subframe each time
int k = 0;
for (i = (n * frameShift); i < (n + 1) * frameShift; i++) {
subx[k] = shortArr[i];
k++;
}
subframeList.add(subx); // Add the current subframe to the subframe list. Later, a number of
}
count++;
}
// Need at least nSubframePerMfccFrame to get one analysis frame
x = extractOneFrameFromList(nSubframePerMfccFrame);
MFCC mfcc = new MFCC(samplePerFrm, 16000, numMfcc);
double[] mfccVals = mfcc.doMFCC(x);
float[] floatArray = new float[mfccVals.length];
for (i = 0 ; i < mfccVals.length; i++)
{
floatArray[i] = (float) mfccVals[i];
}
return floatArray;
}
And the doMFCC method is from a downloaded java file here:
https://github.com/enmwmak/ScreamDetector/blob/master/src/edu/polyu/mfcc/MFCC.java
The issue I'm having is that after a few iterations the file doesn't get created, and I then get a null error when passing the results from the input stream to the TensorFlow model.
Possible Issues
One reason could be where the file is stored. I've been trying to write the file to local storage because I was worried that not all devices would have external storage.
Another reason could be that I'm not calling the sound recognition in the right spot. I waited until after the mediaRecorder is stopped to make sure that the file is written with the mic audio, but when I review the contents of the FileInputStream, it appears not to be working, and in each loop the file is always the same.
Any help would be much appreciated.
It may be tricky to have a sleep(2000) inside the while loop.
It may be better to check the elapsed time in milliseconds and move on only once 2000 ms have passed.
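A minimal sketch of that elapsed-time check, not a drop-in fix for the recording loop. The class `TimedWait`, the method `waitFor`, and the `keepRunning` supplier (standing in for the `isRunning` flag in the question) are hypothetical names. It polls in short slices so the loop can stop early instead of blocking for the full two seconds.

```java
import java.util.function.BooleanSupplier;

final class TimedWait {
    /**
     * Waits until windowMillis have elapsed or keepRunning returns false.
     * Returns true if the full window elapsed, false if the wait was cut short.
     */
    static boolean waitFor(long windowMillis, BooleanSupplier keepRunning) {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (System.nanoTime() - deadline < 0) {
            if (!keepRunning.getAsBoolean()) {
                return false; // caller asked us to stop early
            }
            try {
                Thread.sleep(10); // short slice, so the flag is re-checked often
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
        return true;
    }
}
```

In the recording loop this could replace `sleep(2000)` with something like `if (!TimedWait.waitFor(2000, () -> isRunning)) { ... stop and release the recorder ... }`.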
I have a program that asks the user which song they want to play from a list of available songs. Once the selected song finishes, it should ask the user again which song they want to play. I have been told to use a LineListener for this, but I can't figure out how, even after reading the Oracle docs.
My code:
public class Main {
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
String[] pathnames;
File MusicFileChosen;
String musicDir;
boolean songComplete = false;
pathnames = ProgramMap.musicDir.list();
// Print the names of files and directories
for (int ListNum = 0; ListNum < pathnames.length; ListNum++) {
System.out.println(ListNum + 1 + ". " + pathnames[ListNum]);
}
for (int playlistLength = 0; playlistLength < pathnames.length; playlistLength++){
if (!songComplete) {
System.out.println("Which Song would you like to play?");
int musicChoice = input.nextInt();
musicDir = ProgramMap.userDir + "\\src\\Music\\" + pathnames[musicChoice - 1];
MusicFileChosen = new File(musicDir);
PlaySound(MusicFileChosen, pathnames[musicChoice - 1]);
}
}
}
public static void PlaySound(File sound, String FileName){
try{
// Inits the Audio System
Clip clip = AudioSystem.getClip();
AudioInputStream AudioInput = AudioSystem.getAudioInputStream(sound);
//Finds and accesses the clip
clip.open(AudioInput);
//Starts the clip
clip.start();
System.out.println("Now Playing " + FileName);
clip.drain();
}catch (Exception e){
System.out.println("Error playing music");
}
}
}
Basically, the one thing you need to change is to replace this:
for (int playlistLength = 0; playlistLength < pathnames.length; playlistLength++){
with something like:
while (true) {
System.out.println("Which Song would you like to play?");
int musicChoice = input.nextInt();
musicDir = ProgramMap.userDir + "\\src\\Music\\" + pathnames[musicChoice - 1];
MusicFileChosen = new File(musicDir);
PlaySound(MusicFileChosen, pathnames[musicChoice - 1]);
}
You can add some logic to break the loop.
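One possible way to break the loop, sketched under the assumption that entering 0 means "quit". The class `SongMenu` and the method `readChoice` are hypothetical names, not part of the original program. The sketch also rejects non-numeric and out-of-range input, which would otherwise throw in the loop above.

```java
import java.util.Scanner;

final class SongMenu {
    /**
     * Reads a menu choice from the scanner.
     * Returns the zero-based index of the chosen song, or -1 when the user
     * enters 0 to quit or the input is exhausted.
     */
    static int readChoice(Scanner input, int songCount) {
        while (input.hasNext()) {
            if (!input.hasNextInt()) {
                input.next(); // discard the non-numeric token
                System.out.println("Please enter a number.");
                continue;
            }
            int choice = input.nextInt();
            if (choice == 0) {
                return -1; // assumed convention: 0 quits
            }
            if (choice >= 1 && choice <= songCount) {
                return choice - 1;
            }
            System.out.println("Please enter a number between 0 and " + songCount + ".");
        }
        return -1;
    }
}
```

Inside the `while (true)` loop this could be used as `int index = SongMenu.readChoice(input, pathnames.length); if (index < 0) break;` before building the file path from `pathnames[index]`.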
Also, I would recommend changing the PlaySound method a little:
public static void PlaySound(File sound, String FileName) {
try (final AudioInputStream in = getAudioInputStream(sound)) {
final AudioFormat outFormat = getOutFormat(in.getFormat());
Info info = new Info(SourceDataLine.class, outFormat);
try (final SourceDataLine line =
(SourceDataLine) AudioSystem.getLine(info)) {
if (line != null) {
line.open(outFormat);
line.start();
System.out.println("Now Playing " + FileName);
stream(getAudioInputStream(outFormat, in), line);
line.drain();
line.stop();
}
}
} catch (UnsupportedAudioFileException
| LineUnavailableException
| IOException e) {
System.err.println("Error playing music\n" + e.getMessage());
}
}
private static AudioFormat getOutFormat(AudioFormat inFormat) {
final int ch = inFormat.getChannels();
final float rate = inFormat.getSampleRate();
return new AudioFormat(PCM_SIGNED, rate, 16, ch, ch * 2, rate, false);
}
private static void stream(AudioInputStream in, SourceDataLine line)
throws IOException {
final byte[] buffer = new byte[4096];
for (int n = 0; n != -1; n = in.read(buffer, 0, buffer.length)) {
line.write(buffer, 0, n);
}
}
This conversion is needed to play MP3 files, because otherwise you can run into an error like this:
Unknown frame size.
To add MP3 reading support to Java Sound, add the mp3plugin.jar of the JMF to the run-time classpath of the application. https://www.oracle.com/technetwork/java/javase/download-137625.html
I'm trying to save a double[] array as .WAV file using this method:
public static void saveWav(String filename, double[] samples) {
// assumes 44,100 samples per second
// use 16-bit audio, 2 channels, signed PCM, little Endian
AudioFormat format = new AudioFormat(SAMPLE_RATE * 2, 16, 1, true, false);
byte[] data = new byte[2 * samples.length];
for (int i = 0; i < samples.length; i++) {
int temp = (short) (samples[i] * MAX_16_BIT);
data[2*i + 0] = (byte) temp;
data[2*i + 1] = (byte) (temp >> 8);
}
// now save the file
try {
ByteArrayInputStream bais = new ByteArrayInputStream(data);
AudioInputStream ais = new AudioInputStream(bais, format, samples.length);
if (filename.endsWith(".wav") || filename.endsWith(".WAV")) {
AudioSystem.write(ais, AudioFileFormat.Type.WAVE, new File(filename));
}
else if (filename.endsWith(".au") || filename.endsWith(".AU")) {
AudioSystem.write(ais, AudioFileFormat.Type.AU, new File(filename));
}
else {
throw new RuntimeException("File format not supported: " + filename);
}
}
catch (IOException e) {
System.out.println(e);
System.exit(1);
}
}
But when I reload the files I saved, the double value of every song[i] differs from the original. I use this method to read WAV files:
public static double[] read(String filename) {
byte[] data = readByte(filename);
int N = data.length;
double[] d = new double[N/2];
for (int i = 0; i < N/2; i++) {
d[i] = ((short) (((data[2*i+1] & 0xFF) << 8) + (data[2*i] & 0xFF))) / ((double) MAX_16_BIT);
}
return d;
}
private static byte[] readByte(String filename) {
byte[] data = null;
AudioInputStream ais = null;
try {
// try to read from file
File file = new File(filename);
if (file.exists()) {
ais = AudioSystem.getAudioInputStream(file);
data = new byte[ais.available()];
ais.read(data);
}
// try to read from URL
else {
URL url = StdAudio.class.getResource(filename);
ais = AudioSystem.getAudioInputStream(url);
data = new byte[ais.available()];
ais.read(data);
}
}
catch (IOException e) {
System.out.println(e.getMessage());
throw new RuntimeException("Could not read " + filename);
}
catch (UnsupportedAudioFileException e) {
System.out.println(e.getMessage());
throw new RuntimeException(filename + " in unsupported audio format");
}
return data;
}
I need both double[] arrays to have exactly the same values, and that's not the case.
When I listen to the song playing I can't tell the difference, but I still need those original values.
Any help appreciated.
Guy
A double requires 64 bits of storage and has a lot of precision. You can't just throw away 48 bits of data in the round trip and expect to get the exact same value back. It is analogous to starting with a high resolution image, converting it to a thumbnail and then expecting that you can magically recover the original high resolution image. In the real world, the human ear is not going to be able to distinguish between the two. The higher resolution is useful during computation and signal processing routines to reduce the accumulation of computational errors. That being said, if you want to store 64-bit values you'll need to use something other than .WAV. The closest you'll get is 32-bit.
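To make the loss concrete, here is a small sketch that mirrors the 16-bit conversion from the question and contrasts it with storing the doubles in full. It assumes `MAX_16_BIT` is `Short.MAX_VALUE` (the question does not show its value). The class `SampleRoundTrip` and the raw-bytes helpers are hypothetical; the raw layout only illustrates "something other than .WAV" and is not a standard audio file format.

```java
import java.nio.ByteBuffer;

final class SampleRoundTrip {
    // Assumption: the question's MAX_16_BIT equals Short.MAX_VALUE (32767).
    static final double MAX_16_BIT = Short.MAX_VALUE;

    /** Quantizes a sample in [-1, 1] to 16 bits, truncating as saveWav does. */
    static short toPcm16(double sample) {
        return (short) (sample * MAX_16_BIT);
    }

    /** Converts a 16-bit value back to a double, as read does. */
    static double fromPcm16(short pcm) {
        return pcm / MAX_16_BIT;
    }

    /** Stores every double in full (8 bytes each), so no precision is lost. */
    static byte[] toRawBytes(double[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * Double.BYTES);
        for (double sample : samples) {
            buffer.putDouble(sample);
        }
        return buffer.array();
    }

    /** Restores the doubles written by toRawBytes, bit for bit. */
    static double[] fromRawBytes(byte[] raw) {
        ByteBuffer buffer = ByteBuffer.wrap(raw);
        double[] samples = new double[raw.length / Double.BYTES];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = buffer.getDouble();
        }
        return samples;
    }
}
```

With these helpers, `fromPcm16(toPcm16(x))` generally differs from `x` by up to one quantization step (about 1/32767), while `fromRawBytes(toRawBytes(samples))` returns the original values exactly.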
I have been searching for this, but nothing I found answers my question.
I have been trying to graph/plot a WAV file like this:
int result = 0;
try {
result = audioInputStream.read(bytes);
} catch (Exception e) {
e.printStackTrace();
}
and then using the result as the variable for a graph. I've been wondering whether I should first convert the result to decibels. Also, am I right to use the result as the variable in the graph, or is there a particular way a WAV file has to be graphed?
The first thing you need to do is read the samples of the file. This will give you the min/max ranges of the waveform (sound wave)...
File file = new File("...");
AudioInputStream ais = null;
try {
ais = AudioSystem.getAudioInputStream(file);
int frameLength = (int) ais.getFrameLength();
int frameSize = (int) ais.getFormat().getFrameSize();
byte[] eightBitByteArray = new byte[frameLength * frameSize];
int result = ais.read(eightBitByteArray);
int channels = ais.getFormat().getChannels();
int[][] samples = new int[channels][frameLength];
int sampleIndex = 0;
for (int t = 0; t < eightBitByteArray.length;) {
for (int channel = 0; channel < channels; channel++) {
int low = (int) eightBitByteArray[t];
t++;
int high = (int) eightBitByteArray[t];
t++;
int sample = getSixteenBitSample(high, low);
samples[channel][sampleIndex] = sample;
}
sampleIndex++;
}
} catch (Exception exp) {
exp.printStackTrace();
} finally {
try {
ais.close();
} catch (Exception e) {
}
}
//...
protected int getSixteenBitSample(int high, int low) {
return (high << 8) + (low & 0x00ff);
}
Then you would need to determine the min/max values. The next example simply checks channel 0, but you could use the same concept to check all the available channels...
int min = 0;
int max = 0;
for (int sample : samples[0]) {
max = Math.max(max, sample);
min = Math.min(min, sample);
}
FYI: It would be more efficient to collect this information while you read the file.
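A sketch of that single-pass variant, assuming interleaved, signed 16-bit little-endian PCM as in the decoding code above. The class `WaveformRange` and the method `minMax` are hypothetical names. It works on a byte array already read from the stream, so it can be tried without an audio file.

```java
final class WaveformRange {
    /**
     * Decodes interleaved 16-bit little-endian PCM and returns {min, max}
     * for the given channel in a single pass. Returns {0, 0} if the buffer
     * holds no complete sample for that channel.
     */
    static int[] minMax(byte[] pcm, int channels, int channel) {
        int frameSize = channels * 2; // 2 bytes per 16-bit sample
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        boolean any = false;
        for (int offset = channel * 2; offset + 1 < pcm.length; offset += frameSize) {
            int low = pcm[offset] & 0xff; // low byte, treated as unsigned
            int high = pcm[offset + 1];   // high byte keeps the sign
            int sample = (high << 8) | low;
            min = Math.min(min, sample);
            max = Math.max(max, sample);
            any = true;
        }
        return any ? new int[] {min, max} : new int[] {0, 0};
    }
}
```

Unlike the loop above, this starts from the extreme `int` values rather than 0, so the result reflects the actual range even when every sample is positive or every sample is negative.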
Once you have this, you can model the samples... but that would depend on the framework you intend to use...
I am trying to playback audio and keep it continuous and free from skips or blank spots. I have to first receive as bytes in chunks and convert this to mp3 to be streamed by the servletOutputStream. I only start playing once enough bytes have been collected by the consumer in an attempt to maintain a constant flow of audio. As you can see I have hard coded this buffer but would like it to work for any size of audio bytes. I was wondering if anyone had come across a similar problem and had any advice?
Thanks in advance. Any help would be greatly appreciated.
public class Consumer extends Thread {
private MonitorClass consBuf;
private InputStream mp3InputStream = null;
private OutputStream OutputStream = null;
public Consumer (MonitorClass buf, OutputStream servlet)
{
consBuf = buf;
OutputStream = servlet;
}
public void run()
{
byte[] data;
byte[] tempbuf;
int byteSize = 60720; //This should be dynamic
int byteIncrement = byteSize;
int dataPlayed = 0;
int start = 0;
int buffer = 0;
boolean delay = true;
AudioFormat generatedTTSAudioFormat = getGeneratedAudioFormat();
try
{
while(true)
{
try
{
data = consBuf.get(); //gets data from producer using a shared monitor class
if(data.length >= byteSize) //Buffer size hit, start playing
{
if(delay) //help with buffering
{
System.out.println("Pre-delay...");
consBuf.preDelay();
delay = false;
}
tempbuf = new byte[byteIncrement];
arraySwap(data, tempbuf, start, byteSize);
System.out.println("Section to play: " + start + ", " + byteSize);
mp3InputStream = FishUtils.convertToMP3( new ByteArrayInputStream(tempbuf), generatedTTSAudioFormat);
copyStream(mp3InputStream, OutputStream);
System.out.println("Data played: " + byteSize);
System.out.println("Data collected: " + consBuf.getDownloadedBytes() );
dataPlayed = byteSize;
start = byteSize;
byteSize += byteIncrement;
}
if( consBuf.getIsComplete() )
{
if (consBuf.checkAllPlayed(dataPlayed) > 0)
{
System.out.println("Producer finished, play remaining section...");
//mp3InputStream = convertToMP3(new ByteArrayInputStream(tempbuf), generatedTTSAudioFormat);
//copyStream(mp3InputStream, OutputStream);
}
System.out.println("Complete!");
break;
}
}
catch (Exception e)
{
System.out.println(e);
return;
}
}
}
finally
{
if (null != mp3InputStream)
{
try
{
mp3InputStream.skip(Long.MAX_VALUE);
}
catch (Exception e)
{}
}
closeStream(mp3InputStream);
closeStream(OutputStream);
}
}
}
I am attempting to mix several different audio streams and trying to get them to play at the same time instead of one-at-a-time.
The code below plays them one-at-a-time and I cannot figure out a solution that does not use the Java Mixer API. Unfortunately, my audio card does not support synchronization using the Mixer API and I am forced to figure out a way to do it through code.
Please advise.
/////CODE IS BELOW////
class MixerProgram {
public static AudioFormat monoFormat;
private JFileChooser fileChooser = new JFileChooser();
private static File[] files;
private int trackCount;
private FileInputStream[] fileStreams = new FileInputStream[trackCount];
public static AudioInputStream[] audioInputStream;
private Thread trackThread[] = new Thread[trackCount];
private static DataLine.Info sourceDataLineInfo = null;
private static SourceDataLine[] sourceLine;
public MixerProgram(String[] s)
{
trackCount = s.length;
sourceLine = new SourceDataLine[trackCount];
audioInputStream = new AudioInputStream[trackCount];
files = new File[s.length];
}
public static void getFiles(String[] s)
{
files = new File[s.length];
for(int i=0; i<s.length;i++)
{
File f = new File(s[i]);
if (!f.exists())
System.err.println("Wave file not found: " + s[i]);
files[i] = f;
}
}
public static void loadAudioFiles(String[] s)
{
AudioInputStream in = null;
audioInputStream = new AudioInputStream[s.length];
sourceLine = new SourceDataLine[s.length];
for(int i=0;i<s.length;i++){
try
{
in = AudioSystem.getAudioInputStream(files[i]);
}
catch(Exception e)
{
System.err.println("Failed to assign audioInputStream");
}
monoFormat = in.getFormat();
AudioFormat decodedFormat = new AudioFormat(
AudioFormat.Encoding.PCM_SIGNED,
monoFormat.getSampleRate(), 16, monoFormat.getChannels(),
monoFormat.getChannels() * 2, monoFormat.getSampleRate(),
false);
monoFormat = decodedFormat; //give back name
audioInputStream[i] = AudioSystem.getAudioInputStream(decodedFormat, in);
sourceDataLineInfo = new DataLine.Info(SourceDataLine.class, monoFormat);
try
{
sourceLine[i] = (SourceDataLine) AudioSystem.getLine(sourceDataLineInfo);
sourceLine[i].open(monoFormat);
}
catch(LineUnavailableException e)
{
System.err.println("Failed to get SourceDataLine" + e);
}
}
}
public static void playAudioMix(String[] s)
{
final int tracks = s.length;
System.out.println(tracks);
Runnable playAudioMixRunner = new Runnable()
{
int bufferSize = (int) monoFormat.getSampleRate() * monoFormat.getFrameSize();
byte[] buffer = new byte[bufferSize];
public void run()
{
if(tracks==0)
return;
for(int i = 0; i < tracks; i++)
{
sourceLine[i].start();
}
int bytesRead = 0;
while(bytesRead != -1)
{
for(int i = 0; i < tracks; i++)
{
try
{
bytesRead = audioInputStream[i].read(buffer, 0, buffer.length);
}
catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
if(bytesRead >= 0)
{
int bytesWritten = sourceLine[i].write(buffer, 0, bytesRead);
System.out.println(bytesWritten);
}
}
}
}
};
Thread playThread = new Thread(playAudioMixRunner);
playThread.start();
}
}
The problem is that you are not adding the samples together. If we are looking at 4 tracks, 16-bit PCM data, you need to add all the different values together to "mix" them into one final output. So, from a purely-numbers point-of-view, it would look like this:
[Track1] 320 -16 2000 200 400
[Track2] 16 8 123 -87 91
[Track3] -16 -34 -356 1200 805
[Track4] 1011 1230 -1230 -100 19
[Final!] 1331 1188 537 1213 1315
In your above code, you should only be writing a single byte array. That byte array is the final mix of all tracks added together. The problem is that you are writing a byte array for each different track (so there is no mixdown happening, as you observed).
If you want to guarantee you don't have any "clipping", you should take the average of all tracks (so add all four tracks above and divide by 4). However, there are artifacts from choosing that approach (for example, if you have silence on three tracks and one loud track, the final output will be much quieter than the volume of the one track that is not silent). There are more complicated algorithms you can use to do the mixing, but by then you are writing your own mixer :P.
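A minimal sketch of both options (plain sum with clamping, or average), assuming the tracks have already been decoded from bytes into 16-bit samples and share one sample rate and channel layout. The class `Mixdown` and the method `mix` are hypothetical names. The mixed result would still have to be re-encoded into a single byte array and written to one SourceDataLine.

```java
final class Mixdown {
    /**
     * Mixes decoded 16-bit tracks sample by sample.
     * With average == false the samples are summed and clamped to the
     * 16-bit range; with average == true the sum is divided by the track
     * count, which cannot clip but lowers the overall volume.
     */
    static short[] mix(short[][] tracks, boolean average) {
        int length = 0;
        for (short[] track : tracks) {
            length = Math.max(length, track.length);
        }
        short[] out = new short[length];
        for (int i = 0; i < length; i++) {
            int sum = 0; // int accumulator, so the sum itself cannot overflow
            for (short[] track : tracks) {
                if (i < track.length) { // shorter tracks count as silence
                    sum += track[i];
                }
            }
            if (average && tracks.length > 0) {
                sum /= tracks.length;
            }
            // Clamp instead of letting the cast wrap around.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return out;
    }
}
```

For the four tracks in the table above, `mix(tracks, false)` yields 1331, 1188, 537, 1213, 1315, matching the "Final" row.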