I am having a problem converting an audio file to text using Google Speech-to-Text. I am able to download the file from Twilio, but when I supply that audio file to Google Speech it gives me a zero-length response. If I first convert the downloaded file using VLC media player and then supply it to Google Speech, I get the right output. Please help me with this; I have been stuck for about a week now.
After getting the response from Twilio, I save it to a file with a .wav extension:
InputStream in = new URL(jsonObject.get("redirect_to").toString()).openStream();
Files.copy(in, Paths.get("src/main/resources/mp.wav"), StandardCopyOption.REPLACE_EXISTING);
Below is the Google Speech-to-Text code.
Path path = Paths.get("src/main/resources/mp.wav");
byte[] content = Files.readAllBytes(path);
ByteString audioBytes = ByteString.copyFrom(content);
try (SpeechClient speech = SpeechClient.create()) {
    RecognitionConfig recConfig =
            RecognitionConfig.newBuilder()
                    .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                    .setLanguageCode("en-US")
                    .setSampleRateHertz(44100)
                    .setModel("default")
                    .setAudioChannelCount(2)
                    .build();
    RecognitionAudio recognitionAudio = RecognitionAudio.newBuilder().setContent(audioBytes).build();
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
            speech.longRunningRecognizeAsync(recConfig, recognitionAudio);
    while (!response.isDone()) {
        System.out.println("Waiting for response...");
        Thread.sleep(10000);
    }
    List<SpeechRecognitionResult> results = response.get().getResultsList();
    for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech.
        // Just use the first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
As @philnash suggested, appending a .mp3 extension to the recording URL lets you download the MP3 version of the recording from Twilio. The same applies to the .wav extension.
InputStream in = new URL(jsonObject.get("redirect_to").toString() + ".mp3").openStream(); // or ".wav"
Files.copy(in, Paths.get("src/main/resources/mp.wav"), StandardCopyOption.REPLACE_EXISTING);
I tested this out with a sample Twilio recording and the ffprobe results are below.
Downloaded .wav file
Input #0, **wav**, from 'from-twilio-change-extension.wav':
Duration: 00:00:14.60, bitrate: 128 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s
Downloaded .mp3 file
Input #0, **mp3**, from 'from-twilio-change-extension.mp3':
Duration: 00:00:14.68, start: 0.000000, bitrate: 32 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 32 kb/s
As for audio encodings supported by the Speech-to-Text API, both WAV and MP3 are supported, but MP3 is a beta feature available only in version v1p1beta1, so the client library imports will look like com.google.cloud.speech.v1p1beta1.*. The audio encoding in RecognitionConfig has to match the encoding of the audio file used: for a .wav file use RecognitionConfig.AudioEncoding.LINEAR16, and for a .mp3 file use RecognitionConfig.AudioEncoding.MP3.
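Note that the ffprobe output above shows the Twilio .wav recording is 16-bit PCM at 8000 Hz, mono, while the config in the question declares 44100 Hz and two channels; a mismatch like that can produce an empty result. Here is a minimal sketch of a config matching the downloaded .wav, with the values taken from the ffprobe output above, so treat them as assumptions to verify against your own file:

```java
// Sketch: RecognitionConfig matching the ffprobe output above
// (pcm_s16le, 8000 Hz, 1 channel). Verify against your own recording.
RecognitionConfig recConfig =
        RecognitionConfig.newBuilder()
                .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                .setLanguageCode("en-US")
                .setSampleRateHertz(8000)
                .setAudioChannelCount(1)
                .build();
```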
An alternative would be to use the FFmpeg tool to convert audio files into one of the codecs recognized by Speech-to-Text. More information about usage of the tool can be found here. In your scenario, the .mka to .wav/.mp3 conversion can be done from the Java code as shown below.
String[] ffmpegCommand = {"ffmpeg", "-i", "/full/path/to/inputFile.mka", "/full/path/to/outputFile.wav"};
ProcessBuilder pb = new ProcessBuilder(ffmpegCommand);
pb.inheritIO();
pb.start().waitFor(); // wait for FFmpeg to finish before using the output file
Related
I'm writing a generalized utility for converting audio files to WAV. It works OK for WAV to WAV (I'm also changing some of the attributes), but I can't convert MP3 files. I have mp3spi in my classpath, so it seems to be able to read the MP3, but the WAV file that gets written doesn't seem to work.
In this example, I'm not trying to change any properties, just reading the MP3 and writing it to a WAV file.
My code looks something like this:
File inputFileObj = new File(input);
AudioInputStream audioInputStream = null;
try {
    audioInputStream = AudioSystem.getAudioInputStream(inputFileObj);
} catch (Exception e) {
    e.printStackTrace();
}
System.out.println("Input file format:");
System.out.println(AudioSystem.getAudioFileFormat(inputFileObj));
try {
    AudioSystem.write(audioInputStream, outputType, new File(output));
} catch (Exception e) {
    e.printStackTrace();
}
System.out.println("Output file format:");
System.out.println(AudioSystem.getAudioFileFormat(new File(output)));
Here's the output. As you can see, it appears to write the output file, but when I try to retrieve the format of the output file, it can't handle it. And if I try to play the output file, the player doesn't recognize it.
Input file: c:\testfiles\sample-b-converted.mp3
Output file: c:\testfiles\foo.wav
Output type: wav
Input file format:
MP3 (.mp3) file, byte length: 13227300, data format: MPEG2L3 16000.0 Hz, unknown bits per sample, mono, unknown frame size, 27.777779 frames/second, , frame length: 122475
Bytes written: 13227344
Output file format:
Exception in thread "main" javax.sound.sampled.UnsupportedAudioFileException: file is not a supported file type
at javax.sound.sampled.AudioSystem.getAudioFileFormat(AudioSystem.java:1078)
at org.torchai.AudioFileConvert01.main(AudioFileConvert01.java:60)
Is there something else I need to get this working?
Someone posted a comment referring me to mp3 to wav conversion in java. I had seen this issue, but didn't quite see the main aspect of the answer, since it wasn't really explained well.
An MP3 file apparently needs to go through a 2-step conversion. I don't fully understand why, but it seems you must first convert it to PCM_SIGNED with a sample size of 16 bits and a frame size of 2 × the number of channels. At that point, you can convert it again to the final format that you want.
Would still love to have a better explanation, but this at least gets me past my issue.
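For reference, the two-step conversion described above can be sketched like this. The file paths and method names are placeholders of mine, and actually decoding an MP3 still requires mp3spi on the classpath; the javax.sound.sampled calls themselves are standard JDK.

```java
import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;

public class Mp3ToWavSketch {

    // Step 1 of the two-step conversion: an intermediate PCM_SIGNED format
    // with 16-bit samples and a frame size of 2 bytes * number of channels.
    static AudioFormat pcmFormatFor(AudioFormat base) {
        return new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                base.getSampleRate(),
                16,
                base.getChannels(),
                base.getChannels() * 2,
                base.getSampleRate(),
                false);
    }

    // Step 2: decode the source through the PCM format, then write it as WAV.
    // Reading an MP3 this way requires mp3spi on the classpath.
    static void convert(File mp3In, File wavOut)
            throws IOException, UnsupportedAudioFileException {
        AudioInputStream mp3Stream = AudioSystem.getAudioInputStream(mp3In);
        AudioInputStream pcmStream =
                AudioSystem.getAudioInputStream(pcmFormatFor(mp3Stream.getFormat()), mp3Stream);
        AudioSystem.write(pcmStream, AudioFileFormat.Type.WAVE, wavOut);
    }
}
```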
I am currently using React as my front end and Java Spring Boot as my server. I am using React-Mic to record audio, pass the audio to FormData, and send an HTTP POST request with that FormData as the body to my Java server. However, since the recorded audio is in WebM, there is no appropriate encoding for the Google Speech-to-Text API. Any idea how I can convert the audio to FLAC or any other format supported by Google Speech-to-Text?
You could probably use JAVE2 to convert from WebM to MP3 (or another format).
https://github.com/a-schild/jave2
The sample in the readme should point you in the right direction:
try {
    File source = new File("file path"); // Path to your webm
    File target = new File("file path"); // Output path

    // Audio attributes
    AudioAttributes audio = new AudioAttributes();
    audio.setCodec("libmp3lame"); // Change this to flac if you prefer flac
    audio.setBitRate(128000);
    audio.setChannels(2);
    audio.setSamplingRate(44100);

    // Encoding attributes
    EncodingAttributes attrs = new EncodingAttributes();
    attrs.setFormat("mp3"); // Change to flac if you prefer flac
    attrs.setAudioAttributes(audio);

    // Encode
    Encoder encoder = new Encoder();
    encoder.encode(new MultimediaObject(source), target, attrs);
    // The target file should now be present at the path specified above
} catch (Exception ex) {
    ex.printStackTrace();
}
After conversion you'd then have a file object, which you could convert to a byte[] to send to the Speech-to-Text API, as in this sample:
https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/QuickstartSample.java
Basically, I built an app in Android that records my message and saves it in .m4a or .3gpp format.
When I play the recordings in my app it works fine, but when I try to play them on my website it doesn't work...
Android(Java)
recorder = new MediaRecorder();
recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
recorder.setOutputFile(OUTPUT_FILE);
recorder.prepare();
recorder.start();
Website(HTML)
<audio controls="controls" preload="none">
<source src="my_record.m4a" type="audio/mp4"/>
</audio>
P.S.: When I tried to open some other m4a audio files (files that I found online), I succeeded.
The audio tag is quite sensitive about this. Anything above 128 kbps it will not play. A lot of encoders automatically choose the highest-quality bit rate (usually around 320 kbps), and the audio tag won't play them. The sample rate should be 44100 Hz.
The sampling rates supported by the AAC audio coding standard range from 8 to 96 kHz, while AMR-NB supports only 8 kHz and AMR-WB only 16 kHz.
Hence, change the audio encoder to AAC in your code:
recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
to
recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC);
and then set the filename extension to .m4a (to match the MPEG_4 container).
Hope this works for you.:)
To stream an audio file I have implemented the following code, but I am getting an exception:
javax.sound.sampled.UnsupportedAudioFileException: could not get audio input stream from input file
at javax.sound.sampled.AudioSystem.getAudioInputStream(AudioSystem.java:1170)
Can anyone help me, please?
try {
    // From file
    System.out.println("hhhhhhhhhhhhhhhh");
    AudioInputStream stream = AudioSystem.getAudioInputStream(new File("C:\\track1.mp3"));
    System.out.println("stream created");
    AudioFormat format = stream.getFormat();
    if (format.getEncoding() != AudioFormat.Encoding.PCM_SIGNED) {
        System.out.println("in if");
        format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                format.getSampleRate(),
                format.getSampleSizeInBits() * 2,
                format.getChannels(),
                format.getFrameSize() * 2,
                format.getFrameRate(),
                true); // big endian
        stream = AudioSystem.getAudioInputStream(format, stream);
    }

    // Create line
    SourceDataLine.Info info = new DataLine.Info(
            SourceDataLine.class, stream.getFormat(),
            ((int) stream.getFrameLength() * format.getFrameSize()));
    SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
    line.open(stream.getFormat());
    line.start();

    // Continuously read and play chunks of audio
    int numRead = 0;
    byte[] buf = new byte[line.getBufferSize()];
    while ((numRead = stream.read(buf, 0, buf.length)) >= 0) {
        int offset = 0;
        while (offset < numRead) {
            offset += line.write(buf, offset, numRead - offset);
        }
    }
    line.drain();
    line.stop();
} catch (Exception e) {
    e.printStackTrace();
}
That you're doing this job in a servlet class gives me the impression that your intent is to play the mp3 file whenever someone visits your website and that the visitor should hear this mp3 file.
If true, I'm sorry to say, but you're approaching this entirely wrong. Java servlet code runs on the webserver machine, not on the webbrowser machine. Whenever someone visits your website, this way the mp3 file would only be played on the webserver machine. This is usually a physically completely different machine which runs at the other side of the network connection, and the visitor is never going to hear the music.
You want to send the mp3 file raw (unmodified, byte by byte) from webserver to webbrowser without massaging it by some Java Audio API, and instruct the webbrowser to play this file. The easiest way is to just drop the mp3 file in public webcontent (where your HTML/JSP files also are) and use the HTML <embed> tag to embed it in your HTML/JSP file. The below example assumes the MP3 file to be in the same folder as the HTML/JSP file:
<embed src="file.mp3" autostart="true"></embed>
That's all and this is supported in practically every browser and it will show a player as well.
If the MP3 file is by business requirement stored outside public webcontent, then you may indeed need a servlet for this, but the servlet should do absolutely nothing more than getting an InputStream of it in some way and write it unmodified to the OutputStream of the HttpServletResponse the usual Java IO way. You only need to set the HTTP Content-Type header to audio/mpeg beforehand and if possible also the HTTP Content-Length header. Then point the src to the servlet's URL instead.
<embed src="mp3servlet" autostart="true"></embed>
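To illustrate the serving pattern without a full servlet container, here is a sketch of the same idea using the JDK's built-in com.sun.net.httpserver instead of the Servlet API (in a real servlet you would do the equivalent on HttpServletResponse: set Content-Type, optionally Content-Length, and copy the bytes unmodified). The path and context name are placeholders:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

public class Mp3StreamSketch {
    // Serves the given MP3 file byte-for-byte at /mp3 with the audio/mpeg
    // content type; no audio API is involved, the bytes pass through unmodified.
    static HttpServer serve(Path mp3, int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/mp3", exchange -> {
            byte[] bytes = Files.readAllBytes(mp3);
            exchange.getResponseHeaders().set("Content-Type", "audio/mpeg");
            exchange.sendResponseHeaders(200, bytes.length); // also sets Content-Length
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```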
The default Java AudioInputStream does not support MP3 files. You have to plug in MP3SPI to let it decode MP3.
Also, what do you mean by streaming? This code will play the audio file, not stream it as in internet radio streaming.
I'm working on an application that has to process audio files. When using MP3 files I'm not sure how to handle the data (the data I'm interested in are the audio bytes, the ones that represent what we hear).
If I'm using a wav file, I know I have a 44-byte header and then the data. When it comes to an MP3, I've read that they are composed of frames, each frame containing a header and audio data. Is it possible to get all the audio data from an MP3 file?
I'm using java (I've added MP3SPI, Jlayer, and Tritonus) and I'm able to get the bytes from the file, but I'm not sure about what these bytes represent or how to handle then.
From the documentation for MP3SPI:
File file = new File(filename);
AudioInputStream in = AudioSystem.getAudioInputStream(file);
AudioInputStream din = null;
AudioFormat baseFormat = in.getFormat();
AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
        baseFormat.getSampleRate(),
        16,
        baseFormat.getChannels(),
        baseFormat.getChannels() * 2,
        baseFormat.getSampleRate(),
        false);
din = AudioSystem.getAudioInputStream(decodedFormat, in);
You then just read data from din - it will be the "raw" data as per decodedFormat. (See the docs for AudioFormat for more information.)
(Note that this sample code doesn't close the stream or anything like that - use appropriate try/finally blocks as normal.)
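To make "read data from din" concrete, here is a small sketch of draining an AudioInputStream into a byte array; applied to din above, the result is the raw PCM sample bytes (16-bit little-endian, per the decoded format). The class and method names are mine, not MP3SPI's:

```java
import javax.sound.sampled.*;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class PcmReader {
    // Reads every byte from the stream into memory. For a PCM_SIGNED stream
    // these bytes are the raw audio samples.
    static byte[] readAll(AudioInputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = stream.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

Be aware that this buffers the whole file in memory; for long recordings you would process the buffer chunk by chunk inside the read loop instead.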
The data that you want are the actual samples, while MP3 represents the data differently. So, as everyone else has said, you need a library to decode the MP3 data into actual samples for your purpose.
As mentioned in the other answers, you need a decoder to decode MP3 into regular audio samples.
One popular option would be JavaLayer (LGPL).