I have a sample .webm file recorded using MediaRecorder in the Chrome browser. When I use the Google Speech-to-Text Java client to transcribe the video, it returns an empty transcription. Here is what my code looks like:
SpeechSettings settings = null;
Path path = Paths.get("D:\\scrap\\gcp_test.webm");
byte[] content = null;

try {
    content = Files.readAllBytes(path);
    settings = SpeechSettings.newBuilder().setCredentialsProvider(credentialsProvider).build();
} catch (IOException e1) {
    throw new IllegalStateException(e1);
}

try (SpeechClient speech = SpeechClient.create(settings)) {
    // Builds the recognition request for the local webm recording
    RecognitionConfig config = RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .setUseEnhanced(true)
            .setModel("video")
            .setEnableAutomaticPunctuation(true)
            .setSampleRateHertz(48000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    // RecognitionAudio audio = RecognitionAudio.newBuilder().setUri("gs://xxxx/gcp_test.webm").build();

    // Use blocking call for getting the audio transcript
    RecognizeResponse response = speech.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();
    for (SpeechRecognitionResult result : results) {
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
} catch (Exception e) {
    e.printStackTrace();
    System.err.println(e.getMessage());
}
If I use the same file and upload it in the demo section at https://cloud.google.com/speech-to-text/, it works fine and shows the transcription. I am clueless about what is going wrong here. I verified the request sent by the demo and here is what it looks like:
I am sending the exact same set of parameters, but that didn't work. I also tried uploading the file to Cloud Storage, but that gave the same result (no transcription).
After some trial and error (and looking at the JavaScript samples), I was able to solve the issue. The serialized audio should be in FLAC format. I was sending the video file (webm) as-is to Google Cloud. The demo on the site extracts the audio stream using the JavaScript Audio API and then sends the data in base64 format, which is why it works.
Here are the steps that I executed to get the output.
Used FFmpeg to extract the audio stream from the webm file into FLAC format:
ffmpeg -i sample.webm -vn -acodec flac sample.flac
The extracted file should be made available via Cloud Storage or sent as a ByteString.
Set the appropriate model when calling the Speech API (for English the video model works, while for French command_and_search does); a minimal config sketch follows this list. I don't have a logical reason for this, I realised it after trial and error with the demo on the Google Cloud site.
I got results with the FLAC-encoded file.
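As a minimal sketch of the model point above (not from the original post; the bucket URI and the choice of fields are placeholder assumptions), the config that worked for an English-language video could look like this:
RecognitionConfig config = RecognitionConfig.newBuilder()
        .setEncoding(AudioEncoding.FLAC)           // audio extracted with ffmpeg as FLAC
        .setLanguageCode("en-US")                  // for French, "fr-FR" with the command_and_search model
        .setModel("video")                         // model that produced results for English
        .setEnableAutomaticPunctuation(true)
        .build();
RecognitionAudio audio = RecognitionAudio.newBuilder()
        .setUri("gs://your-bucket/sample.flac")    // placeholder Cloud Storage URI, or use setContent(...)
        .build();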
Sample code that returns words with timestamps:
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.speech.v1.LongRunningRecognizeMetadata;
import com.google.cloud.speech.v1.LongRunningRecognizeResponse;
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import com.google.cloud.speech.v1.WordInfo;
import java.util.List;

public class SpeechToTextSample {
    public static void main(String... args) throws Exception {
        try (SpeechClient speechClient = SpeechClient.create()) {
            String gcsUriFlac = "gs://yourfile.flac";
            RecognitionConfig config =
                    RecognitionConfig.newBuilder()
                            .setEncoding(AudioEncoding.FLAC)
                            .setEnableWordTimeOffsets(true)
                            .setLanguageCode("en-US")
                            .build();
            RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUriFlac).build(); // for large files
            // Asynchronous (long-running) recognition for longer audio
            OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
                    speechClient.longRunningRecognizeAsync(config, audio);
            while (!response.isDone()) {
                System.out.println("Waiting for response...");
                Thread.sleep(1000);
            }
            // Performs speech recognition on the audio file and prints per-word time offsets
            List<SpeechRecognitionResult> results = response.get().getResultsList();
            for (SpeechRecognitionResult result : results) {
                SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
                System.out.printf("Transcription: %s%n", alternative.getTranscript());
                for (WordInfo wordInfo : alternative.getWordsList()) {
                    System.out.println(wordInfo.getWord());
                    System.out.printf(
                            "\t%s.%s sec - %s.%s sec%n",
                            wordInfo.getStartTime().getSeconds(),
                            wordInfo.getStartTime().getNanos() / 100000000,
                            wordInfo.getEndTime().getSeconds(),
                            wordInfo.getEndTime().getNanos() / 100000000);
                }
            }
        }
    }
}
GCP Speech-to-Text supports many languages; I have used "en-US" in my example.
Please refer to the language support documentation for the full language list.
Related
I am looking for some help/an example for performing a resumable upload to Google Drive using the new v3 REST API in Java.
I know there is a low-level description here: Upload files | Google Drive API. But at the moment I am not willing to work with any of these low-level requests if there is another, simpler method (like the former MediaHttpUploader, which is deprecated now...).
What I currently do is:
File fileMetadata = new File();
fileMetadata.setName(name);
fileMetadata.setDescription(...);
fileMetadata.setParents(parents);
fileMetadata.setProperties(...);
FileContent mediaContent = new FileContent(..., file);
drive.files().create(fileMetadata, mediaContent).execute();
But for large files, this isn't good if the connection interrupts.
I've just created an implementation for that recently. It will create a new file in your Drive folder and return its metadata when the task succeeds. While uploading, it also updates the listener with upload progress info. I added comments to make it self-explanatory:
public Task<File> createFile(java.io.File yourFile, MediaHttpUploaderProgressListener uploadListener) {
    return Tasks.call(mExecutor, () -> {
        // Generates an input stream with your file content to be uploaded
        FileContent mediaContent = new FileContent(yourFileMimeType, yourFile);

        // Creates the Drive file metadata
        File metadata = new File()
                .setParents(parents)
                .setMimeType(yourFileMimeType)
                .setName(yourFileName);

        // Builds up the upload request
        Drive.Files.Create uploadFile = mDriveService.files().create(metadata, mediaContent);

        // This will handle the resumable upload
        MediaHttpUploader uploader = uploadFile.getMediaHttpUploader();

        // Choose your chunk size and it will automatically divide parts
        uploader.setChunkSize(MediaHttpUploader.MINIMUM_CHUNK_SIZE);

        // According to Google, this enables gzip in future versions (optional)
        uploader.setDisableGZipContent(false);

        // Important: this enables resumable upload
        uploader.setDirectUploadEnabled(false);

        // Listener to be updated with progress
        uploader.setProgressListener(uploadListener);

        return uploadFile.execute();
    });
}
And make your Activity implement MediaHttpUploaderProgressListener so you have real-time updates on the upload progress:
@Override
public void progressChanged(MediaHttpUploader uploader) {
    // totalFileSize is a field holding the size of the file being uploaded, in bytes
    String sizeTemp = "Uploading"
            + ": "
            + Formatter.formatShortFileSize(this, uploader.getNumBytesUploaded())
            + "/"
            + Formatter.formatShortFileSize(this, totalFileSize);
    runOnUiThread(() -> textView.setText(sizeTemp));
}
For calculating the progress percentage, you simply do (note the cast, so the division is not performed on integers):
double percentage = (double) uploader.getNumBytesUploaded() / totalFileSize;
Or use this one:
uploader.getProgress()
It gives you the fraction of bytes that have been uploaded, between 0.0 (0%) and 1.0 (100%). But be sure to have your content length specified, otherwise it will throw an IllegalArgumentException.
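A hypothetical usage sketch (my own illustration, not part of the original answer; localFile, the log tags and the Task listeners are assumptions, and it presumes the content length is known so getProgress() can report a fraction):
// Wires the createFile() helper above to a lambda listener that logs overall progress.
createFile(localFile, uploader -> {
    double fraction = uploader.getProgress();      // 0.0 .. 1.0, needs a known content length
    Log.d("Upload", String.format("Progress: %.0f%%", fraction * 100));
}).addOnSuccessListener(driveFile ->
        Log.d("Upload", "Uploaded file id: " + driveFile.getId()))
  .addOnFailureListener(e ->
        Log.e("Upload", "Upload failed", e));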
I am creating a web application using the Spark Java framework. The front-end is developed using AngularJS.
I want to generate a .docx file on the server (in-memory) and send this to the client for download.
To achieve this I created an angular service with the following function being called after the user clicks on a download button:
functions.generateWord = function () {
    $http.post('/api/v1/surveys/genword', data.currentSurvey).success(function (response) {
        var element = angular.element('<a/>');
        element.attr({
            href: 'data:attachment;charset=utf-8;application/vnd.openxmlformats-officedocument.wordprocessingml.document' + response,
            target: '_blank',
            download: 'test.docx'
        })[0].click();
    });
};
On the server, this API call gets forwarded to the following method:
public Response exportToWord(Response response) {
    try {
        File file = new File("src/main/resources/template.docx");
        FileInputStream inputStream = new FileInputStream(file);
        byte byteStream[] = new byte[(int) file.length()];
        inputStream.read(byteStream);
        response.raw().setContentType("data:attachment;chatset=utf-8;application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        response.raw().setContentLength((int) file.length());
        response.raw().getOutputStream().write(byteStream);
        response.raw().getOutputStream().flush();
        response.raw().getOutputStream().close();
        return response;
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
I have tried to solve this in MANY different ways and I always end up with a corrupted 'test.docx' that looks like this:
Solved it by using blobs and specifying the response type as 'arraybuffer' in the $http.post api call. The only bad thing with this solution (as far as I know) is that it doesn't play well with IE, but that's a problem for another day.
functions.generateWord = function () {
    $http.post('/api/v1/surveys/genword', data.currentSurvey, {responseType: 'arraybuffer'})
        .success(function (response) {
            var blob = new Blob([response], {type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'});
            var url = (window.URL || window.webkitURL).createObjectURL(blob);
            var element = angular.element('<a/>');
            element.attr({
                href: url,
                target: '_blank',
                download: 'survey.docx'
            })[0].click();
        });
};
I think what went wrong was that the byte stream got encoded as plain text when I tried to create a URL with:
href: 'data:attachment;charset=utf-8;application/vnd.openxmlformats-officedocument.wordprocessingml.document' + response
thus corrupting it.
When using blobs instead, I get a "direct" link to the generated byte stream and no encoding is done on it since the response type is set to 'arraybuffer'.
Note that this is just my own reasoning of why things went wrong with the original code. I might be terribly wrong, so feel free to correct me if that's the case.
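Not part of the original answer, but a hedged sketch for completeness: the server handler could also send the bytes with a conventional MIME type and a Content-Disposition header (using Spark's Request/Response API; the template path and download filename are the same placeholders used above):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import spark.Request;
import spark.Response;

public class WordExportRoute {
    // Sketch only: stream the generated .docx with a proper MIME type and attachment header.
    public static Object exportToWord(Request request, Response response) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("src/main/resources/template.docx"));
        response.type("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        response.header("Content-Disposition", "attachment; filename=\"survey.docx\"");
        response.raw().setContentLength(bytes.length);
        response.raw().getOutputStream().write(bytes);
        response.raw().getOutputStream().flush();
        return response.raw();
    }
}
It could be registered with something like post("/api/v1/surveys/genword", WordExportRoute::exportToWord);.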
I'm trying to use CMU Sphinx for speech recognition in java but the result I'm getting is not correct and I don't know why.
I have a .wav file I recorded with my voice saying some sentence in English.
Here is my code in java:
try {
    Configuration configuration = new Configuration();
    // Set path to acoustic model.
    configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
    // Set path to dictionary.
    configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
    // Set language model.
    configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");

    StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
    recognizer.startRecognition(new FileInputStream("assets/voice/some_wav_file.wav"));
    SpeechResult result = null;
    while ((result = recognizer.getResult()) != null) {
        System.out.println("~~ RESULTS: " + result.getHypothesis());
    }
    recognizer.stopRecognition();
} catch (Exception e) {
    System.out.println("ERROR: " + e.getMessage());
}
I also have other code for Android that doesn't work either:
Assets assets = new Assets(context);
File assetDir = assets.syncAssets();
String prefix = assetDir.getPath();

Config c = Decoder.defaultConfig();
c.setString("-hmm", prefix + "/en-us-ptm");
c.setString("-lm", prefix + "/en-us.lm");
c.setString("-dict", prefix + "/cmudict-en-us.dict");
Decoder d = new Decoder(c);

InputStream stream = context.getResources().openRawResource(R.raw.some_wav_file);

d.startUtt();
byte[] b = new byte[4096];
try {
    int nbytes;
    while ((nbytes = stream.read(b)) >= 0) {
        ByteBuffer bb = ByteBuffer.wrap(b, 0, nbytes);
        short[] s = new short[nbytes / 2];
        bb.asShortBuffer().get(s);
        d.processRaw(s, nbytes / 2, false, false);
    }
} catch (IOException e) {
    Log.d("ERROR: ", "Error when reading file" + e.getMessage());
}
d.endUtt();

Log.d("TOTAL RESULT: ", d.hyp().getHypstr());
for (Segment seg : d.seg()) {
    Log.d("RESULT: ", seg.getWord());
}
I used this website to convert the wav file to 16-bit, 16 kHz, mono and little-endian (tried all of its options).
Any ideas why it doesn't work? I use the built-in dictionaries and acoustic models, and my accent in English is not perfect (I don't know if that matters).
EDIT:
This is my file. I recorded myself saying: "My baby is cute" and that's what I expect to be the output.
In the pure Java code I get: "i've amy's youth", and in the Android code I get: " it".
Here is a file containing the logs.
Your audio is somewhat corrupted by the conversion. You should record into wav originally, or into some other lossless format. Your pronunciation is also far from US English. For conversion between formats you can use sox instead of an external website. Your Android sample seems correct, but it feels like you are decoding a different file on Android. You might check that you actually have the proper file in your resources.
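For example (a hedged illustration with placeholder file names), a typical sox invocation that produces 16 kHz, 16-bit, mono output is:
sox input.wav -r 16000 -b 16 -c 1 output.wav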
I need your help with the below query:
Query:
Is there any way of getting the following info for an audio file: sample rate, channel count, and bitrate?
For extracting the bitrate, the MediaMetadataRetriever API is available (METADATA_KEY_BITRATE).
Please suggest if it can be done using any Android API.
I found the API below, but it is actually intended for something different:
http://developer.android.com/reference/android/media/AudioTrack.html
I want to extract these programmatically using an Android API: sampling rate, quantization (bit depth), and channel count of an input audio file.
Thanks in advance.
This can be done using MediaExtractor, like this:
MediaExtractor mex = new MediaExtractor();
try {
    mex.setDataSource(path); // the path to the sound file on the sdcard
} catch (IOException e) {
    e.printStackTrace();
}

MediaFormat mf = mex.getTrackFormat(0);

int bitRate = mf.getInteger(MediaFormat.KEY_BIT_RATE);
int sampleRate = mf.getInteger(MediaFormat.KEY_SAMPLE_RATE);
int channelCount = mf.getInteger(MediaFormat.KEY_CHANNEL_COUNT);
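A small extension of the snippet above (my own sketch, not from the original answer): instead of assuming the audio is track 0, you can look for the first track whose MIME type starts with "audio/", guard the bitrate key (it is missing for some containers), and release the extractor when done:
MediaExtractor mex = new MediaExtractor();
try {
    mex.setDataSource(path); // path to the sound file on the sdcard
    for (int i = 0; i < mex.getTrackCount(); i++) {
        MediaFormat mf = mex.getTrackFormat(i);
        String mime = mf.getString(MediaFormat.KEY_MIME);
        if (mime != null && mime.startsWith("audio/")) {
            int sampleRate = mf.getInteger(MediaFormat.KEY_SAMPLE_RATE);
            int channelCount = mf.getInteger(MediaFormat.KEY_CHANNEL_COUNT);
            int bitRate = mf.containsKey(MediaFormat.KEY_BIT_RATE)
                    ? mf.getInteger(MediaFormat.KEY_BIT_RATE) : -1;
            break;
        }
    }
} catch (IOException e) {
    e.printStackTrace();
} finally {
    mex.release();
}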
Use MediaPlayer.getTrackInfo() during playback (after the MEDIA_INFO_METADATA_UPDATE event arrives in the onInfo callback) to obtain a MediaFormat object by invoking getFormat() on the audio track's TrackInfo. From the MediaFormat you can then read:
KEY_BIT_RATE
KEY_CHANNEL_COUNT
KEY_SAMPLE_RATE
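A hedged sketch of how that could look (my own illustration; TrackInfo.getFormat() needs API 19+ and may return null, or a format without these keys, on some devices, so treat it as an assumption rather than guaranteed behaviour). mediaPlayer is an already-prepared MediaPlayer instance:
mediaPlayer.setOnInfoListener((mp, what, extra) -> {
    if (what == MediaPlayer.MEDIA_INFO_METADATA_UPDATE) {
        for (MediaPlayer.TrackInfo track : mp.getTrackInfo()) {
            if (track.getTrackType() == MediaPlayer.TrackInfo.MEDIA_TRACK_TYPE_AUDIO) {
                MediaFormat format = track.getFormat(); // may be null
                if (format != null
                        && format.containsKey(MediaFormat.KEY_SAMPLE_RATE)
                        && format.containsKey(MediaFormat.KEY_CHANNEL_COUNT)) {
                    int sampleRate = format.getInteger(MediaFormat.KEY_SAMPLE_RATE);
                    int channelCount = format.getInteger(MediaFormat.KEY_CHANNEL_COUNT);
                    Log.d("AudioInfo", sampleRate + " Hz, " + channelCount + " channel(s)");
                }
            }
        }
    }
    return false; // let MediaPlayer handle the info event as usual
});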
I want to send a video file from a server written in Java to a web browser client.
The socket connection works fine and I have no trouble sending text.
The library I'm using for the WebSocket server is https://github.com/TooTallNate/Java-WebSocket
This is the code for sending the file
public void sendFile(WebSocket conn, String path)
{
    try
    {
        File file = new File(path);
        byte[] data = new byte[(int) file.length()];
        DataInputStream stream = new DataInputStream(new FileInputStream(file));
        stream.readFully(data);
        stream.close();
        conn.send(data);
    ..snip catch statements..
Here is my JavaScript code for receiving the file:
function connect()
{
    conn = new WebSocket('ws://localhost:8887');
    conn.onopen = function(){alert("Connection Open");};
    conn.onmessage = function(evt){
        if(evt.data instanceof Blob){
            readFile(evt);
        }else{
            alert(evt.data);
        }
    };
    conn.onclose = function(){alert('connection closed');};
}

function readFile(file_data)
{
    var video = document.getElementById('area');
    video.src = window.URL.createObjectURL(file_data.data);
}
..skip to html element for playing the file..
<video id='area' controls="controls"></video>
I want to be able to receive the file in the browser and play it.
The error I get while trying to send a webm video file to Firefox is:
HTTP "Content-Type" of "application/octet-stream" is not supported. Load of media resource blob:794345a5-4b6d-4585-b92b-3acb51612a6c failed.
Is it possible to receive a video file from a websocket and play it?
Am I implementing something wrong?
The video element requires the right content type, but the Blob that arrives over the WebSocket comes with a generic one (application/octet-stream), and it seems (to me) there is no way to set it server-side or client-side on the received Blob directly.
Fortunately, Blob has slice(start, end, contentType) method:
var rightBlob = originalBlob.slice(0, originalBlob.size, 'video/webm')