I have a few MP3 files that contain speeches. I have used Android speech-to-text before, so I know it can capture spoken words. Is there any way to get the spoken words from an MP3 and display them in an EditText?
I am thinking about playing the MP3 silently and identifying the words, but I have no idea how to do that. I am using the Google speech engine.
There is no native way to convert an audio file that contains spoken words to text on Android. You'll need to use a third-party API to do this, such as:
AT&T
Nuance
iSpeech
And perhaps Pocket Sphinx, although you may have to write the file input stream side of it yourself.
If you're not concerned about breaking terms and conditions, you could use the Chrome Speech API.
I want to know if it is possible to access the audio that is currently playing on the Android device.
Example: if Spotify is running in the background, I want to access the audio to control some LEDs that are connected to my RaspberryPi.
I want to create some sort of equalizer that changes colors depending on the sound that is currently playing. I would appreciate it if someone could tell me whether accessing the main audio output is possible.
Unless you are using a rooted phone, it's not possible to capture the output of an arbitrary app on Android.
You can, however, create an app that plays media files and captures its own output for visualization with the "Visualizer" effect. Take a look at the sample here: https://android.googlesource.com/platform/development/+/master/samples/ApiDemos/src/com/example/android/apis/media/AudioFxDemo.java
(look for "Visualizer").
If you are using a Raspberry Pi anyway, you can play all your music through it and capture and analyze the audio there. You will need an external USB sound card, though. See, for example, this post: http://www.g7smy.co.uk/2013/08/recording-sound-on-the-raspberry-pi/
That post only records audio and plays it back, but you can insert an analysis phase in between.
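As a rough idea of what such an analysis phase might look like, here is a minimal Java sketch that turns a buffer of captured audio into a brightness value for an LED. It assumes the capture step hands you 16-bit little-endian mono PCM; the class name `AudioLevel` and both method names are made up for illustration, and driving the actual LEDs is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: turns raw PCM audio into a brightness value for an LED. */
final class AudioLevel {

    /** Root-mean-square level of 16-bit little-endian mono PCM, scaled to 0.0..1.0. */
    static double rms(byte[] pcm) {
        int sampleCount = pcm.length / 2; // a trailing odd byte is ignored
        if (sampleCount == 0) {
            return 0.0;
        }
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double sumOfSquares = 0.0;
        for (int i = 0; i < sampleCount; i++) {
            double sample = buffer.getShort() / 32768.0; // normalize to -1.0..1.0
            sumOfSquares += sample * sample;
        }
        return Math.sqrt(sumOfSquares / sampleCount);
    }

    /** Maps an RMS level to an 8-bit brightness, clamped to 0..255. */
    static int brightness(double rms) {
        long scaled = Math.round(rms * 255.0);
        return (int) Math.max(0, Math.min(255, scaled));
    }

    public static void main(String[] args) {
        // Four samples at half of full scale: 16384 = 0x4000, little-endian bytes 00 40.
        byte[] halfScale = {0x00, 0x40, 0x00, 0x40, 0x00, 0x40, 0x00, 0x40};
        System.out.println(brightness(rms(halfScale)));
    }
}
```

Calling `rms` on each captured buffer and feeding the result to `brightness` gives a simple loudness meter. A color-changing equalizer would need a per-frequency-band analysis (for example an FFT) instead of a single level.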
I read here that there are two types of voice commands in Glass:
1) choosing from a menu (e.g. "ok glass, directions to")
2) free speech recognition (e.g. "fifth avenue NYC")
I want to develop a Glass app that uses voice recognition.
With which of them can I use a non-English language?
I am asking from the developer's side (changing the language in code), not the user's side.
That is, saying "Ok Glass" and then having the menu items in Hebrew,
or saying "take me to" followed by a place description in Hebrew.
Is there any workaround for that?
At this point, Glass voice recognition appears to support only US English. The "Ok Glass" menu items are controlled by Google for official apps. It is my understanding that the classifiers that recognize these commands are included in the Glass code, rather than the commands being recognized from the string alone. (Sideloaded apps can have their own voice command based on an English string, but it isn't as reliable as the ones Google has officially endorsed.)
Free speech recognition, for example when you reply to an email in Glass, is done using the RecognizerIntent.ACTION_RECOGNIZE_SPEECH intent. While the Android documentation suggests you could add the EXTRA_LANGUAGE extra to the intent, Glass itself only handles English.
Therefore, if you want to work around this, you will need to use the MediaRecorder to grab the audio directly, stream it to a service that provides Hebrew voice-to-text transcription, and then send the text back to your Glass application. This would not be supported directly from the clock; you'd have to handle it from either a LiveCard or an Immersion. Glass will display Hebrew characters.
I am creating an application that has a voice search facility. For example, in my music player, if I want to play some music, I just speak the song title and it plays that song. I have already tried Google voice search and it works, but I want to search offline. How can I do this? Can anyone please help me? Thanks in advance.
If you are referring to my application utter!, I'm working on a developer API that will appear here in a couple of weeks' time and allow applications to interact with it, provided the user has it installed, of course.
Otherwise, for information on using offline recognition, see my answer on this post
You have to use a speech recognition system with an embedded vocabulary.
If you are using an embedded vocabulary, you do not need the web; you can process the audio offline.
I think you need something like this:
Speech recognition by Microsoft
I have not looked into the Java Sound API too much before I had this idea,
so I went and looked at the Tutorial by Oracle concerning this topic.
I did not find what I was looking for, though...
Basically, I need to take the audio feed from the microphone input
and mix in another audio file. This would be for Skype, to play background music.
Now the important thing about this is that the modified audio feed
would have to be used BY Skype instead of the original microphone-only audio.
Is there any (easy?) way to achieve this?
Regards,
Tom S.
I have an audio file in .3gp format on my Android device which I wish
to upload to YouTube. I know that YouTube is a video upload site and
that I need to convert this sound file to video.
I just want an image to display all the time the audio is playing.
Google tells me there are a number of tools that can help me. But I want
to do this via Java code from my Android device.
Please help.
Thanks.
There are free tools such as FFmpeg that allow you to, essentially, mix and convert heterogeneous streams. That is, you can add a bitmap to a video, create a video from a slide show and then add sound, and so on. (See a related question I asked here.)
These programs can be executed from within Java applications by making Runtime.exec(..) calls.
Sun has an example for stitching multiple JPEGs together into a movie; you can find it here. You should be able to take this example (it's fairly robust) and add what you need to it.
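A minimal sketch of that approach for the "one image plus audio" case might look like the following. It assumes an FFmpeg binary built with libx264 is reachable at the path you pass in (Android does not ship one, so you would have to bundle a build for the device). The class name `StillImageVideo` and the file names are placeholders, and it uses ProcessBuilder rather than Runtime.exec only because it makes forwarding FFmpeg's output simpler.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds an FFmpeg command line for "still image + audio". */
final class StillImageVideo {

    /** Arguments that loop one image for the length of the audio track. */
    static List<String> command(String ffmpeg, String image, String audio, String output) {
        return Arrays.asList(
                ffmpeg,
                "-loop", "1",          // repeat the single input image
                "-i", image,
                "-i", audio,
                "-c:v", "libx264",     // assumes the build includes libx264
                "-tune", "stillimage",
                "-c:a", "aac",
                "-pix_fmt", "yuv420p", // widely playable pixel format
                "-shortest",           // stop encoding when the audio ends
                output);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            // No paths given: just print the command with placeholder names.
            System.out.println(command("ffmpeg", "cover.jpg", "speech.3gp", "out.mp4"));
            return;
        }
        Process process = new ProcessBuilder(command(args[0], args[1], args[2], args[3]))
                .inheritIO() // forward FFmpeg's console output
                .start();
        System.exit(process.waitFor());
    }
}
```

Passing the arguments as a list, rather than one concatenated string, avoids quoting problems when file names contain spaces.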
I recommend looking into the Java Media Framework (FAQ).
You can find a vast collection of sample applets and code at the Sun Solutions page, and the API on this page. I hope this is compatible with the Android platform; I haven't had any personal experience developing for it, but it might be a good place to start.