Which data structure for linking text with audio in Java

I want to write a program that plays an audio file which reads a text aloud.
I want to highlight the current syllable the audio file is playing in green, and the rest of the current word in red.
What kind of data structure should I use to store the audio file together with the information that tells the program when to switch to the next word/syllable?

This is a slightly left-field suggestion, but have you looked at karaoke software? It may not be seen as "serious" enough, but it sounds very similar to what you're doing. For example, Aegisub is a subtitling program that lets you create subtitles in the SSA/ASS format. It has karaoke tools for highlighting the chosen word or part of a word.
It's most commonly used for subtitling anime, but it also works for audio, provided you have a suitable player. These are sadly quite rare on the Mac.
The format looks similar to the one proposed by Yuval A:
{\K132}Unmei {\K34}no {\K54}tobira
{\K60}{\K132}yukkuri {\K36}to {\K142}hirakareta
The lengths are durations (in centiseconds) rather than absolute offsets. This makes it easier to shift the start of the line without recalculating all the offsets. The double entry indicates a pause.
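If you do end up handling this format from Java yourself, the tags are easy to parse. Here's a minimal sketch, assuming \K durations in centiseconds and ignoring the rest of the SSA/ASS syntax:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A minimal sketch of parsing SSA/ASS-style karaoke timing tags.
// Each {\K<n>} tag gives the duration of the following syllable in centiseconds.
public class KaraokeLine {
    // One timed syllable: the text to highlight and how long it lasts.
    record Syllable(String text, int centiseconds) {}

    private static final Pattern K_TAG = Pattern.compile("\\{\\\\K(\\d+)\\}([^{]*)");

    static List<Syllable> parse(String line) {
        List<Syllable> syllables = new ArrayList<>();
        Matcher m = K_TAG.matcher(line);
        while (m.find()) {
            syllables.add(new Syllable(m.group(2), Integer.parseInt(m.group(1))));
        }
        return syllables;
    }

    public static void main(String[] args) {
        // Durations are relative, so the start time of each syllable is a running sum.
        int startCs = 0;
        for (Syllable s : parse("{\\K132}Unmei {\\K34}no {\\K54}tobira")) {
            System.out.printf("%.2fs  \"%s\"%n", startCs / 100.0, s.text());
            startCs += s.centiseconds();
        }
    }
}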
Is there a good reason this needs to be part of your Java program, or is an off the shelf solution possible?

How about a simple data structure that records which batch of letters makes up the next syllable, plus the timestamp for switching to that syllable?
Just a quick example:
[0:00] This [0:02] is [0:05] an [0:07] ex- [0:08] am- [0:10] ple
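In Java, that could be as simple as a list of timestamped syllables. A minimal sketch follows; the wordIndex field is a hypothetical extra that groups syllables into words so the rest of the word can be shown in red, and Clip.getMicrosecondPosition() is the standard javax.sound.sampled way to poll playback position:

import java.util.List;

// A minimal sketch of the timestamped-syllable idea above.
// Each entry says: at this offset into the audio, highlight this syllable.
public class TimedTranscript {
    // startMillis is the offset into the audio file; wordIndex groups
    // syllables of the same word.
    record TimedSyllable(long startMillis, int wordIndex, String text) {}

    public static void main(String[] args) {
        List<TimedSyllable> cues = List.of(
            new TimedSyllable(0,     0, "This"),
            new TimedSyllable(2000,  1, "is"),
            new TimedSyllable(5000,  2, "an"),
            new TimedSyllable(7000,  3, "ex-"),
            new TimedSyllable(8000,  3, "am-"),
            new TimedSyllable(10000, 3, "ple"));

        // During playback, compare the audio clip's position against the cues.
        long position = 8500; // e.g. from Clip.getMicrosecondPosition() / 1000
        for (int i = cues.size() - 1; i >= 0; i--) {
            if (cues.get(i).startMillis() <= position) {
                System.out.println("Highlight: " + cues.get(i).text());
                break;
            }
        }
    }
}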

Highlighting part of a word sounds like you're getting into phonetics, the sounds that make up words. It's going to be really difficult to turn a sound file into something that will "read" a text. Your best bet is to use the text itself to drive a phonetics-based engine like FreeTTS, which is built on the Java Speech API.
To do this you're going to have to take the text to be read, split it into its phonetic syllables, and play each one. So "syllable" becomes "syl", "la", "ble"; playing would be: highlight "syl", say it, and move to the next one.
This is really "old-skool"; it was done the same way on the original Apple II.
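A minimal sketch of that highlight-then-speak loop, assuming FreeTTS is on the classpath with its bundled "kevin16" voice; highlight() is a hypothetical stand-in for whatever your UI uses to colour the current syllable, and whether the engine pronounces isolated syllables acceptably is something you'd have to experiment with:

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class SyllableReader {
    public static void main(String[] args) {
        System.setProperty("freetts.voices",
                "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        voice.allocate();

        String[] syllables = {"syl", "la", "ble"};
        for (String s : syllables) {
            highlight(s);      // colour the current syllable green in the UI
            voice.speak(s);    // blocks until the syllable has been spoken
        }
        voice.deallocate();
    }

    static void highlight(String syllable) {
        System.out.println("Highlighting: " + syllable);
    }
}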

You might want to get familiar with FreeTTS, an open-source tool: http://freetts.sourceforge.net/docs/index.php
You might want to feed only a few words to the TTS engine at a given point in time, highlight them, and once those are spoken out, de-highlight them and move to the next batch of words.
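For the highlight/de-highlight part, assuming the text is shown in a Swing JTextArea, a minimal sketch might look like this (speak() is a hypothetical stand-in for handing each word or batch to the TTS engine):

import javax.swing.JFrame;
import javax.swing.JTextArea;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultHighlighter;
import java.awt.Color;

public class BatchHighlighter {
    public static void main(String[] args) throws BadLocationException {
        JTextArea area = new JTextArea("This is an example sentence to read aloud.");
        JFrame frame = new JFrame("Reader");
        frame.add(area);
        frame.pack();
        frame.setVisible(true);

        DefaultHighlighter.DefaultHighlightPainter green =
                new DefaultHighlighter.DefaultHighlightPainter(Color.GREEN);

        String text = area.getText();
        int start = 0;
        for (String word : text.split(" ")) {
            // Highlight the current word, speak it, then de-highlight it.
            Object tag = area.getHighlighter()
                    .addHighlight(start, start + word.length(), green);
            speak(word); // hypothetical: blocks while the TTS engine speaks
            area.getHighlighter().removeHighlight(tag);
            start += word.length() + 1;
        }
    }

    static void speak(String word) { /* hand off to the TTS engine here */ }
}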

Related

How to compare words/sounds in Android Studio?

I'm looking for the best (and fastest) way to record a short audio input (like one word) from the mobile microphone and then compare it with a long, real-time audio input (like speech) from the same person, looking for occurrences of the word.
I tried many approaches, like using the typical SpeechRecognizer, but there were many problems; for example, there is actually no way to guarantee that it will give results fast enough or run for many minutes.
VoiceRecognition Android Taking Too Long To React
Long audio speech recognition on Android
I don't really need to recognize which words the person is saying, only to be able to find occurrences with some deviation.
It would be nice if you could give me some suggestions on how to do so.
EDIT: I'm basically looking for a way to control the app with sound input from the user.
Here are a couple of ideas to consider.
(1) First, create a set of common short sounds that will likely be part of the search. For example, perhaps all phonemes, or something like a set of all consonant-vowel combinations, e.g., bah, bay, beh, bee, etc., and the same with cah, cay, keh, key, etc.
Then, run through the "long" target sample with each, indexing the locations where these phonemes are found.
Now, when the user gives you a word, first compare it to your set of indexed phoneme fragments, and then use the matches to focus your search and test in the long target file.
(2) Break your "long" file up into fragments and sort the fragments. Then compare the input word to the items in the sorted list, using something like a binary search algorithm.
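Either way, the final "test this spot in the long file" step needs a concrete comparison. A minimal sketch follows, assuming both recordings are already decoded to float PCM at the same sample rate; note that raw-sample correlation is fragile for speech, so a real system would compare features that are robust to timing and pitch differences (e.g., MFCCs) instead:

// Slides the short word over the long recording and reports the offset with
// the highest normalized cross-correlation.
public class SlidingMatch {
    static int bestOffset(float[] longAudio, float[] word) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int off = 0; off + word.length <= longAudio.length; off++) {
            double dot = 0, normA = 0, normB = 0;
            for (int i = 0; i < word.length; i++) {
                dot   += longAudio[off + i] * word[i];
                normA += longAudio[off + i] * longAudio[off + i];
                normB += word[i] * word[i];
            }
            double score = dot / (Math.sqrt(normA * normB) + 1e-12);
            if (score > bestScore) {
                bestScore = score;
                best = off;
            }
        }
        return best; // offset in samples; divide by the sample rate for seconds
    }
}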

Multi channel audio within processing

I'm trying to build a sketch that shows me the levels of audio coming into a system. I want to be able to do more than 2 channels, so I know I need more than the processing.sound library can currently provide, and my searching has led me to javax.sound.sampled.*; however, this is as far as my searching and playing has got me.
Does anyone know how to query the system for how many lines are coming in and to get the amplitude of audio on each line?
This is kind of a composite question.
For the number of lines, see Accessing Audio System Resources in the Java tutorials. There is sample code there for inspecting what lines are present. If some of the terms are confusing, most are defined in the tutorial immediately preceding this one.
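As a starting point, a minimal sketch of that inspection step, just enumerating the capture lines each mixer reports, might look like this:

import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.TargetDataLine;

public class ListInputs {
    public static void main(String[] args) {
        for (Mixer.Info mixerInfo : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(mixerInfo);
            // Target lines are the ones you can read from, i.e. capture.
            for (Line.Info info : mixer.getTargetLineInfo()) {
                if (TargetDataLine.class.isAssignableFrom(info.getLineClass())) {
                    System.out.println(mixerInfo.getName() + " : " + info);
                }
            }
        }
    }
}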
To see what is on the line, check Capturing Audio.
To get levels, you will probably want to do some sort of rolling average, usually computed as a root-mean-square (RMS) value. The "controls" (sometimes) provided at a higher level are kind of iffy for a variety of reasons.
In order to calculate those levels, though, you will have to convert the byte data to PCM. The example code in Using Files and Format Converters shows the point where the conversion would take place. In the first real example given, under the heading "Reading Sound Files", take note of the place where the comment reads
// Here, do something useful with the audio data that's
// now in the audioBytes array...
I recall there are already StackOverflow questions that show the commands needed to convert bytes to PCM.
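For what it's worth, the conversion plus RMS calculation might look something like this for 16-bit signed little-endian data; this is a sketch, so adjust the byte unpacking to match your actual AudioFormat:

public class RmsLevel {
    static double rms(byte[] audioBytes, int validBytes) {
        int sampleCount = validBytes / 2;
        if (sampleCount == 0) return 0;
        long sumOfSquares = 0;
        for (int i = 0; i < sampleCount; i++) {
            // Little-endian: low byte first; the high byte carries the sign.
            int sample = (audioBytes[2 * i] & 0xFF) | (audioBytes[2 * i + 1] << 8);
            sumOfSquares += (long) sample * sample;
        }
        // Normalize to 0.0 .. 1.0 relative to full scale (32768 for 16-bit).
        return Math.sqrt((double) sumOfSquares / sampleCount) / 32768.0;
    }
}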

How to Read Text From Bounding Box using Java With OpenCV

I am working on a handwritten form recognition system. So far I have been able to detect text using Java with OpenCV, but now I want to read the text from each of these bounding boxes.
I have been doing research to find a process for this using Java with OpenCV, but I was unable to find any.
Suggest me some links, technologies, methods, or processes to perform this particular task with Java.
This answer is more general than question specific. I will try to stick as much as possible with the problem statement.
Although there is a lot of ongoing research on recognition of handwritten text, there is no foolproof method that works for all possible problems.
The sample image you posted here is relatively noisy, with extremely high variance between the font of the same letter. This is exactly where it gets tricky.
I would personally suggest that once you have the bounding boxes around the text (which you already do), you run contour extraction inside each bounding box to extract single letters. Once you have them, you need to figure out a feature (or features) that can represent most of the variance (or at least a 95% confidence interval) of each particular letter.
With this feature (or features), you need to train a supervised algorithm, with the letters as training data and their corresponding values (i.e., the actual letters) as labels. Once you have that, give it some data (the easiest and the most difficult cases) to check the accuracy.
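For the contour-extraction step specifically, a minimal sketch with the OpenCV Java bindings could look like this; "box.png" stands in for one cropped bounding box:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Rect;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

import java.util.ArrayList;
import java.util.List;

public class LetterSegmenter {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        Mat box = Imgcodecs.imread("box.png", Imgcodecs.IMREAD_GRAYSCALE);
        Mat binary = new Mat();
        // Invert so the (dark) ink becomes white foreground for findContours.
        Imgproc.threshold(box, binary, 0, 255,
                Imgproc.THRESH_BINARY_INV | Imgproc.THRESH_OTSU);

        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(binary, contours, new Mat(),
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

        for (MatOfPoint contour : contours) {
            Rect r = Imgproc.boundingRect(contour);
            Mat letter = new Mat(binary, r); // one candidate letter, ready for features
            System.out.println("Letter candidate at " + r);
        }
    }
}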
These links can help you get started:
One of my first tools for checking the accuracy of a feature set before I start coding: Weka
Go through basic tutorials on machine learning and how they work - Personal Favorite
You could try TensorFlow.
Simple Digit Recognition OCR in OpenCV-Python - Great for beginners.
Hope it helps!

Handwriting Recognition Java

I know there are other posts about this, but I cannot seem to find one strictly for handwriting. I am going to have a form, and all I need to read is 8 squares in the left-hand corner that will have 3 letters followed by 5 numbers.
The problem with most posts is that people either post about software for writing on the screen or software that doesn't recognize handwriting yet. I would prefer to have something in Java, but something simple in another language would work.
What would really work is if people could scan their documents and just type the job number as the document name, but apparently they can't do that right...
Can you change the form? This problem will become a lot simpler if you can change the form to something that is easier for a machine to read. Recognizing arbitrary handwriting is hard as well as error-prone.
What I have in mind is a form like this:
form example http://shareworldonline.com/w3/testprep/images/test%20form.jpg
However, if you have to have handwriting, check out the solutions in this thread.
If I got you correctly, you are doing offline HWR (handwriting recognition).
When I was doing offline HWR, I found that the most difficult part was separating the characters in a word. It seems you have them in squares, so all you need to do is find your boxes (e.g., by using a histogram)
and compare the content of each box with each element in your character database (I used Levenshtein distance for that).
I know it may not be very helpful, but maybe it pushes you in the right direction.
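For reference, the Levenshtein (edit) distance mentioned above is straightforward to implement. A minimal sketch, here comparing strings (e.g., encoded stroke or feature sequences extracted from each box):

public class Levenshtein {
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,         // deletion
                        d[i][j - 1] + 1),        // insertion
                        d[i - 1][j - 1] + cost); // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}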

Identify audio sample in a file

I want to be able to identify an audio sample (that is provided by the user) in an audio file I've got (MP3).
The MP3 file is a radio stream that I've kept for testing purposes, and I have the pre-roll of the show. I want to identify it in the file and get the timestamp where it plays.
Note: the solution can be in any of the following programming languages: Java, Python, or C++. I don't know how to analyze the audio file, and any reference on this subject will help.
This problem falls under the category of audio fingerprinting. If you have matched a sample to a song, then you'll certainly know the timestamp where the sample occurs within the song. There is a great paper by the guys behind Shazam that describes their technique: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf. They basically pick out the local maxima in the spectrogram and create a hash based on their relative positions.
Here is a good review on audio fingerprinting algorithms: http://mtg.upf.edu/files/publications/MMSP-2002-pcano.pdf
In any case, you'll likely be working a lot with FFT and spectrograms. This post talks about how to do that in Python.
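To give a feel for the spectrogram-peak idea, here is a minimal, dependency-free sketch. It uses a naive per-frame DFT, which is slow; a real fingerprinter would use an FFT, keep several peaks per frame, and hash peak pairs as in the Shazam paper:

public class PeakSketch {
    // Magnitude of bin k of an N-point DFT over one frame of samples.
    static double magnitude(float[] frame, int k) {
        double re = 0, im = 0;
        for (int n = 0; n < frame.length; n++) {
            double angle = -2 * Math.PI * k * n / frame.length;
            re += frame[n] * Math.cos(angle);
            im += frame[n] * Math.sin(angle);
        }
        return Math.hypot(re, im);
    }

    // Returns the dominant frequency bin for each consecutive frame.
    static int[] peakBins(float[] samples, int frameSize) {
        int frames = samples.length / frameSize;
        int[] peaks = new int[frames];
        for (int f = 0; f < frames; f++) {
            float[] frame = new float[frameSize];
            System.arraycopy(samples, f * frameSize, frame, 0, frameSize);
            double best = -1;
            for (int k = 1; k < frameSize / 2; k++) { // skip DC, use half-spectrum
                double mag = magnitude(frame, k);
                if (mag > best) { best = mag; peaks[f] = k; }
            }
        }
        return peaks;
    }
}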
I'd start by computing the FFT spectrograms of both the haystack and the needle files (so to speak). Then you could try to (fuzzily) match the spectrograms; if you format them as images, you could even use off-the-shelf image-matching algorithms for that.
Not sure if that's the canonical or optimal way, but I feel like it should work.
