Identify audio sample in a file - java

I want to be able to identify an audio sample (provided by the user) in an audio file I've got (mp3).
The mp3 file is a radio stream that I've kept for testing purposes, and I have the pre-roll of the show. I want to identify it in the file and get the timestamp where it plays in the file.
Note: The solution can be in any of the following programming languages: Java, Python or C++. I don't know how to analyze the audio file, so any reference on this subject will help.

This problem falls under the category of audio fingerprinting. Once you have matched a sample to a song, you also know the timestamp where the sample occurs within the song. There is a great paper by the guys behind Shazam that describes their technique: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
They basically pick out the local maxima in the spectrogram and create a hash based on their relative positions.
Here is a good review on audio fingerprinting algorithms: http://mtg.upf.edu/files/publications/MMSP-2002-pcano.pdf
In any case, you'll likely be working a lot with FFT and spectrograms. This post talks about how to do that in Python.

I'd start by computing the FFT spectrogram of both the haystack and needle files (so to speak). Then you could try and (fuzzily) match the spectrograms - if you format them as images, you could even use off-the-shelf algorithms for that.
Not sure if that's the canonical or optimal way, but I feel like it should work.
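To make the spectrogram idea a bit more concrete, here is a minimal sketch in plain Java. It is not the Shazam method, just a brute-force comparison: it computes magnitude spectrograms with a naive DFT (an FFT library would replace this for real files), slides the needle over the haystack, and returns the frame offset with the smallest squared difference. The class name, the frame size, the hop size and the synthetic test tones are all assumptions made for illustration, and the mp3 is assumed to have been decoded to samples already.

```java
class SpectrogramMatch {
    // Magnitude spectrum of one frame via a naive O(n^2) DFT (sketch only).
    static double[] magnitudes(double[] samples, int start, int frameSize) {
        double[] mags = new double[frameSize / 2];
        for (int k = 0; k < mags.length; k++) {
            double re = 0, im = 0;
            for (int n = 0; n < frameSize; n++) {
                double angle = 2 * Math.PI * k * n / frameSize;
                re += samples[start + n] * Math.cos(angle);
                im -= samples[start + n] * Math.sin(angle);
            }
            mags[k] = Math.hypot(re, im);
        }
        return mags;
    }

    // One magnitude spectrum per frame; frames advance by 'hop' samples.
    static double[][] spectrogram(double[] samples, int frameSize, int hop) {
        if (samples.length < frameSize) return new double[0][];
        int frames = (samples.length - frameSize) / hop + 1;
        double[][] spec = new double[frames][];
        for (int f = 0; f < frames; f++) {
            spec[f] = magnitudes(samples, f * hop, frameSize);
        }
        return spec;
    }

    // Frame offset in the haystack where the needle fits best, or -1 if it cannot fit.
    static int bestOffset(double[][] haystack, double[][] needle) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int off = 0; off + needle.length <= haystack.length; off++) {
            double dist = 0;
            for (int f = 0; f < needle.length; f++) {
                for (int k = 0; k < needle[f].length; k++) {
                    double d = haystack[off + f][k] - needle[f][k];
                    dist += d * d;
                }
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = off;
            }
        }
        return best;
    }

    // Synthetic sine tone, used here only as stand-in test data.
    static double[] tone(double freq, int n, double sampleRate) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = Math.sin(2 * Math.PI * freq * i / sampleRate);
        return out;
    }

    public static void main(String[] args) {
        double rate = 8000;
        int frame = 256, hop = 128;
        double[] haystack = tone(300, 16000, rate);
        double[] needle = tone(1000, 2048, rate);
        System.arraycopy(needle, 0, haystack, 5120, needle.length); // hide the needle
        int off = bestOffset(spectrogram(haystack, frame, hop), spectrogram(needle, frame, hop));
        System.out.println("needle starts near " + (off * hop / rate) + " s");
    }
}
```

The timestamp is just the frame offset times the hop size divided by the sample rate. On a real radio recording the match will never be exact, so you would want a distance threshold, and for long files a fingerprint index scales much better than this linear scan.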

Related

How do I convert an audio clip into an array to perform an FFT?

I want to write a program that will read sound files of different instruments playing the same note, and show the different signature frequencies.
The library I am using to do so is JTransforms, since it seemed to be the one most often recommended for performing FFTs in Java. I have not found any clear explanation of how to use this library, but from what I can gather, I need to pass an array of real and complex numbers into the methods provided by the library. How do I get these numbers from my audio clips?
I have very basic knowledge of sound processing, since this is only my term project for my first computer science class.
As you can see in this or this question, you need to read the file as an array of bytes. Don't worry about complex numbers: you only need the real part and can fill the imaginary part with 0 (you can also see this in the questions I linked above).
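As a rough sketch of that conversion, assuming the clip is 16-bit signed little-endian mono PCM (check the `AudioFormat` of your own files, this is only an assumption) and that the bytes were already read with `AudioInputStream.read(byte[])`: each pair of bytes becomes one sample, and the imaginary parts are left at zero. The class and method names are made up for this example. The interleaved real/imaginary layout is, as far as I recall, what JTransforms' `DoubleFFT_1D.complexForward` expects, but verify that against its Javadoc.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmToFftInput {
    // Assumes 16-bit signed little-endian mono PCM; scales samples to [-1, 1).
    static double[] toSamples(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        double[] samples = new double[pcm.length / 2];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = buf.getShort() / 32768.0;
        }
        return samples;
    }

    // Interleaved layout [re0, im0, re1, im1, ...] with every imaginary part 0.
    static double[] toInterleavedComplex(double[] samples) {
        double[] complex = new double[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            complex[2 * i] = samples[i]; // imaginary slot 2*i+1 stays 0.0
        }
        return complex;
    }

    public static void main(String[] args) {
        byte[] pcm = {0x00, 0x40, 0x00, (byte) 0xC0}; // two samples: 16384 and -16384
        double[] samples = toSamples(pcm);
        double[] complex = toInterleavedComplex(samples);
        System.out.println(samples[0] + " " + samples[1] + " -> " + complex.length + " values");
    }
}
```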

Multi channel audio within processing

I’m trying to build a sketch that shows me the levels of audio coming into a system. I want to be able to handle more than 2 channels, so I know that I need more than the processing.sound library can currently provide. My searching has led me to javax.sound.sampled.*, but that is as far as my searching and experimenting have got me.
Does anyone know how to query the system for how many lines are coming in and to get the amplitude of audio on each line?
This is kind of a composite question.
For the number of lines, see Accessing Audio System Resources in the Java tutorials. There is sample code there for inspecting what lines are present. If some of the terms are confusing, most are defined in the tutorial immediately preceding this one.
To see what is on the line, check Capturing Audio.
To get levels, you will probably want some sort of rolling average, usually computed as a root-mean-square (RMS) over a short window of samples. The "controls" (sometimes) provided at a higher level are kind of iffy for a variety of reasons.
In order to calculate those levels, though, you will have to convert the byte data to PCM sample values. The tutorial Using Files and Format Converters has example code that shows the point where the conversion would take place. In the first real example given, under the heading "Reading Sound Files", take note of the comment that reads
// Here, do something useful with the audio data that's
// now in the audioBytes array...
I recall there are already StackOverflow questions that show the commands needed to convert bytes to PCM.
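For what it's worth, the conversion plus the RMS step might look roughly like this. It is only a sketch under an assumed format: 16-bit signed little-endian PCM with the channels interleaved frame by frame. The class and method names are invented for the example, and `audioBytes` stands for one buffer read from the line.

```java
class ChannelLevels {
    // RMS level per channel, each in the range 0..1.
    // Assumes 16-bit signed little-endian PCM with interleaved channels.
    static double[] rmsPerChannel(byte[] audioBytes, int channels) {
        double[] sumSquares = new double[channels];
        int frames = audioBytes.length / (2 * channels);
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                int i = (f * channels + c) * 2;
                // High byte keeps its sign, low byte is masked to stay unsigned.
                int sample = (audioBytes[i + 1] << 8) | (audioBytes[i] & 0xFF);
                double v = sample / 32768.0;
                sumSquares[c] += v * v;
            }
        }
        double[] rms = new double[channels];
        for (int c = 0; c < channels; c++) {
            rms[c] = frames == 0 ? 0 : Math.sqrt(sumSquares[c] / frames);
        }
        return rms;
    }

    public static void main(String[] args) {
        // Two stereo frames: left is 16384 (half scale), right is silent.
        byte[] audioBytes = {0x00, 0x40, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00};
        double[] levels = rmsPerChannel(audioBytes, 2);
        System.out.println("left=" + levels[0] + " right=" + levels[1]);
    }
}
```

Calling this once per buffer read from the line gives you a level per channel per buffer; averaging the last few results smooths the meter.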

Identify sound clip in a wav file using Java

I am working on a personal project. Basically I have a collection of small sound clips, like a clap or a beep noise. I want to create a program that listens for the sounds via a mic or some other form of audio input, and when I play a sound clip it should identify that clip.
I have tried looking into this myself and have found this article.
http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
I tried replicating it, but I have found that it doesn't work as expected. I am guessing the sound clips I am using to create my hash from are too small to create enough values to compare.
I'm wondering if there are any well-known programs or algorithms that are capable of doing this.
Dan Ellis' slides are probably a good start. They explain the principal task of audio fingerprinting and the two best known approaches:
The Shazam algorithm by A. Wang (paper)
The Philips (now Gracenote) algorithm by Haitsma/Kalker (paper)
As you have already tried the landmark (Shazam) approach, perhaps it's worth your time to fiddle around with the stream-based approach. Since your queries are very short, you might also want to tweak the analysis frame length and overlap. Shorter frames and greater overlap may improve your results for very short samples. If you want to delve even deeper into the Haitsma/Kalker algorithm, you might also be interested in this unfortunately paywalled paper (by me).
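To give an idea of what the stream-based approach computes per frame, here is a sketch of the sub-fingerprint step as I understand it from the Haitsma/Kalker paper: each bit is the sign of the energy difference between neighbouring frequency bands, differenced again against the previous frame (33 bands give 32 bits). The band energies are assumed to have been computed already, the class and method names are made up, and the tiny arrays in `main` are toy numbers, not real audio.

```java
class SubFingerprint {
    // prev and curr hold the band energies of two consecutive frames.
    // Bit m is 1 when (curr[m] - curr[m+1]) - (prev[m] - prev[m+1]) > 0.
    // Works for up to 33 bands, since the result is packed into an int.
    static int bits(double[] prev, double[] curr) {
        int fingerprint = 0;
        for (int m = 0; m < curr.length - 1; m++) {
            double d = (curr[m] - curr[m + 1]) - (prev[m] - prev[m + 1]);
            fingerprint = (fingerprint << 1) | (d > 0 ? 1 : 0);
        }
        return fingerprint;
    }

    // Number of differing bits between two sub-fingerprints.
    static int bitErrors(int a, int b) {
        return Integer.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        double[] prev = {1, 1, 1, 1};
        double[] curr = {3, 2, 4, 1};
        int fp = bits(prev, curr);
        System.out.println(Integer.toBinaryString(fp)); // prints 101
        System.out.println(bitErrors(fp, 0b111));       // prints 1
    }
}
```

Matching then comes down to finding the stretch of the reference where the total number of bit errors against the query stays below some threshold. With very short clips there are few sub-fingerprints to compare, which is why shorter frames and more overlap can help.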

Detect frequency of audio input - Java?

I've been researching this off-and-on for a few months.
I'm looking for a library or working example code to detect the frequency in sound card audio input, or detect presence of a given set of frequencies. I'm leaning towards Java, but the real requirement is that it should be something higher-level/simpler than C, and preferably cross-platform. Linux will be the target platform but I want to leave options open for Mac or possibly even Windows. Python would be acceptable too, and if anyone knows of a language that would make this easier/has better pre-written libraries, I'd be willing to consider it.
Essentially I have a defined set of frequency pairs that will appear in the soundcard audio input and I need to be able to detect this pair and then... do something, such as for example record the following audio up to a maximum duration, and then perform some action. A potential run could feature say 5-10 pairs, defined at runtime, can't be compiled in: something like frequency 1 for ~ 1 second, a maximum delay of ~1 second, frequency 2 for ~1 second.
I found suggestions of either doing an FFT or Goertzel algorithm, but was unable to find any more than the simplest example code that seemed to give no useful results. I also found some limitations with Java audio and not being able to sample at a high enough rate to get the resolution I need.
Any suggestions for libraries to use or maybe working code? I'll admit that I'm not the most mathematically inclined, so I've been lost in some of the more technical descriptions of how the algorithms actually work.
If you are aiming at detecting frequency pairs then your job is very similar to a DTMF detector.
Try searching for DTMF in places like SourceForge; you'll find detectors in many programming languages. The placement of the DTMF frequency pairs along the spectrum seems to be even more stringent than your specs, so you should be fine adapting a DTMF detector to your input.
Check out SNDPeek, it's a cross-platform C++ application that extracts all kinds of information from live audio: https://github.com/RobQuistNL/sndpeek
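DTMF detectors are typically built on the Goertzel algorithm mentioned in the question, which measures the energy at one chosen frequency without a full FFT. A minimal Java sketch follows; the class name, the normalisation and the 0.5 threshold are illustrative assumptions, and the input is assumed to be a block of samples already scaled to roughly [-1, 1].

```java
class Goertzel {
    // Squared magnitude of the signal at 'freq' over this block of samples.
    static double power(double[] samples, double sampleRate, double freq) {
        double coeff = 2 * Math.cos(2 * Math.PI * freq / sampleRate);
        double s1 = 0, s2 = 0;
        for (double x : samples) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    // Power scaled so that a full-scale sine at 'freq' gives roughly 1.0.
    static double normalizedPower(double[] samples, double sampleRate, double freq) {
        if (samples.length == 0) return 0;
        double n = samples.length;
        return power(samples, sampleRate, freq) / (n * n / 4);
    }

    static boolean isPresent(double[] samples, double sampleRate, double freq, double threshold) {
        return normalizedPower(samples, sampleRate, freq) > threshold;
    }

    public static void main(String[] args) {
        double rate = 8000;
        double[] block = new double[800]; // 0.1 s of a 1000 Hz test tone
        for (int i = 0; i < block.length; i++) {
            block[i] = Math.sin(2 * Math.PI * 1000 * i / rate);
        }
        System.out.println("1000 Hz present: " + isPresent(block, rate, 1000, 0.5));
        System.out.println("1500 Hz present: " + isPresent(block, rate, 1500, 0.5));
    }
}
```

For a frequency pair you would run this on consecutive blocks for each frequency defined at runtime and keep a small state machine: first frequency seen for about a second, then the second frequency within the allowed delay. Longer blocks give finer frequency resolution at the cost of slower response.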

Which data structure for linking text with audio in Java

I want to write a program that plays an audio file of a text being read aloud.
I want to highlight the syllable that the audio file is currently playing in green and the rest of the current word in red.
What kind of data structure should I use to store the audio file and the information that tells the program when to switch to the next word/syllable?
This is a slightly left-field suggestion, but have you looked at karaoke software? It may not be seen as "serious" enough, but it sounds very similar to what you're doing. For example, Aegisub is a subtitling program that lets you create subtitles in the SSA/ASS format. It has karaoke tools for highlighting the chosen word or part.
It's most commonly used for subtitling anime, but it also works for audio provided you have a suitable player. These are sadly quite rare on the Mac.
The format looks similar to the one proposed by Yuval A:
{\K132}Unmei {\K34}no {\K54}tobira
{\K60}{\K132}yukkuri {\K36}to {\K142}hirakareta
The lengths are durations rather than absolute offsets. This makes it easier to shift the start of the line without recalculating all the offsets. The double entry indicates a pause.
Is there a good reason this needs to be part of your Java program, or is an off the shelf solution possible?
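If you do end up handling this inside your Java program, the `\K` durations are easy to turn into absolute start times by keeping a running sum. A small sketch, assuming the durations are in centiseconds (which is how I understand the ASS karaoke tags) and using invented class and field names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KaraokeLine {
    // One highlighted chunk: when it starts, how long it lasts, and its text.
    static final class Syllable {
        final int startCs;
        final int durationCs;
        final String text;

        Syllable(int startCs, int durationCs, String text) {
            this.startCs = startCs;
            this.durationCs = durationCs;
            this.text = text;
        }
    }

    // Matches a {\K<number>} tag followed by the text up to the next tag.
    private static final Pattern TAG = Pattern.compile("\\{\\\\K(\\d+)\\}([^{]*)");

    // Turns per-syllable durations into start offsets relative to the line start.
    static List<Syllable> parse(String line) {
        List<Syllable> syllables = new ArrayList<>();
        int start = 0;
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            int duration = Integer.parseInt(m.group(1));
            syllables.add(new Syllable(start, duration, m.group(2)));
            start += duration;
        }
        return syllables;
    }

    public static void main(String[] args) {
        String line = "{\\K60}{\\K132}yukkuri {\\K36}to {\\K142}hirakareta";
        for (Syllable s : parse(line)) {
            System.out.println(s.startCs + " +" + s.durationCs + " '" + s.text + "'");
        }
    }
}
```

An entry with empty text, like the leading `{\K60}` here, is the pause mentioned above: it only advances the running offset.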
How about a simple data structure that describes which batch of letters makes up the next syllable, together with the timestamp for switching to that syllable?
Just a quick example:
[0:00] This [0:02] is [0:05] an [0:07] ex- [0:08] am- [0:10] ple
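In Java, one way to hold such a list is a sorted map from start time to syllable, so that the playback position can be looked up directly. This is only a sketch: the class name is invented, and the times are stored in milliseconds as an arbitrary choice.

```java
import java.util.Map;
import java.util.TreeMap;

class SyllableTimeline {
    // Start time in milliseconds -> syllable that becomes active at that time.
    private final TreeMap<Integer, String> cues = new TreeMap<>();

    void add(int startMillis, String syllable) {
        cues.put(startMillis, syllable);
    }

    // Syllable active at the given playback position, or null before the first cue.
    String syllableAt(int positionMillis) {
        Map.Entry<Integer, String> entry = cues.floorEntry(positionMillis);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        SyllableTimeline timeline = new SyllableTimeline();
        timeline.add(0, "This");
        timeline.add(2000, "is");
        timeline.add(5000, "an");
        timeline.add(7000, "ex-");
        timeline.add(8000, "am-");
        timeline.add(10000, "ple");
        System.out.println(timeline.syllableAt(6000)); // prints an
        System.out.println(timeline.syllableAt(8500)); // prints am-
    }
}
```

A timer that polls the player's position and calls `syllableAt` is then enough to drive the highlighting. To colour the rest of the word you would also store which word each syllable belongs to.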
Highlighting part of a word sounds like you're getting into phonetics, that is, the sounds that make up words. It's going to be really difficult to turn a sound file into something that will "read" a text. Your best bet is to use the text itself to drive a phonetics-based engine, like FreeTTS, which is based on the Java Speech API.
To do this you're going to have to take the text to be read, split it into phonetic syllables and play each one. So "syllable" is "syl" "la" "ble". Playing would be: highlight "syl", say it, and move on to the next one.
This is really "old-skool"; it was done the same way on the original Apple II.
You might want to get familiar with FreeTTS, an open-source tool: http://freetts.sourceforge.net/docs/index.php
You might want to feed only a few words to the TTS engine at a time: highlight them, and once they have been spoken, de-highlight them and move on to the next batch of words.
