Identify sound clip in a wav file using Java - java

I am working on a personal project. Basically I have a collection of small sound clips, like a clap or a beep noise. I want to create a program that listens for the sounds via a mic or some form of audio input, and when I play sound clip it should identify that clip.
I have tried looking into this myself and have found this article.
http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
I tried replicating it, but I have found that it doesn't work as expected. I am guessing the sound clips I am using to create my hash from are too small to create enough values to compare.
Wondering if there any well know programs or algorithms that are capable of doing this.

Dan Ellis' slides are probably a good start. They explain the principal task of audio fingerprinting and the two best known approaches:
The Shazam algorithm by A. Wang (paper)
The Philips (now Gracenote) algorithm by Haitsma/Kalker (paper)
As you have already tried the landmark (Shazam) approach, perhaps it's worth your time to fiddle around with the stream-based approach. Since your queries are very short, you might also want to tweak the analysis frame length and overlap. Shorter frames and greater overlap may improve your results for very short samples. If you want to delve even deeper into the Haitsma/Kalker algorithm, you might also be interested in this unfortunately paywalled paper (by me).

Related

Procedural audio for rolling dice

I'm looking for suggestions on how to approach randomizing the audio around rolling a pair of D6 dice in my game. I'd like each roll of the dice to sound different but be reasonably plausible. And I don't want to make a bunch of pre-recorded dice rolls, I'd like more variation. Note that I do not need to synchronize the audio with animations (the animation I'm using is very simplistic and abstract).
My assumption is that I need a couple basic audio snippets for the sound of a single die hitting a surface once. (Any suggestions for generating those? Or is it best to capture them?) Then I would need some way to mutate and combine variations on that basic sound to create a unique roll sound.... Or am I just too ignorant to understand how complex that would be and I should try a different approach?
I'm developing in Java for Android, but tutorials or descriptions of how to reasonably combine or procedurally generate audio in any language would be appreciated. I don't need real-time support as I think I could just generate the next roll's audio in advance and cache it until the dice are actually rolled.
The dice could generate 2 types of sounds - one when hitting the environment; and one while hitting the other dice. Reading a spectrograph of a dice collision sound could give you the ratios of overtones. Its easy to prototype this in a software like Pure Data. The idea is that by varying the fundamental frequency slightly you should have a more procedural collision sound. You can probably use random numbers to do the variation and predict collisions. Its probably not the perfect sound, but it could be a start.
This is probably very vague, but I hope it still helps :D.
My solution would be to use free recording software to capture the sounds of one die rolling at a time. The right software would be able to chop those files into small samples containing each percussive hit and the residual audio. Just a few recorded dice rolls would give you dozens of samples of "hits".
Then, using Java, load up references to each sample, and design an algorithm that would play them back in a semi-random sequence with appropriate timing modifications. Then you'd have plenty of variety on the fly without the need to mix audio into a single stream before playback.
You can look over the procedural code for the "Shaker" class from the synthesis toolkit (STK). STK is a C++ library, but the procedure for actually creating the audio samples isn't too hard to pull out. There are lots of type of shakers offered by default. I expect you could poke in at the parametrization and make modifications if you wanted.

noise and silence clearing in java

I’m trying to develop an application that is capable of identifying a bird sound from a wav file recording. When creating the database im using another collection of sound clips and am trying to get a unique identification to them. Im planning to do this using FFT.(I don’t have any issues with these concepts) The question is, is it important to clear the noise of these base recording before creating the unique identification? If so, will anyone be able to help me with the concept of “Zero-crossing rate” and some other technique to clear the sound file for noise and silence.Thanks in advance.
In general, there is no way to remove noise unless you already have an accurate way of indentfying a temporal or spectral difference between the noise and the signal of interest. For instance, if you know the exact frequency bandwidth of the entire signal of interest, then you can use DSP to filter out the spectrum outside if that bandwidth. If you know the minimum amplitude of your signal of interest, then you can clip out everything below that level.

Detect frequency of audio input - Java?

I've been researching this off-and-on for a few months.
I'm looking for a library or working example code to detect the frequency in sound card audio input, or detect presence of a given set of frequencies. I'm leaning towards Java, but the real requirement is that it should be something higher-level/simpler than C, and preferably cross-platform. Linux will be the target platform but I want to leave options open for Mac or possibly even Windows. Python would be acceptable too, and if anyone knows of a language that would make this easier/has better pre-written libraries, I'd be willing to consider it.
Essentially I have a defined set of frequency pairs that will appear in the soundcard audio input and I need to be able to detect this pair and then... do something, such as for example record the following audio up to a maximum duration, and then perform some action. A potential run could feature say 5-10 pairs, defined at runtime, can't be compiled in: something like frequency 1 for ~ 1 second, a maximum delay of ~1 second, frequency 2 for ~1 second.
I found suggestions of either doing an FFT or Goertzel algorithm, but was unable to find any more than the simplest example code that seemed to give no useful results. I also found some limitations with Java audio and not being able to sample at a high enough rate to get the resolution I need.
Any suggestions for libraries to use or maybe working code? I'll admit that I'm not the most mathematically inclined, so I've been lost in some of the more technical descriptions of how the algorithms actually work.
If you are aiming at detecting frequency pairs then your job is very similar to a DTMF detector.
Try searching for DTMF in places like sourgeforge, you'll find detectors in many programming languages. The frequency pairs placing along the spectrum seems to be even more stringent than your specs so you should be fine adapting a DTMF detector to your input.
Check out SNDPeek, its a cross-platform C++ application that extracts all kinds of information from live audio; https://github.com/RobQuistNL/sndpeek

Audio analyzer for finding songs pitch

Is there anyway to analyze the audio pitches programmatically. For example, i know most of the players show a graph or bar & if the songs pitch is high # time t, the bar goes up at time t .. something like this. Is there any utility/tool/API to determine songs pitch so that we interpolate that to a bar which goes up & down.
Thanks for any help
Naive but robust: transform a modest length segment into Fourier space and find the peaks. Repeat as necessary.
Speed may be an issue, so choose the segment length as a power of 2 so that you can use the Fast Fourier Transform which is, well, fast.
Lots of related stuff on SO already. Try: https://stackoverflow.com/search?q=Fourier+transform
Well, unfortunately I'm not really an expert on audio with the iPhone, but I can point you towards a couple good resources.
Core Audio is probably going to be a big thing in what you want to do: htp://developer.apple.com/iphone/library/documentation/MusicAudio/Conceptual/CoreAudioOverview/Introduction/Introduction.html
As well, the Audio Toolbox may be of some help: htp://developer.apple.com/iphone/library/navigation/Frameworks/Media/AudioToolbox/index.html
If you are have a developer account, there are plenty of people on the forums that can help you: htps://devforums.apple.com/community/iphone
You'll have to add in a 't' in the http portion of those URLs, as I cannot post more than one hyperlink (sorry!).
To find the current pitch of a song, you need to learn about the Discrete Time Fourier Transform. To find the tempo, you need autocorrelation.
I think what you may be speaking of is a graphic equalizer, which displays the amplitude of different frequency ranges at a given time in an audio signal. It normally equipped with controls to modify the amplitudes between the given frequency ranges. Here's an example. Is that sort of what you're thinking of?
EDIT: Also, your numerous tags don't really give any indication of what language you might be using here, so I can't really suggest any specific techniques or libraries.

Sound Processing - Beat Matching Music Player on Android

So I want to make a new music player for Android, it's going to be open source and if you think this idea is any good feel free to let me know and maybe we can work on it.
I know it's possible to speed up and slow down a song and normalize the sound so that the voices and instruments still hit the same pitch.
I'd like to make a media play for Android aimed at joggers which will;
Beat match successive songs
Maintain a constant beat for running to
Beat can be established via accelerometer or manually
Alarms and notifications automatically at points in the run (Geo located or timer)
Now I know that this will fall down with many use cases (Slow songs sounding stupid, beat changes within song getting messed up) but I feel they can be overcome. What I really need to know is how to get started writing an application in C++ (Using the Android NDK) which will perform the analysis and adjust the stream.
Will it be feasible to do this on the fly? What approach would you use? A server that streams to the phone? Maybe offline analysis of the songs on a desktop that gets synched to your device via tether?
If this is too many questions for one post I am most interested in the easiest way of analysing the wave of an MP3 to find the beat. On top of that, how to perform the manipulation, to change the beat, would be my next point of interest.
I had a tiny crappy mp3 player that could do double speed on the fly so I'm sure it can be done!
Gav
This is technologically feasible on a smartphone-type device, although it is extremely difficult to achieve good-sounding pitch-shifting and time-stretching effects even on a powerful PC and not in realtime.
Pitch-shifting and time-stretching can be achieved on a relatively powerful mobile device in realtime (I've done it in .Net CF on a Samsung i760 smartphone) without overly taxing the processor (the simple version is not much more expensive than ordinary MP3 playback). The effect is not great, although it doesn't sound too bad if the pitch and time changes are relatively small.
Automatic determination of a song's tempo might be too time-consuming to do in real time, but this part of the process could be performed in advance of playback, or it could be done on the next song well before the current song is finished playing. I've never done this myself, so I dunno.
Everything else you mentioned is relatively easy to do. However: I don't know how easy Android's API is regarding audio output, or even whether it allows the low-level access to audio playback that this project would require.
Actually, you'll have 2 problems:
Finding the tempo of a song is not easy. The most common method involves autocorrolation, which involves quite a bit of calculus, so I hope you've studied up.
Actually changing the beat of a song without pitch shift is even harder, and still results in sound artifacts in the song. Typically it takes a long time to edit audio in this way, and it takes a lot of tinkering to get the song to sound good. To actually perform this in real time would be very, very hard. The actual process involves taking the Fourier Transform of the audio, shifting the frequency, and taking the inverse Fourier Transform. More calculus, this time with imaginary numbers.
If you really want to work on this I suggest taking a class in signals and systems from an Electrical Engineering department.
Perhaps an easier idea: Find the tempo of all the songs in a user's library, and just focus on playing songs with a close beat to the jogger's pace. You still need to do #1 but you don't need to worry about #2.
Changing the audio speed on the fly is definetly doable; I'm not sure if it's doable on the G1.
Rather than writing your own source I would recommend looking at the MythTV source and/or the mplayer source code. They both support speeding up video playback while compensating the audio.
http://picard.exceed.hu/tcpmp/test/
tcpmp did all that you asked for on an iddy biddy Palm Centro... And More, Including Video! If it can be done on a Palm Centro, it sure as heck can be done on the Android!!

Categories