Is there any way to analyze audio pitch programmatically? For example, I know most players show a graph or bar, and if the song's pitch is high at time t, the bar goes up at time t. Is there any utility/tool/API to determine a song's pitch so that I can map it to a bar that goes up and down?
Thanks for any help
Naive but robust: transform a modest length segment into Fourier space and find the peaks. Repeat as necessary.
Speed may be an issue, so choose the segment length as a power of 2 so that you can use the Fast Fourier Transform, which is, well, fast.
Lots of related stuff on SO already. Try: https://stackoverflow.com/search?q=Fourier+transform
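To make the "transform a segment and pick the peaks" idea concrete, here is a minimal Java sketch. It assumes you already have one mono float[] segment and know its sample rate; a plain O(N^2) DFT loop stands in for a real FFT library, and reporting only the single largest bin is a deliberate simplification.

// Sketch: magnitude spectrum of one segment, returning the strongest bin as Hz.
// For real use, replace the inner loop with an FFT library call.
static double dominantFrequency(float[] segment, double sampleRate) {
    int n = segment.length;
    double bestMag = 0;
    int bestBin = 0;
    for (int k = 1; k < n / 2; k++) {              // skip DC, stop at Nyquist
        double re = 0, im = 0;
        for (int t = 0; t < n; t++) {
            double angle = 2 * Math.PI * k * t / n;
            re += segment[t] * Math.cos(angle);
            im -= segment[t] * Math.sin(angle);
        }
        double mag = Math.hypot(re, im);
        if (mag > bestMag) { bestMag = mag; bestBin = k; }
    }
    return bestBin * sampleRate / n;               // bin index -> frequency in Hz
}

Repeating this per segment gives one value per time step, which maps directly onto the bar that goes up and down.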
Well, unfortunately I'm not really an expert on audio with the iPhone, but I can point you towards a couple of good resources.
Core Audio is probably going to be a big thing in what you want to do: http://developer.apple.com/iphone/library/documentation/MusicAudio/Conceptual/CoreAudioOverview/Introduction/Introduction.html
As well, the Audio Toolbox may be of some help: http://developer.apple.com/iphone/library/navigation/Frameworks/Media/AudioToolbox/index.html
If you have a developer account, there are plenty of people on the forums that can help you: https://devforums.apple.com/community/iphone
To find the current pitch of a song, you need to learn about the Discrete Time Fourier Transform. To find the tempo, you need autocorrelation.
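Autocorrelation can also serve as a rough pitch estimator: the lag with the strongest self-similarity gives the period. Here is a minimal sketch; the 50-1000 Hz search range and the float[] input are assumptions, and for tempo you would run the same idea over an onset-strength or energy envelope rather than over raw samples.

// Sketch: dominant period via autocorrelation, returned as frequency in Hz.
static double pitchByAutocorrelation(float[] x, double sampleRate) {
    int minLag = Math.max(1, (int) (sampleRate / 1000));   // ~1000 Hz upper bound
    int maxLag = (int) (sampleRate / 50);                  // ~50 Hz lower bound
    double best = 0;
    int bestLag = minLag;
    for (int lag = minLag; lag <= maxLag && lag < x.length; lag++) {
        double sum = 0;
        for (int i = 0; i + lag < x.length; i++) sum += x[i] * x[i + lag];
        if (sum > best) { best = sum; bestLag = lag; }
    }
    return sampleRate / bestLag;                           // period in samples -> Hz
}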
I think what you may be describing is a graphic equalizer, which displays the amplitude of different frequency ranges at a given time in an audio signal. It is normally equipped with controls to modify the amplitudes within those frequency ranges. Here's an example. Is that sort of what you're thinking of?
EDIT: Also, your numerous tags don't really give any indication of what language you might be using here, so I can't really suggest any specific techniques or libraries.
I am working on a personal project. Basically, I have a collection of small sound clips, like a clap or a beep. I want to create a program that listens for sounds via a mic or some other audio input, and when I play a sound clip, it should identify that clip.
I have tried looking into this myself and have found this article.
http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
I tried replicating it, but I found that it doesn't work as expected. I am guessing the sound clips I am using to build my hash from are too small to produce enough values to compare.
I'm wondering if there are any well-known programs or algorithms that are capable of doing this.
Dan Ellis' slides are probably a good start. They explain the principal task of audio fingerprinting and the two best known approaches:
The Shazam algorithm by A. Wang (paper)
The Philips (now Gracenote) algorithm by Haitsma/Kalker (paper)
As you have already tried the landmark (Shazam) approach, perhaps it's worth your time to fiddle around with the stream-based approach. Since your queries are very short, you might also want to tweak the analysis frame length and overlap. Shorter frames and greater overlap may improve your results for very short samples. If you want to delve even deeper into the Haitsma/Kalker algorithm, you might also be interested in this unfortunately paywalled paper (by me).
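If you do experiment with frame length and overlap, the slicing itself is simple. Here is a hedged sketch; frameSize and hopSize (frame length minus overlap) are the knobs to tweak, and windowing plus the actual transform are omitted.

// Sketch: cut a signal into overlapping analysis frames.
// Smaller hopSize = more overlap = more frames from a very short query.
static float[][] frames(float[] signal, int frameSize, int hopSize) {
    if (signal.length < frameSize) return new float[0][];
    int count = 1 + (signal.length - frameSize) / hopSize;
    float[][] out = new float[count][frameSize];
    for (int f = 0; f < count; f++) {
        System.arraycopy(signal, f * hopSize, out[f], 0, frameSize);
    }
    return out;                                    // window + transform each frame next
}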
I am trying to make a music visualizer in Processing (not that that part is super important), and I'm using a fast Fourier transform through Minim. It's working perfectly as far as reading the data goes, but there is a large spike on the left (bass) end. What's the best way to level this out?
My source code is here, if you want to take a look.
Thanks in advance,
-tlf
The spectrum you show looks fairly typical of a complex musical sound where you have a complex section at lower frequencies, but also some clear harmonics emerging from the low frequency mess. And, actually, these harmonics are atypically clear... music in general is complicated. Sometimes, for example, if a flute is playing a single clear note one will get a single nice peak or two, but it's much more common that transients and percussive sounds lead to a very complicated spectrum, especially at low frequencies.
For comparing directly to the video, it seems to me that the video is a bit odd. My guess is that the spectrum they show is either a zoom into a small section of the spectrum far from zero, or that it's just a graphical algorithm that's based on the music but doesn't correspond to an actual spectrum. That is, if you really want something to look very similar to this video, you'll need more than the spectrum, though the spectrum will likely be a good starting point. Here are a few points to note:
1) there is a prominent peak which occasionally appears right above the "N" in the word anchor. A single dominant peak should be clear in the audio as an approximately pure tone.
2) occasionally there's another peak that varies temporally with this peak, which would normally be a sign that the second peak is a harmonic, but many times this second peak isn't there.
3) A good example of odd behavior is at 2:26. This moment just follows a little laser sound effect, and then there's basically a quiet hiss. A hiss should be a broad-spectrum sound without peaks, often weighted toward lower frequencies. At 2:26, though, there's just this single large peak above the "N" with nothing else.
It turns out what I had to do was multiply the data by
Math.log(i + 2) / 3
where i is the index of the data being referenced, zero-indexed from the left (bass).
You can see this in context here
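For context, applied to an array of band magnitudes the scaling might look roughly like this; the float[] spectrum here is a stand-in for the values read out of the FFT (e.g. Minim's getBand()).

// Sketch: log-weight each band to flatten the bass-heavy spike,
// using the Math.log(i + 2) / 3 factor described above.
static float[] levelBands(float[] spectrum) {
    float[] out = new float[spectrum.length];
    for (int i = 0; i < spectrum.length; i++) {
        out[i] = (float) (spectrum[i] * Math.log(i + 2) / 3);
    }
    return out;
}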
I'm looking for suggestions on how to approach randomizing the audio around rolling a pair of D6 dice in my game. I'd like each roll of the dice to sound different but be reasonably plausible. And I don't want to make a bunch of pre-recorded dice rolls, I'd like more variation. Note that I do not need to synchronize the audio with animations (the animation I'm using is very simplistic and abstract).
My assumption is that I need a couple of basic audio snippets for the sound of a single die hitting a surface once. (Any suggestions for generating those? Or is it best to capture them?) Then I would need some way to mutate and combine variations on that basic sound to create a unique roll sound... Or am I just too ignorant to understand how complex that would be, and should I try a different approach?
I'm developing in Java for Android, but tutorials or descriptions of how to reasonably combine or procedurally generate audio in any language would be appreciated. I don't need real-time support as I think I could just generate the next roll's audio in advance and cache it until the dice are actually rolled.
The dice could generate two types of sounds: one when hitting the environment, and one when hitting the other die. Reading a spectrogram of a dice collision sound could give you the ratios of the overtones. It's easy to prototype this in software like Pure Data. The idea is that by varying the fundamental frequency slightly you get a more procedural collision sound. You can probably use random numbers to do the variation and to decide when collisions happen. It's probably not the perfect sound, but it could be a start.
This is probably very vague, but I hope it still helps :D.
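To make the "vary the fundamental slightly with random numbers" idea concrete, here is a rough Java sketch that synthesizes a single collision-like burst as a decaying sine. The base frequency, 80 ms length, decay rate, and +/-5% jitter are invented placeholders, not measured dice overtones; a more realistic hit would layer several overtones read off a spectrogram as suggested above.

import java.util.Random;

public class DiceHit {
    // Sketch: one procedural "click" = decaying sine at a slightly randomized pitch.
    static short[] hit(double baseHz, float sampleRate, Random rng) {
        double freq = baseHz * (0.95 + 0.10 * rng.nextDouble());   // +/-5% jitter
        int n = (int) (sampleRate * 0.08);                         // 80 ms burst
        short[] samples = new short[n];
        for (int i = 0; i < n; i++) {
            double t = i / sampleRate;
            double env = Math.exp(-40 * t);                        // fast exponential decay
            samples[i] = (short) (Short.MAX_VALUE * 0.5 * env
                    * Math.sin(2 * Math.PI * freq * t));
        }
        return samples;
    }
}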
My solution would be to use free recording software to capture the sounds of one die rolling at a time. The right software would be able to chop those files into small samples containing each percussive hit and the residual audio. Just a few recorded dice rolls would give you dozens of samples of "hits".
Then, using Java, load up references to each sample, and design an algorithm that would play them back in a semi-random sequence with appropriate timing modifications. Then you'd have plenty of variety on the fly without the need to mix audio into a single stream before playback.
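A hedged sketch of that playback idea, using javax.sound.sampled only for brevity (on Android you would do the same with SoundPool or another playback API); the file list, hit count, and gap timings are placeholders.

import java.io.File;
import java.util.Random;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;

public class DiceRollPlayer {
    // Sketch: play pre-chopped "hit" samples in a semi-random order with random gaps.
    static void playRoll(File[] hitSamples, int hits, Random rng) throws Exception {
        for (int i = 0; i < hits; i++) {
            File sample = hitSamples[rng.nextInt(hitSamples.length)];
            Clip clip = AudioSystem.getClip();
            clip.open(AudioSystem.getAudioInputStream(sample));
            clip.start();                              // fire and forget; clips overlap naturally
            Thread.sleep(40 + rng.nextInt(120));       // random gap before the next hit
        }
    }
}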
You can look over the procedural code for the "Shaker" class from the Synthesis ToolKit (STK). STK is a C++ library, but the procedure for actually creating the audio samples isn't too hard to pull out. There are lots of types of shakers offered by default. I expect you could poke around at the parameterization and make modifications if you wanted.
I am trying to analyze a movie file by splitting it up into camera shots and then trying to determine which shots are more important than others. One of the factors I am considering in a shot's importance is how loud the volume is during that part of the movie. To do this, I am analyzing the corresponding sound file. I'm having trouble determining how "loud" a shot is because I don't think I fully understand what the data in a WAV file represents.
I read the file into an audio buffer using a method similar to that described in this post.
Having already split the corresponding video file into shots, I am now trying to find which shots are louder than others in the WAV file. I am trying to do this by extracting each sample in the file like this:
double amplitude = (double)((audioData[i] & 0xff) | (audioData[i + 1] << 8));
Some of the other posts I have read seem to indicate that I need to apply a Fast Fourier Transform to this audio data to get the amplitude, which makes me wonder what the values I have extracted actually represent. Is what I'm doing correct? My sound file is 16-bit mono PCM with a sampling rate of 22,050 Hz. Should I be doing something with this 22,050 value when analyzing the volume of the file? Other posts suggest using Root Mean Square to evaluate loudness. Is this required, or is it just a more accurate way of doing it?
The more I look into this the more confused I get. If anyone could shed some light on my mistakes and misunderstandings, I would greatly appreciate it!
The FFT has nothing to do with volume and everything to do with frequencies. To find out how loud a scene is on average, simply average the sampled values. Depending on whether you get the data as signed or unsigned values in your language, you might have to apply an absolute-value function first so that negative amplitudes don't cancel out the positive ones, but that's pretty much it. If you don't get the results you were expecting, that must have to do with the way you are extracting the individual values in line 20.
That said, there are a few refinements that might or might not affect your task. Perceived loudness, amplitude, and acoustic power are in fact related in non-linear ways, but as long as you are only trying to get a rough estimate of how much is "going on" in the audio signal, I doubt that this is relevant for you. And of course, humans hear different frequencies better or worse - for instance, bats emit ultrasound squeals that would be absolutely deafening to us, but luckily we can't hear them at all. But again, I doubt this is relevant to your task, since, e.g., frequencies above 22 kHz (or was it 44 kHz? I'm not sure which) are in fact not representable in simple WAV format.
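As a rough illustration of the "average the absolute sample values" suggestion above, here is a minimal Java sketch; the byte layout assumes the 16-bit little-endian mono PCM described in the question, and from/to are byte offsets delimiting one shot.

// Sketch: rough loudness of a shot = mean absolute amplitude over its samples.
static double meanAbsAmplitude(byte[] audioData, int from, int to) {
    double sum = 0;
    int count = 0;
    for (int i = from; i + 1 < to; i += 2) {
        int sample = (audioData[i] & 0xff) | (audioData[i + 1] << 8);  // signed 16-bit LE
        sum += Math.abs(sample);
        count++;
    }
    return count == 0 ? 0 : sum / count;
}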
I don't know the level of accuracy you want, but a simple RMS (and perhaps simple filtering of the signal) is all many similar applications would need.
RMS will be much better than Peak amplitude. Using peak amplitudes is like determining the brightness of an image based on the brightest pixel, rather than averaging.
If you want to filter the signal or weigh it to perceived loudness, then you would need the sample rate for that.
FFT should not be required unless you want to do complex frequency analysis as well. The ear does not respond linearly to sounds at different frequencies and amplitudes, so if you need that level of accuracy you could use an FFT to add frequency-dependent weighting to the analysis.
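To make the RMS suggestion concrete, here is a hedged sketch over the same assumed 16-bit little-endian mono samples; a peak measure would instead just track the maximum absolute value.

// Sketch: RMS loudness over one shot's byte range (higher = louder).
static double rms(byte[] audioData, int from, int to) {
    double sumSquares = 0;
    int count = 0;
    for (int i = from; i + 1 < to; i += 2) {
        int sample = (audioData[i] & 0xff) | (audioData[i + 1] << 8);  // signed 16-bit LE
        sumSquares += (double) sample * sample;
        count++;
    }
    return count == 0 ? 0 : Math.sqrt(sumSquares / count);
}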
I've been researching this off-and-on for a few months.
I'm looking for a library or working example code to detect the frequency in sound card audio input, or to detect the presence of a given set of frequencies. I'm leaning towards Java, but the real requirement is that it should be something higher-level and simpler than C, and preferably cross-platform. Linux will be the target platform, but I want to leave options open for Mac or possibly even Windows. Python would be acceptable too, and if anyone knows of a language that would make this easier or that has better pre-written libraries, I'd be willing to consider it.
Essentially I have a defined set of frequency pairs that will appear in the sound card input, and I need to be able to detect a pair and then do something, such as record the following audio up to a maximum duration and then perform some action. A potential run could feature, say, 5-10 pairs, defined at runtime (they can't be compiled in): something like frequency 1 for ~1 second, a maximum delay of ~1 second, then frequency 2 for ~1 second.
I found suggestions to use either an FFT or the Goertzel algorithm, but I was unable to find anything beyond the simplest example code, which seemed to give no useful results. I also ran into limitations with Java audio, in that I couldn't sample at a high enough rate to get the resolution I need.
Any suggestions for libraries to use or maybe working code? I'll admit that I'm not the most mathematically inclined, so I've been lost in some of the more technical descriptions of how the algorithms actually work.
If you are aiming at detecting frequency pairs, then your job is very similar to that of a DTMF detector.
Try searching for DTMF in places like SourceForge; you'll find detectors in many programming languages. The spacing of the DTMF frequency pairs along the spectrum seems to be even more stringent than your specs, so you should be fine adapting a DTMF detector to your input.
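If you end up on the Goertzel route mentioned in the question, the core is only a few lines. Here is a hedged sketch of the signal power at one target frequency; a DTMF-style detector runs this once per frequency in the pair over successive blocks and thresholds the results. The block length and sample rate are assumed inputs.

// Sketch: Goertzel power of one block at a single target frequency.
static double goertzelPower(float[] block, double targetHz, double sampleRate) {
    int n = block.length;
    int k = (int) Math.round(n * targetHz / sampleRate);   // nearest bin
    double coeff = 2 * Math.cos(2 * Math.PI * k / n);
    double sPrev = 0, sPrev2 = 0;
    for (float x : block) {
        double s = x + coeff * sPrev - sPrev2;
        sPrev2 = sPrev;
        sPrev = s;
    }
    return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
}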
Check out SNDPeek; it's a cross-platform C++ application that extracts all kinds of information from live audio: https://github.com/RobQuistNL/sndpeek