Detect frequency of audio input - Java?

I've been researching this off-and-on for a few months.
I'm looking for a library or working example code to detect the frequency in sound card audio input, or detect presence of a given set of frequencies. I'm leaning towards Java, but the real requirement is that it should be something higher-level/simpler than C, and preferably cross-platform. Linux will be the target platform but I want to leave options open for Mac or possibly even Windows. Python would be acceptable too, and if anyone knows of a language that would make this easier/has better pre-written libraries, I'd be willing to consider it.
Essentially I have a defined set of frequency pairs that will appear in the soundcard audio input and I need to be able to detect this pair and then... do something, such as for example record the following audio up to a maximum duration, and then perform some action. A potential run could feature say 5-10 pairs, defined at runtime, can't be compiled in: something like frequency 1 for ~ 1 second, a maximum delay of ~1 second, frequency 2 for ~1 second.
I found suggestions of either doing an FFT or Goertzel algorithm, but was unable to find any more than the simplest example code that seemed to give no useful results. I also found some limitations with Java audio and not being able to sample at a high enough rate to get the resolution I need.
Any suggestions for libraries to use or maybe working code? I'll admit that I'm not the most mathematically inclined, so I've been lost in some of the more technical descriptions of how the algorithms actually work.

If you are aiming at detecting frequency pairs then your job is very similar to a DTMF detector.
Try searching for DTMF in places like SourceForge; you'll find detectors in many programming languages. The spacing of the DTMF frequency pairs along the spectrum seems to be even more stringent than your specs, so you should be fine adapting a DTMF detector to your input.
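DTMF detectors are commonly built on the Goertzel algorithm mentioned in the question, which measures the energy at one chosen frequency without computing a full FFT. The following is a minimal sketch, not code from any particular DTMF project: the class name, the helper methods, and the 8000 Hz / 800-sample figures are illustrative assumptions, and the input is assumed to be mono samples already scaled to [-1, 1].

```java
/** Sketch: Goertzel detection of one tone in a block of samples. */
final class GoertzelSketch {

    /** Squared magnitude of the DFT bin nearest to targetHz. */
    static double power(double[] samples, double sampleRate, double targetHz) {
        int n = samples.length;
        // Snap to the nearest integer bin so the result equals that DFT bin.
        long k = Math.round(n * targetHz / sampleRate);
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * k / n);
        double s1 = 0.0; // filter state one sample back
        double s2 = 0.0; // filter state two samples back
        for (double x : samples) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /**
     * Fraction of the block's energy found at the target bin:
     * close to 1.0 for a pure tone on that bin, close to 0.0 if absent.
     */
    static double toneFraction(double[] samples, double sampleRate, double targetHz) {
        double energy = 0.0;
        for (double x : samples) {
            energy += x * x;
        }
        if (energy == 0.0) {
            return 0.0; // silence or empty block
        }
        return 2.0 * power(samples, sampleRate, targetHz) / (samples.length * energy);
    }

    /** Test-signal generator, for illustration only. */
    static double[] tone(double hz, double sampleRate, int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = Math.sin(2.0 * Math.PI * hz * i / sampleRate);
        }
        return out;
    }

    public static void main(String[] args) {
        double rate = 8000.0;                      // assumed sample rate
        double[] block = tone(1000.0, rate, 800);  // 0.1 s of a 1 kHz tone
        System.out.println("1000 Hz: " + toneFraction(block, rate, 1000.0));
        System.out.println("1500 Hz: " + toneFraction(block, rate, 1500.0));
    }
}
```

For the frequency-pair case, one plausible approach is to run toneFraction for each configured frequency on successive blocks and treat a pair as detected when the first tone stays above some threshold for about a second and the second follows within the allowed delay. The threshold and block length would need tuning against real input.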

Check out SNDPeek. It's a cross-platform C++ application that extracts all kinds of information from live audio: https://github.com/RobQuistNL/sndpeek

Related

How to Read Text From Bounding Box using Java With OpenCV

I am working on a handwritten form recognition system. So far I have been able to detect text using Java with OpenCV, but now I want to read the text from each of these bounding boxes.
I have been researching how to do this using Java with OpenCV, but I was unable to find anything.
Please suggest some links, technologies, methods, or processes to perform this particular task with Java.
This answer is more general than question specific. I will try to stick as much as possible with the problem statement.
Although there is a lot of ongoing research on recognition of handwritten text, there is no foolproof method that works for all possible problems.
The sample image you posted here is relatively noisy, with extremely high variance in how the same letter is written. This is exactly where it gets tricky.
I would personally suggest that once you have the bounding boxes around the text (which you already do), you run contour extraction inside each bounding box in order to extract single letters. Once you have them, you need to figure out the relevant feature or features that can represent the maximum variance (or at least a 95% confidence interval) of the particular letter.
With these features, you need to train a supervised algorithm, using the letters as training data and their corresponding actual values as labels. Once you have that, give it some data (the easiest and most difficult cases) to analyze the accuracy.
These links can help you get started:
One of my first tools to check the accuracy with the set of features I use before I start coding: Weka
Go through basic tutorials on machine learning and how they work - Personal Favorite
You could try TensorFlow.
Simple Digit Recognition OCR in OpenCV-Python - Great for beginners.
Hope it helps!

Identify sound clip in a wav file using Java

I am working on a personal project. Basically I have a collection of small sound clips, like a clap or a beep noise. I want to create a program that listens for the sounds via a mic or some form of audio input, and when I play sound clip it should identify that clip.
I have tried looking into this myself and have found this article.
http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
I tried replicating it, but I have found that it doesn't work as expected. I am guessing the sound clips I am using to create my hash from are too small to create enough values to compare.
I am wondering if there are any well-known programs or algorithms that are capable of doing this.
Dan Ellis' slides are probably a good start. They explain the principal task of audio fingerprinting and the two best known approaches:
The Shazam algorithm by A. Wang (paper)
The Philips (now Gracenote) algorithm by Haitsma/Kalker (paper)
As you have already tried the landmark (Shazam) approach, perhaps it's worth your time to fiddle around with the stream-based approach. Since your queries are very short, you might also want to tweak the analysis frame length and overlap. Shorter frames and greater overlap may improve your results for very short samples. If you want to delve even deeper into the Haitsma/Kalker algorithm, you might also be interested in this unfortunately paywalled paper (by me).

Detect frequency from microphone Java

I would like to make a program that can transfer data as pulses of a certain frequency, but I am unsure how to detect whether a frequency is present.
I would assume I need to filter out all unneeded frequencies but I can't seem to find anything on how to do this.
Are there any libraries that already do this or would I have to build my own? Are there any examples of this or something similar being done?
It looks as though you're trying to implement a modem, and you would be well advised to look at the proven modulation techniques used for this purpose, usually QPSK and QAM. The technique you imply in your question is a crude form of amplitude modulation: essentially modulating a carrier of a given frequency with a bit-stream. Heterodyning might be a good place to start when demodulating this. Using an FFT will yield poor results because of the sampling effect of windowing, which will result in a poor bandwidth.
Another practical problem you will face once you've demodulated the signal is clock recovery. It is highly probable that the original bitstream clock will be asynchronous with the sample clock at the receiver. In order to decode the data-stream, you will need to recover the sender's clock (that is to say, the relationship between it and a local clock). A phase-locked loop (PLL) is the usual way of achieving this.
You will also need to work out how to detect the start of the bit-stream, e.g. with some kind of framing.
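As a rough illustration of the heterodyning step only, the sketch below mixes the input with a local cosine and sine oscillator at the carrier frequency and averages each block, which acts as a crude low-pass filter, to recover the carrier's amplitude per block. The class and method names are hypothetical. It assumes the block boundaries already line up with the bit boundaries, so it sidesteps the clock recovery and framing problems described above rather than solving them.

```java
/** Sketch: quadrature heterodyne detector for an on/off keyed carrier. */
final class HeterodyneSketch {

    /**
     * Returns the estimated carrier amplitude for each consecutive block.
     * Mixing with cos and sin shifts the carrier down to 0 Hz; summing over
     * the block then acts as a simple low-pass filter.
     */
    static double[] envelope(double[] samples, double sampleRate,
                             double carrierHz, int blockSize) {
        int blocks = samples.length / blockSize;
        double[] env = new double[blocks];
        for (int b = 0; b < blocks; b++) {
            double inPhase = 0.0;
            double quadrature = 0.0;
            for (int k = 0; k < blockSize; k++) {
                int idx = b * blockSize + k;
                double phase = 2.0 * Math.PI * carrierHz * idx / sampleRate;
                inPhase += samples[idx] * Math.cos(phase);
                quadrature -= samples[idx] * Math.sin(phase);
            }
            // Factor 2/blockSize rescales the sums to the carrier's peak amplitude.
            env[b] = 2.0 * Math.hypot(inPhase, quadrature) / blockSize;
        }
        return env;
    }

    public static void main(String[] args) {
        double rate = 8000.0;     // assumed sample rate
        double carrier = 1000.0;  // assumed carrier frequency
        int block = 80;           // 10 ms per bit: a whole number of carrier cycles
        int[] bits = {1, 0, 1, 1};
        double[] signal = new double[bits.length * block];
        for (int n = 0; n < signal.length; n++) {
            signal[n] = bits[n / block] * 0.5 * Math.sin(2.0 * Math.PI * carrier * n / rate);
        }
        for (double e : envelope(signal, rate, carrier, block)) {
            System.out.println(e > 0.25 ? "1" : "0"); // threshold is an arbitrary choice
        }
    }
}
```

Choosing a block length that spans a whole number of carrier cycles keeps the double-frequency mixing product from leaking into the result. With real microphone input, a proper low-pass filter and a noise-dependent threshold would likely be needed.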
You'll want to apply a Fourier transform to the signal data to look at it in the frequency domain. JTransforms is an open source library you can use to do that.

Analyzing Sound in a WAV file

I am trying to analyze a movie file by splitting it up into camera shots and then trying to determine which shots are more important than others. One of the factors I am considering in a shot's importance is how loud the volume is during that part of the movie. To do this, I am analyzing the corresponding sound file. I'm having trouble determining how "loud" a shot is because I don't think I fully understand what the data in a WAV file represents.
I read the file into an audio buffer using a method similar to that described in this post.
Having already split the corresponding video file into shots, I am now trying to find which shots are louder than others in the WAV file. I am trying to do this by extracting each sample in the file like this:
double amplitude = (double)((audioData[i] & 0xff) | (audioData[i + 1] << 8));
Some of the other posts I have read seem to indicate that I need to apply a Fast Fourier Transform to this audio data to get the amplitude, which makes me wonder what the values I have extracted actually represent. Is what I'm doing correct? My sound file format is a 16-bit mono PCM with a sampling rate of 22,050 Hz. Should I be doing something with this 22,050 value when I am trying to analyze the volume of the file? Other posts suggest using Root Mean Square to evaluate loudness. Is this required, or just a more accurate way of doing it?
The more I look into this the more confused I get. If anyone could shed some light on my mistakes and misunderstandings, I would greatly appreciate it!
The FFT has nothing to do with volume and everything to do with frequencies. To find out how loud a scene is on average, simply average the sampled values. Depending on whether you get the data as signed or unsigned values in your language, you might have to apply an absolute function first so that negative amplitudes don't cancel out the positive ones, but that's pretty much it. If you don't get the results you were expecting that must have to do with the way you are extracting the individual values in line 20.
That said, there are a few refinements that might or might not affect your task. Perceived loudness, amplitude and acoustic power are in fact related in non-linear ways, but as long as you are only trying to get a rough estimate of how much is "going on" in the audio signal I doubt that this is relevant for you. And of course, humans hear different frequencies better or worse - for instance, bats emit ultrasound squeals that would be absolutely deafening to us, but luckily we can't hear them at all. But again, I doubt this is relevant to your task, since e.g. frequencies above 22 kHz (or was it 44 kHz? not sure which) are in fact not representable in simple WAV format.
I don't know the level of accuracy you want, but a simple RMS (and perhaps simple filtering of the signal) is all many similar applications would need.
RMS will be much better than Peak amplitude. Using peak amplitudes is like determining the brightness of an image based on the brightest pixel, rather than averaging.
If you want to filter the signal or weigh it to perceived loudness, then you would need the sample rate for that.
FFT should not be required unless you want to do complex frequency analysis as well. The ear does not respond linearly to sounds at different frequencies and amplitudes, so if you need that extra level of accuracy, you could use an FFT to perform frequency analysis.
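To make the RMS suggestion concrete, here is a minimal sketch for the 16-bit mono PCM format described in the question. It assumes the bytes are signed and little-endian, which is the usual WAV layout, and that the WAV header has already been stripped. The class and method names are illustrative. The sample rate is used only to convert shot times into sample indices.

```java
/** Sketch: RMS loudness of a range of 16-bit little-endian mono PCM samples. */
final class LoudnessSketch {

    /** Decodes signed 16-bit little-endian PCM bytes into samples in [-1, 1). */
    static double[] decode16BitLE(byte[] audioData) {
        int n = audioData.length / 2;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int low = audioData[2 * i] & 0xff; // low byte, treated as unsigned
            int high = audioData[2 * i + 1];   // high byte, keeps its sign
            out[i] = ((high << 8) | low) / 32768.0;
        }
        return out;
    }

    /** Root mean square of samples[from..to), or 0 for an empty range. */
    static double rms(double[] samples, int from, int to) {
        if (to <= from) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = from; i < to; i++) {
            sumSquares += samples[i] * samples[i];
        }
        return Math.sqrt(sumSquares / (to - from));
    }

    /** RMS of the shot between two times given in seconds. */
    static double rmsOfShot(double[] samples, double sampleRate,
                            double startSeconds, double endSeconds) {
        int from = Math.max(0, (int) (startSeconds * sampleRate));
        int to = Math.min(samples.length, (int) (endSeconds * sampleRate));
        return rms(samples, from, to);
    }

    public static void main(String[] args) {
        // Two hand-built samples: +16384 and -16384, i.e. +0.5 and -0.5.
        byte[] raw = {0x00, 0x40, 0x00, (byte) 0xC0};
        double[] samples = decode16BitLE(raw);
        double level = rms(samples, 0, samples.length);
        System.out.println("RMS: " + level);
        System.out.println("dBFS: " + 20.0 * Math.log10(level));
    }
}
```

Comparing the RMS of each shot, or its decibel value, gives a relative loudness ranking without any frequency analysis.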

Audio analyzer for finding songs pitch

Is there any way to analyze audio pitch programmatically? For example, I know most players show a graph or bar, and if the song's pitch is high at time t, the bar goes up at time t, or something like this. Is there any utility/tool/API to determine a song's pitch so that we can map it to a bar which goes up and down?
Thanks for any help
Naive but robust: transform a modest length segment into Fourier space and find the peaks. Repeat as necessary.
Speed may be an issue, so choose the segment length as a power of 2 so that you can use the Fast Fourier Transform which is, well, fast.
Lots of related stuff on SO already. Try: https://stackoverflow.com/search?q=Fourier+transform
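The transform-and-find-the-peak approach can be sketched as follows. This is only an illustration under stated assumptions: the class name, the hand-written radix-2 FFT, and the Hann window are choices made here, not something the answer specifies, and the strongest spectral peak is only a rough stand-in for musical pitch, since a harmonic can be stronger than the fundamental.

```java
/** Sketch: strongest frequency in a power-of-two-length segment. */
final class PeakFrequencySketch {

    /** In-place iterative radix-2 FFT; the length must be a power of two. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes over sub-transforms of length 2, 4, ..., n.
        for (int len = 2; len <= n; len <<= 1) {
            double angle = -2.0 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(angle * k);
                    double wi = Math.sin(angle * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double tr = re[b] * wr - im[b] * wi;
                    double ti = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - tr;
                    im[b] = im[a] - ti;
                    re[a] += tr;
                    im[a] += ti;
                }
            }
        }
    }

    /** Frequency in Hz of the strongest non-DC bin below the Nyquist frequency. */
    static double peakFrequency(double[] samples, double sampleRate) {
        int n = samples.length;
        if (n < 4) {
            throw new IllegalArgumentException("need at least 4 samples");
        }
        double[] re = new double[n];
        double[] im = new double[n];
        for (int i = 0; i < n; i++) {
            // A Hann window reduces leakage from tones that fall between bins.
            double w = 0.5 - 0.5 * Math.cos(2.0 * Math.PI * i / (n - 1));
            re[i] = samples[i] * w;
        }
        fft(re, im);
        int best = 1;
        double bestPower = re[1] * re[1] + im[1] * im[1];
        for (int k = 2; k < n / 2; k++) {
            double p = re[k] * re[k] + im[k] * im[k];
            if (p > bestPower) {
                bestPower = p;
                best = k;
            }
        }
        return best * sampleRate / n;
    }

    public static void main(String[] args) {
        double rate = 8000.0; // assumed sample rate
        double[] segment = new double[1024];
        for (int i = 0; i < segment.length; i++) {
            segment[i] = Math.sin(2.0 * Math.PI * 440.0 * i / rate);
        }
        System.out.println("Peak near " + peakFrequency(segment, rate) + " Hz");
    }
}
```

The result is quantized to the bin width, sampleRate divided by the segment length, so longer segments give finer frequency resolution at the cost of coarser time resolution. Repeating the call on successive segments yields the value to draw as the moving bar.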
Well, unfortunately I'm not really an expert on audio with the iPhone, but I can point you towards a couple good resources.
Core Audio is probably going to be a big thing in what you want to do: htp://developer.apple.com/iphone/library/documentation/MusicAudio/Conceptual/CoreAudioOverview/Introduction/Introduction.html
As well, the Audio Toolbox may be of some help: htp://developer.apple.com/iphone/library/navigation/Frameworks/Media/AudioToolbox/index.html
If you have a developer account, there are plenty of people on the forums that can help you: htps://devforums.apple.com/community/iphone
You'll have to add in a 't' in the http portion of those URLs, as I cannot post more than one hyperlink (sorry!).
To find the current pitch of a song, you need to learn about the Discrete Time Fourier Transform. To find the tempo, you need autocorrelation.
I think what you may be speaking of is a graphic equalizer, which displays the amplitude of different frequency ranges at a given time in an audio signal. It is normally equipped with controls to modify the amplitudes within the given frequency ranges. Here's an example. Is that sort of what you're thinking of?
EDIT: Also, your numerous tags don't really give any indication of what language you might be using here, so I can't really suggest any specific techniques or libraries.
