I have a digital stethoscope, and I can easily record human heart sounds with an Android phone. The recording is clear; I can hear the lub-dub (S1-S2) clearly in the recorded file.
I want to calculate the heart rate from the recorded audio. Is there any way to calculate BPM from an audio file?
I have written the application for Android in Kotlin, with some parts in Java.
Thanks in advance
First, obtain the stream of PCM values for the data (for example floats ranging from -1 to 1 or maybe shorts from -32768 to 32767 if your data is 16-bit). I'm assuming signed PCM.
Second, apply an RMS (root-mean-square) function to the data to get the relative power of the volume over the course of the data, and look for the peaks. I'm assuming that each "thump" will be a point of relative loudness and that the audio between "thumps" will have less volume.
Third, count the number of frames between the peaks. Using your sample rate, you can derive a time value from that.
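Here is a minimal sketch of those three steps in Java, assuming the audio has already been decoded into a float[] of samples between -1 and 1 (decoding is discussed below). The window size, threshold, and minimum beat spacing are illustrative guesses, not tuned values:

// Sketch: estimate BPM from decoded PCM via an RMS envelope and peak counting.
public class BpmEstimator {

    public static double estimateBpm(float[] pcm, float sampleRate) {
        int window = 1024;                 // ~23 ms at 44100 fps
        int hop = window / 2;
        int numWindows = (pcm.length - window) / hop;
        double[] rms = new double[numWindows];

        // Step two: RMS over a sliding window gives a volume envelope.
        for (int w = 0; w < numWindows; w++) {
            double sum = 0;
            for (int i = 0; i < window; i++) {
                float s = pcm[w * hop + i];
                sum += s * s;
            }
            rms[w] = Math.sqrt(sum / window);
        }

        // Step three: find peaks above a threshold, enforcing a minimum
        // spacing so S1 and S2 of the same beat are not counted twice.
        double threshold = 0.2;            // assumed; tune against real data
        double minGapSeconds = 0.4;        // caps detection near 150 BPM
        int minGapWindows = (int) (minGapSeconds * sampleRate / hop);

        int beats = 0;
        int firstPeak = -1;
        int lastPeak = -1;
        int lastBeat = -minGapWindows;
        for (int w = 1; w < numWindows - 1; w++) {
            boolean isPeak = rms[w] > threshold
                    && rms[w] >= rms[w - 1] && rms[w] >= rms[w + 1];
            if (isPeak && w - lastBeat >= minGapWindows) {
                if (firstPeak < 0) firstPeak = w;
                lastPeak = w;
                lastBeat = w;
                beats++;
            }
        }
        if (beats < 2) return 0;

        // Average beat period in seconds, converted to beats per minute.
        double seconds = (lastPeak - firstPeak) * (double) hop / sampleRate;
        return 60.0 * (beats - 1) / seconds;
    }
}

The minimum gap keeps S1 and S2 of the same beat from being counted as two beats; you'd tune it together with the threshold against real recordings.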
I don't know the specifics of how Android/Kotlin provides access to the PCM. Most likely it will be in the form of a byte stream that is encoded according to the audio format, for example mono, 44100 fps, 16-bit, little-endian. In Java, a TargetDataLine might be involved.
SO has questions that explain how to convert the bytes to PCM.
SO also has questions about how to apply an RMS function to the PCM. There is some aggregation involved, since it amounts to calculating a moving average.
I don't know if a frequency-analysis tool would be helpful. The frequency of the heartbeat itself is very low, on the order of 1 or 2 beats per second, and we can't even hear frequencies below 20 Hz. But there ARE likely tools that work in that low a frequency range.
Related
I am making a Java personal project where you can record yourself singing a song, and the program will load a song (from a preselected small selection) that best matches that melody. So far, I have implemented the ability for the user to record an audio file as a WAVE file using the Java Sound API. I have seen that for audio similarity, one can perform correlation between the audio files, and by measuring if there is a high magnitude peak in the correlation graph one can determine if the audio files are similar.
I read the following post on the Signal Processing Stack Exchange:
https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar which talks about using the fast Fourier transform to accomplish convolution (a correlation that works for time-delayed audio). I have imported the JTransforms project from GitHub to use the FFT, but I am unsure how to turn the WAVE files into a numerical representation (something like a large array of values) that I can use to perform correlation or convolution. Any advice on how to go about this is much appreciated!
To read a .wav, you will be using the class AudioInputStream. An example is provided in the Java tutorial "Using Files and Format Converters". It's the first code example in the article, in the section "Reading Sound Files".
The next hurdle is translating the bytes into meaningful PCM. In the code example above, there is a comment line that reads:
// Here, do something useful with the audio data that's
// now in the audioBytes array...
That is the point where you can convert the bytes to PCM. The exact algorithm depends on the format which you can inspect via AudioInputStream's getFormat method, which returns an AudioFormat.
The format will tell you how many bytes per PCM value (e.g., 16-bit encoding is two bytes per PCM value) and the byte order, which can be little- or big-endian. If the audio is stereo, the PCM values alternate between left and right.
Building the PCM values from the bytes involves bit shifting. I'm guessing you know how to handle this. The natural result of creating 16-bit values, assuming the data is signed PCM, is signed short integers. So the last step is often dividing by Short.MAX_VALUE to convert the shorts to signed floats ranging from -1 to 1.
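Putting the pieces together, a minimal sketch might look like the following. It assumes 16-bit, signed, little-endian, mono PCM; a real implementation should branch on whatever getFormat() actually reports:

// Sketch: read a .wav into floats in the range -1 to 1, assuming
// 16-bit, signed, little-endian, mono PCM.
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavToPcm {

    public static float[] readMono16LE(File wavFile) throws Exception {
        AudioInputStream ais = AudioSystem.getAudioInputStream(wavFile);
        AudioFormat format = ais.getFormat(); // inspect encoding, rate, channels

        byte[] audioBytes = ais.readAllBytes(); // Java 9+; loop read() otherwise
        ais.close();

        int numSamples = audioBytes.length / 2; // two bytes per 16-bit sample
        float[] pcm = new float[numSamples];
        for (int i = 0; i < numSamples; i++) {
            int lo = audioBytes[2 * i] & 0xFF;  // low byte first (little-endian)
            int hi = audioBytes[2 * i + 1];     // high byte carries the sign
            short sample = (short) ((hi << 8) | lo);
            pcm[i] = sample / (float) Short.MAX_VALUE; // scale to -1..1
        }
        return pcm;
    }
}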
I want to open an AMR file so I can perform signal-processing algorithms on the contents (e.g., what is the pitch?). I know you can open these files in a media player, but I want to get at the actual contents of the file.
At one point I printed the contents and got a bunch of integers, but have no idea what they mean.
Any help is greatly appreciated. Thanks!
It sounds like you are able to get at the data, but don't know very much at all about the basics of audio signal processing.
The data you are looking at is probably raw bytes that need to be translated into PCM (Pulse Code Modulation). The Java tutorial "Overview of the Sampled Package" talks a bit about the relationship of the bytes to PCM as determined by a specific format.
For example, if the format specifies 16-bit encoding, then two bytes (each being 8 bits) will be concatenated to form a single PCM value that ranges from -32768 to 32767. (Some people work directly with these numbers; others scale them to floats ranging from -1 to 1.)
And if the file is 44100 fps, there will be 44100 "frames" of data per second, where each frame will most likely be mono or stereo (one or two PCM values per frame).
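As a sketch, assembling one PCM value from a byte pair looks something like this (the index i and the big-endian byte order are assumptions to check against your format):

// Concatenate two bytes into one 16-bit PCM value. Which byte is the
// high byte depends on the file's byte order; this assumes big-endian.
short pcmValue = (short) ((audioBytes[i] << 8) | (audioBytes[i + 1] & 0xFF));

// Optionally scale to a float from -1 to 1.
float scaled = pcmValue / 32768f;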
The tutorial does get into Java specifics pretty quickly, but at least it gives a basic picture and you will have more terms to use in a search for something more specific to Android.
If you want to go into greater depth or detail, you could consult Steve Smith's The Scientist and Engineer's Guide to Digital Signal Processing. It is a free online book that I've found to be extremely helpful.
I need to make a function in Java that adds a note of a certain frequency and length to audio. I have the frequency as a double, and the note length in milliseconds.
Full Function Description:
This method takes one AudioInputStream and adds the sound of a certain pure frequency to the FRONT of it, lasting a certain length of time.
So the method takes two additional parameters: frequency and noteLengthInMilliseconds. The amplitude of the additional sound should be 64*256.
However, I need to find the note's length in Frames, as well as in Bytes and Ints.
Also, any advice on how to create the note as a data array of samples (am using java.sound.sampled package) would be helpful.
FRAME & SAMPLE RATE: 44100.0 Hz
ENCODING: PCM_SIGNED
Since your frame rate and sample rate are identical, let's assume you're referring to PCM frames. A PCM frame is one sample per channel. If you only have one audio channel, at 16 bits per sample, you get one frame every two bytes. If you have stereo audio at 16 bits per sample, you get one frame every four bytes.
To figure out the length, multiply the sample rate by the desired duration. If I have a sample rate of 44.1 kHz and I want a note to last for half a second:
44,100 * 0.5 = 22,050 samples (22,050 frames)
From there, if you know that the audio is 16-bit and there is only one channel:
22,050 * 2 (bytes per channel) * 1 (channels) = 44,100 bytes
The length in frames is a simple computation that relies on information in the AudioFormat. Specifically, inspect the frameRate property (via getFrameRate()). The number of bytes per frame is given by the frameSize property (getFrameSize()).
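In code, that might look like the following sketch (audioInputStream and noteLengthInMilliseconds are the assumed inputs from your method, and the four-bytes-per-int packing is a guess at what the assignment means by a length in "Ints"):

// Convert the note length from milliseconds to frames and bytes,
// using the values reported by the stream's AudioFormat.
AudioFormat format = audioInputStream.getFormat();

int noteLengthInFrames = (int) (format.getFrameRate() * noteLengthInMilliseconds / 1000.0);
int noteLengthInBytes  = noteLengthInFrames * format.getFrameSize();
int noteLengthInInts   = noteLengthInBytes / 4; // if packing 4 bytes per int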
Making your array of audio data involves a couple of steps. The values of a pure tone can be created by running the Math.sin function. Math.PI * 2 is equivalent to one full cycle (I do hope you already know this; it's very elementary), so divide it by the number of frames per cycle (the sample rate divided by the frequency) to figure out the increment for each step.
The result of a Math.sin function is a double between -1 and 1. You will have to convert this number to a data range that fits the assignment. This is done by multiplying the audio value by a factor. For example, multiplying by 32767 would result in a range that fills a Short, or 16 bits, signed.
The conversion from the short to bytes depends in part on whether the format is big-endian or little-endian. You also have to manage whether there are multiple tracks or not (e.g., stereo is common). All these steps have been covered in other StackOverflow posts.
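Here's a minimal sketch of those steps for mono, 16-bit, little-endian data, using the assignment's amplitude of 64 * 256; the channel count and byte order are assumptions you'd match to the AudioFormat of the stream you're prepending to:

// Sketch: generate the pure tone as mono, 16-bit, signed, little-endian bytes.
public static byte[] makeTone(double frequency, int lengthInFrames,
                              float sampleRate) {
    double amplitude = 64 * 256;                  // = 16384, per the spec
    double phaseIncrement = 2 * Math.PI * frequency / sampleRate;

    byte[] data = new byte[lengthInFrames * 2];   // 2 bytes per mono frame
    for (int i = 0; i < lengthInFrames; i++) {
        short sample = (short) (amplitude * Math.sin(i * phaseIncrement));
        data[2 * i]     = (byte) (sample & 0xFF);        // low byte
        data[2 * i + 1] = (byte) ((sample >> 8) & 0xFF); // high byte
    }
    return data;
}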
What you do with the data array once you have it, how it is combined with the existing audio, depends in part on whether you are outputting as audio or trying to create a new wav file to export. There are sections of the Java Tutorial's Audio Trail that cover playback, saving files, as well as a good section on file formats and format conversions (highly recommended you read this).
I'm making a program for Active Noise Control (some say "Adaptive" instead of "Active", or "Cancellation" instead of "Control").
The system is pretty simple:
get sound via the mic
turn the sound into data I can read (something like an integer array)
make the antiphase of the sound
turn the data back into a sound file
Following are my questions:
(1) Can I read sound as an integer array?
(2) If I can use an integer array, how can I make the antiphase? Just multiply every value by -1?
(3) Any useful thoughts about my project?
(4) Is there any recommended language other than Java?
I heard that StackOverflow has many top-class programmers, so I expect a critical answer :D
Answering your questions:
(1) When you read sound, a byte array is returned. The bytes can readily be decoded into integers, shorts, floats, whatever. Java supports many common formats, and probably has one that matches your microphone input and speaker output. For example, Java supports 16-bit encoding, stereo, 44100 fps, which is considered the standard for CD quality. There are several questions already at StackOverflow that show the code for decoding the bytes and re-encoding them.
(2) Yes, just multiply every element of your PCM array by -1. When you add the negated signal to its correctly lined-up counterpart, 0 will result.
(3 & 4) I don't know what the tolerances are for lag time! I think if you simply take the input, decode, multiply by -1, recode, and output, it might be possible to get a very small amount of processing time. I don't know what Java is capable of here, but I bet it will be on the scale of a dozen millis, give or take. How much is enough for cancellation? How far does the sound travel from mike to speaker location? How much time does that allow? (Or am I missing something about how this works? I haven't done this sort of thing before.)
Java is pretty darn fast, and you will be relatively close to the native code level with the reading, writing, and simple numeric conversions. The core code (for testing) could probably be written in an afternoon, using the tutorial's reading/writing sound file examples as a template. I'd pay particular attention to the spot where the comment reads "Here, do something useful with the audio data that's now in the audioBytes array..." At that point, you would put the code to convert the bytes to PCM, multiply by -1, then convert back to bytes.
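In sketch form, assuming 16-bit, big-endian, mono data as in the tutorial's examples, that spot might contain something like:

// At the "do something useful" point: decode each 16-bit sample,
// negate it, and re-encode. Match the shifts to the byte order
// your line actually delivers.
for (int i = 0; i < audioBytes.length; i += 2) {
    short sample = (short) ((audioBytes[i] << 8) | (audioBytes[i + 1] & 0xFF));
    sample = (short) -sample; // antiphase; note Short.MIN_VALUE negates to itself
    audioBytes[i]     = (byte) ((sample >> 8) & 0xFF);
    audioBytes[i + 1] = (byte) (sample & 0xFF);
}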
If Java doesn't prove fast enough, I assume the next thing to try would be some flavor of C.
I have to use an FFT to analyse the frequency content of an audio file, but I don't know what the input and output are.
Do I have to use a 1-, 2-, or 3-dimensional array if I want to draw the audio file's spectrum? And can someone suggest a library for FFT on J2ME?
#thongcaoloi,
The simple answer regarding the dimensionality of your input data is: you need 1D data. Now I'll explain what that means.
Because you want to analyze audio data, your input to the discrete Fourier transform (DFT or FFT), is a 1-dimensional sequence of real numbers, which represents the changing voltage of the audio signal over time, and your audio file is a digital representation of that changing voltage over time.
Your audio file was produced by sampling the voltage of a continuous audio signal at a fixed sampling rate (also known as the sampling frequency), typically 44.1 kHz for CD-quality audio.
But your data file could have been sampled at a much lower frequency, so try to find out the sampling frequency of your data before you do an FFT on that data.
So now you have to extract the individual samples from your audio file. If your file is stereo, it will have two separate sample sequences, one for the right channel and one for the left channel. If the file is mono, it will have only one sample sequence.
If your file is stereo, or any other multi-channel audio format such as 5.1 or 7.1, you could FFT each channel separately, or you could combine any number of channels together using voltage addition. That's up to you, and depends on what you're trying to do with your FFT results.
The output of the DFT or FFT is a sequence of complex numbers. Each complex number is a pair consisting of a real-part and an imaginary-part, typically shown as a pair (re,im).
If you want to graph the power spectral density of your audio file, which is what most people want from the FFT, you'll graph 20*log10( sqrt( re^2 + im^2 ) ), using the first N/2 complex numbers of the FFT output, where N is the number of input samples to the FFT.
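As a sketch, assuming your FFT library has handed you the real and imaginary parts of each output bin as arrays re and im of length n, the graphable values would be computed like this:

// Power spectral density in decibels, from the first n/2 FFT output bins.
double[] psdDb = new double[n / 2];
for (int k = 0; k < n / 2; k++) {
    double magnitude = Math.sqrt(re[k] * re[k] + im[k] * im[k]);
    psdDb[k] = 20 * Math.log10(magnitude); // decibels; -Infinity for empty bins
}
// Bin k corresponds to the frequency k * samplingRate / n, in Hz.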
You can try to build your own spectrum analyzer software program, but I suggest using something that's already built and tested.
These two FFT spectrum analyzers give results instantly, and have built-in IFFT synthesis, meaning that you can inverse Fourier transform the frequency-domain spectral data to reconstruct the original signal in the time-domain.
http://www.mathworks.com/help/techdoc/ref/fft.html
http://www.sooeet.com/math/fft.php
There's a lot more to this topic, and to the subject of digital signal processing in general, but this brief introduction should get you started.
In the theoretical sense, an FFT maps complex[N] => complex[N]. However, if your data is just an audio file, your input will simply be complex numbers with no imaginary component, so you will map real[N] => complex[N]. With a little math, you can see that the output will always satisfy output[i] == complex_conjugate(output[N-i]), so you really only need to look at the first N/2 + 1 samples. Additionally, the complex output of the FFT gives you information about both phase and magnitude. If all you care about is how much of a certain frequency is in your audio, you only need the magnitude, which can be calculated as square_root(imaginary^2 + real^2) for each element of the output.
Of course, you'll need to look at the documentation of whatever library you use to understand which array element corresponds to the real part of the Nth complex output, and likewise to find the imaginary part of the Nth complex output.
As I remember, the FFT algorithm is not that complex; I once wrote an FFT calculation class for my thesis. At that time the input was a 1D array of values read from *.WAV files, but before the FFT, some filtering and normalization were performed.