I want to open an AMR file so I can run signal processing algorithms on the contents (e.g., what is the pitch?). I know you can open these files in a media player, but I want to get at the actual contents of the file.
At one point I printed the contents and got a bunch of integers, but have no idea what they mean.
Any help is greatly appreciated. Thanks!
It sounds like you are able to get at the data, but don't know very much at all about the basics of audio signal processing.
The data you are looking at is probably raw bytes that need to be translated into PCM (Pulse Code Modulation). The Java tutorial "Overview of the Sampled Package" talks a bit about the relationship of the bytes to PCM as determined by a specific format.
For example, if the format specifies 16-bit encoding, then two bytes (each being 8 bits) will be concatenated to form a single PCM value ranging from -32768 to 32767. (Some people work directly with these numbers; others scale them to floats ranging from -1 to 1.)
And if the file is 44100 fps, there will be 44100 "frames" of data per second, where each frame will most likely be mono or stereo (one or two PCM values per frame).
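As a minimal sketch of that decoding in Java, assuming 16-bit signed little-endian data (both assumptions you would verify against the actual format):

    // A sketch, not production code: decode raw 16-bit signed little-endian
    // PCM bytes into floats in roughly [-1, 1]. The bit depth and byte order
    // here are assumptions; confirm them against the file's actual format.
    static float[] bytesToFloats(byte[] bytes) {
        float[] pcm = new float[bytes.length / 2];
        for (int i = 0; i < pcm.length; i++) {
            int lo = bytes[2 * i] & 0xFF;      // low byte, masked to unsigned
            int hi = bytes[2 * i + 1];         // high byte carries the sign
            pcm[i] = ((short) ((hi << 8) | lo)) / 32768f;  // scale to [-1, 1]
        }
        return pcm;
    }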
The tutorial does get into Java specifics pretty quickly, but at least it gives a basic picture and you will have more terms to use in a search for something more specific to Android.
If you want to go into greater depth or detail, you could consult Steve Smith's The Scientist and Engineer's Guide to Digital Signal Processing. It is a free online book that I've found to be extremely helpful.
Related
I have a digital stethoscope. I can record the human heart sound easily with an Android phone, and it is clear; I can hear the lub-dub (S1-S2) clearly in the recorded file.
I want to calculate the heart rate from the recorded audio. Is there any way to calculate BPM from an audio file?
I have written the application for Android in Kotlin, with some parts in Java.
Thanks in advance
First, obtain the stream of PCM values for the data (for example floats ranging from -1 to 1 or maybe shorts from -32768 to 32767 if your data is 16-bit). I'm assuming signed PCM.
Second, apply an RMS (root-mean-square) function to the data to get the relative power of the volume over the course of the data, and look for the peaks. I'm assuming that each "thump" will be a point of relative loudness and that the audio between "thumps" will have less volume.
Third, count the number of frames between the peaks. Using your sample rate, you can derive a time value from that.
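Here is a minimal sketch of steps two and three, assuming you already have mono float PCM. The 50 ms window, the half-of-max threshold, and the merge interval are all guesses you would tune; note that S1 and S2 will each produce a peak, which is why peaks that fall close together are merged here:

    // Sketch: estimate beats per minute from mono float PCM. The window
    // size, threshold, and minimum gap are assumptions to tune for real
    // heart-sound recordings.
    static double estimateBpm(float[] pcm, float sampleRate) {
        int win = (int) (sampleRate * 0.05f);          // 50 ms RMS windows
        int n = pcm.length / win;
        double[] rms = new double[n];
        double maxRms = 0;
        for (int w = 0; w < n; w++) {
            double sum = 0;
            for (int i = w * win; i < (w + 1) * win; i++) {
                sum += pcm[i] * pcm[i];
            }
            rms[w] = Math.sqrt(sum / win);
            maxRms = Math.max(maxRms, rms[w]);
        }
        double threshold = 0.5 * maxRms;               // tune for your data
        double minGapSec = 0.4;                        // merges S1/S2 pairs
        int minGapWins = (int) (minGapSec * sampleRate / win);
        int lastBeat = -minGapWins;
        int beats = 0, firstBeat = -1, finalBeat = -1;
        for (int w = 1; w < n - 1; w++) {
            boolean peak = rms[w] > threshold
                    && rms[w] >= rms[w - 1] && rms[w] >= rms[w + 1];
            if (peak && w - lastBeat >= minGapWins) {
                if (firstBeat < 0) firstBeat = w;
                finalBeat = w;
                beats++;
                lastBeat = w;
            }
        }
        if (beats < 2) return 0;                       // not enough peaks
        double seconds = (finalBeat - firstBeat) * win / (double) sampleRate;
        return (beats - 1) * 60.0 / seconds;           // average BPM
    }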
I don't know the specifics of how Android/Kotlin systems provide access to the PCM. Most likely it will be in the form of a byte stream that is encoded according to the audio format, for example mono, 44100 fps, 16-bit, little-endian. Using Java, a TargetDataLine might be involved.
SO has questions that explain how to convert the bytes to PCM.
SO also has questions about how to apply an RMS function to the PCM. There is some aggregation involved, as it involves calculating a moving average.
I don't know if a frequency-analysis tool would be helpful. The frequency of the heartbeat is very low, around 1 or 2 beats per second, and we can't even hear frequencies below 20 Hz. But there ARE likely tools that work in that low a frequency range.
I am making a Java personal project where you can record yourself singing a song, and the program will load a song (from a preselected small selection) that best matches that melody. So far, I have implemented the ability for the user to record an audio file as a WAVE file using the Java Sound API. I have seen that for audio similarity, one can perform correlation between the audio files, and by measuring if there is a high magnitude peak in the correlation graph one can determine if the audio files are similar.
I read the following post in the Signal Processing stack exchange
https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar which talks about using the Fast Fourier Transform to accomplish convolution (correlation that works for time-delayed audio). I have imported the JTransforms project from GitHub to use its FFT, but I am unsure how to turn the WAVE files into a numerical representation (something like a large array of values) that I can use to perform correlation or convolution. Any advice on how to go about this is much appreciated!
To read a .wav, you will be using the class AudioInputStream. An example is provided in the tutorial "Using Files and Format Converters". It's the first code example in the article, in the section "Reading Sound Files".
The next hurdle is translating the bytes into meaningful PCM. In the code example above, there is a comment line that reads:
// Here, do something useful with the audio data that's
// now in the audioBytes array...
That is the point where you can convert the bytes to PCM. The exact algorithm depends on the format which you can inspect via AudioInputStream's getFormat method, which returns an AudioFormat.
The format will tell you how many bytes per PCM value (e.g., 16-bit encoding is two bytes per PCM value) and the byte order, which can be little- or big-endian. If the audio is stereo, the PCM values alternate between left and right.
Building the PCM values from the bytes involves bit shifting. I'm guessing you know how to handle this. The natural result of creating 16-bit values, assuming the data is in signed PCM format, is signed short integers. So the last step is often division by Short.MAX_VALUE to convert the shorts to signed floats ranging from -1 to 1.
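Putting those pieces together, a sketch of the whole read-and-convert step might look like this, assuming the stream turns out to be 16-bit signed little-endian mono (inspect getFormat() rather than trusting those assumptions):

    import javax.sound.sampled.*;
    import java.io.File;

    public class WavToPcm {
        // Sketch: load an entire .wav into floats in roughly [-1, 1].
        // Assumes 16-bit signed little-endian mono PCM; check getFormat().
        public static float[] load(File file) throws Exception {
            AudioInputStream stream = AudioSystem.getAudioInputStream(file);
            byte[] audioBytes = stream.readAllBytes();  // Java 9+
            float[] pcm = new float[audioBytes.length / 2];
            for (int i = 0; i < pcm.length; i++) {
                short s = (short) ((audioBytes[2 * i] & 0xFF)
                        | (audioBytes[2 * i + 1] << 8));
                pcm[i] = s / (float) Short.MAX_VALUE;
            }
            return pcm;
        }
    }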
I'm making a program for Active Noise Control (some also say Adaptive instead of Active, or Cancellation instead of Control).
The system is pretty simple:
get sound via the mic
turn the sound into data I can read (something like an integer array)
make the antiphase of the sound
turn the data back into a sound file
Following are my questions:
Can I read sound as an integer array?
If I can use an integer array, how can I make the antiphase? Just multiply every value by -1?
Any useful thoughts about my project?
Is there any recommended language other than Java?
I heard that Stack Overflow has many top-class programmers, so I expect a critical answer :D
Answering your questions:
(1) When you read sound, a byte array is returned. The bytes can readily be decoded into integers, shorts, floats, whatever. Java supports many common formats and probably has one that matches your microphone input and speaker output. For example, Java supports 16-bit encoding, stereo, 44100 fps, which is considered the standard for CD quality. There are several questions already on StackOverflow that show the code for decoding the bytes and re-encoding them back.
(2) Yes, just multiply every element of your PCM array by -1. When you add the negative to its correctly lined-up counterpart, 0 will result.
(3 & 4) I don't know what the tolerances are for lag time! I think if you simply take the input, decode, multiply by -1, re-encode, and output, it might be possible to get a very small amount of processing time. I don't know what Java is capable of here, but I bet it will be on the scale of a dozen millis, give or take. How much is enough for cancellation? How far does the sound travel from mic to speaker location? How much time does that allow? (Or am I missing something about how this works? I haven't done this sort of thing before.)
Java is pretty darn fast, and you will be relatively close to the native code level with the reading and writing and the simple numeric conversions. The core code (for testing) could probably be written in an afternoon, using the code snippets in the tutorial Reading/Writing sound files as a template. I'd pay particular attention to the spot where the comment reads "Here, do something useful with the audio data that's now in the audioBytes array...". At this point, you would put the code to convert the bytes to PCM, multiply by -1, then convert back to bytes.
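As a bare-bones test of the idea, the loop might look like the sketch below. TargetDataLine and SourceDataLine are the real Java Sound classes for capture and playback; the format and buffer size are assumptions, and the latency question above still applies:

    import javax.sound.sampled.*;

    public class InvertLoop {
        public static void main(String[] args) throws Exception {
            // 44100 fps, 16-bit, mono, signed, little-endian (assumptions).
            AudioFormat fmt = new AudioFormat(44100f, 16, 1, true, false);
            TargetDataLine in = AudioSystem.getTargetDataLine(fmt);
            SourceDataLine out = AudioSystem.getSourceDataLine(fmt);
            in.open(fmt);
            out.open(fmt);
            in.start();
            out.start();
            byte[] buf = new byte[512];  // arbitrary; smaller = less lag
            while (true) {
                int count = in.read(buf, 0, buf.length);
                for (int i = 0; i + 1 < count; i += 2) {
                    // decode a little-endian 16-bit sample
                    short s = (short) ((buf[i] & 0xFF) | (buf[i + 1] << 8));
                    // invert; -Short.MIN_VALUE overflows, so clamp that value
                    s = (s == Short.MIN_VALUE) ? Short.MAX_VALUE : (short) -s;
                    // re-encode
                    buf[i] = (byte) s;
                    buf[i + 1] = (byte) (s >> 8);
                }
                out.write(buf, 0, count);
            }
        }
    }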
If Java doesn't prove fast enough, I assume the next thing to try would be some flavor of C.
I need to break apart a large collection of wav files into smaller segments, and convert them into 16 khz, 16-bit mono wav files. To segment the wav files, I downloaded a WavFile class from the following site: WavFile Class. I tweaked it a bit to allow skipping an arbitrary number of frames. Using that class, I created a WavSegmenter class that would read a source wav file, and copy the frames between time x and time y into a new wav file. The start time and end time I can get from a provided XML file, and I can get the frames using sample rate * time. My problem is I do not know how to convert the sample rates from 44,100 to 16,000.
Currently, I am looking into Java's Sound API for this. I didn't consult it initially because I found the guides long, but if it's the best existing option, I am willing to go through it. I would still like to know if there's another way to do it, though. Finally, I would like to know whether I should completely adopt Java's Sound API and drop the WavFile class I am currently using. To me, it looks sound, but I would just like to be sure.
Thank you very much, in advance, for your time.
I believe the hardest part of your task is re-sampling from 44.1K to 16K samples per second. It would have been much simpler to downsample to 22.05K or 11.025K from there! You will need to do some interpolation.
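To give a feel for why: the naive approach is linear interpolation, sketched below, but for a 44.1K-to-16K conversion you would first have to low-pass filter the signal below 8 kHz (half the new rate) or the result will alias, and that filter is the genuinely hard part:

    // Naive sketch only: linear-interpolation resampling, with NO anti-alias
    // filter. A correct 44.1K -> 16K conversion must low-pass below 8 kHz
    // before this step, which is what makes resampling hard to do well.
    static float[] resample(float[] in, float srcRate, float dstRate) {
        int outLen = (int) (in.length * (dstRate / srcRate));
        float[] out = new float[outLen];
        double step = srcRate / dstRate;  // source frames per output frame
        for (int i = 0; i < outLen; i++) {
            double pos = i * step;
            int j = (int) pos;
            double frac = pos - j;
            float a = in[j];
            float b = (j + 1 < in.length) ? in[j + 1] : a;
            out[i] = (float) (a + frac * (b - a));
        }
        return out;
    }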
EDIT: After further review and discussion with OP I believe the right choice for this situation is to go with Java Sound API because it provides methods for conversion between different sound file formats, including different sampling rates. Sticking with the WavFile API would require re-sampling which is quite complicated to implement in a 44.1K to 16K conversion case.
I suppose this would help you: http://www.jsresources.org/examples/SampleRateConverter.html
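With the Sound API, the conversion itself can be quite short, as in the sketch below. Whether a direct 44.1K-to-16K converter is installed depends on the JVM's service providers, so getAudioInputStream may throw IllegalArgumentException, in which case a plugin (e.g., Tritonus) or a two-step conversion may be needed. The file names are placeholders:

    import javax.sound.sampled.*;
    import java.io.File;

    public class Downsample {
        public static void main(String[] args) throws Exception {
            AudioInputStream source =
                    AudioSystem.getAudioInputStream(new File("in.wav"));
            // Target format: 16 kHz, 16-bit, mono, signed, little-endian.
            AudioFormat target = new AudioFormat(16000f, 16, 1, true, false);
            AudioInputStream converted =
                    AudioSystem.getAudioInputStream(target, source);
            AudioSystem.write(converted, AudioFileFormat.Type.WAVE,
                    new File("out.wav"));
        }
    }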
I have to make an application that streams a playlist in the ITU-R BT.656 video format. Frankly speaking, it's my first time hearing about this format. It seems to be used for DV streaming. I have been googling around for hours, and so far the only BT.656 encoders I found are hardware ones. Is it possible to do it programmatically?
A BT.656 data stream is a sequence of 8-bit or 10-bit words, transmitted at a rate of 27 million words per second (27 Mbyte/s in the 8-bit case).
Essentially this is digitized raw video. While streaming this might not be conceptually impossible, you cannot really afford to stream at 27 MB/s. If that is indeed what you need to do, converting each frame into a BMP and sending them one after the other would be equivalent. If you are getting this from some raw video source, it is best to encode it with a reasonable standard MPEG format and bit rate and stream that.
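For a sense of where that figure comes from (assuming the standard BT.601 4:2:2 sampling that BT.656 carries): luma is sampled at 13.5 MHz and each of the two chroma components at 6.75 MHz, so 13.5 + 6.75 + 6.75 = 27 million samples per second, one byte each in the 8-bit case.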
Ref: http://en.wikipedia.org/wiki/ITU-R_BT.656