I am using the following code to normalize PCM audio data. Is this the correct way to normalize? After normalization I apply a low-pass filter (LPF). Does the order matter, that is, should I apply the LPF first and normalize its output, or is my current order better? Also, my targetMax is set to 8000, a value I took from one of this forum's posts. What is the optimal value for it? My input is 16-bit mono PCM with a sample rate of 44100 Hz.
private static int findMaxAmplitude(short[] buffer) {
short max = Short.MIN_VALUE;
for (int i = 0; i < buffer.length; ++i) {
short value = buffer[i];
max = (short) Math.max(max, value);
}
return max;
}
short[] process(short[] buffer) {
short[] output = new short[buffer.length];
int maxAmplitude = findMaxAmplitude(buffer);
for (int index = 0; index < buffer.length; index++) {
output[index] = normalization(buffer[index], maxAmplitude);
}
return output;
}
private short normalization(short value, int rawMax) {
short targetMax = 8000;
double maxReduce = 1 - targetMax / (double) rawMax;
int abs = Math.abs(value);
double factor = (maxReduce * abs / (double) rawMax);
return (short) Math.round((1 - factor) * value);
}
Your findMaxAmplitude only looks at the positive excursions. It should use something like
max = (short) Math.max(max, Math.abs(value));
Your normalization seems quite involved. A simpler version would use:
return (short) Math.round(value * (double) targetMax / rawMax);
Whether a targetMax of 8000 is correct is a matter of taste. Normally I would expect normalisation of 16-bit samples to use the maximum range of values. So a targetMax of 32767 seems more logical.
The normalization should probably be done after the LPF operation, as the gain of the LPF may change the maximum value of your sequence.
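A minimal sketch of the points above, assuming the whole signal fits in a single short[] buffer. The class name PeakNormalizer and its method names are illustrative and not part of the question's code. The peak is tracked as an int so that the absolute value of Short.MIN_VALUE (32768) does not overflow.

```java
final class PeakNormalizer {
    // Largest absolute sample value, computed in int to avoid short overflow.
    static int findMaxAmplitude(short[] buffer) {
        int max = 0;
        for (short value : buffer) {
            max = Math.max(max, Math.abs((int) value));
        }
        return max;
    }

    // Scales every sample by targetMax / peak (plain linear gain).
    static short[] normalize(short[] buffer, int targetMax) {
        int peak = findMaxAmplitude(buffer);
        short[] output = new short[buffer.length];
        if (peak == 0) {
            return output; // pure silence: nothing to scale
        }
        double gain = targetMax / (double) peak;
        for (int i = 0; i < buffer.length; i++) {
            long scaled = Math.round(buffer[i] * gain);
            // Clamp defensively to the 16-bit range before narrowing.
            output[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        }
        return output;
    }
}
```

If the low-pass filter runs first, this would be called on the filter's output, for example normalize(filtered, 32767).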
I know this sounds like a question that has been asked before, but in my case I am being asked to do some bitwise logic on an int32 value while interpreting it as a floating-point number. That is, I am not converting the value to floating point; I am interpreting its binary representation as a float and extracting the significant bits (also known as the mantissa). I am using the IEEE 754 standard for the interpretation.
The instructions are:
"Use a bitmask to isolate the significant digits, then use a loop to iterate over each digit, shifting as necessary to get them in the LSB position, and multiplying by the appropriate power-of-2. The result will be a float without an exponent.
Don't forget to include the "hidden" bit!
Consider a loop that counts down -- craft it carefully and it will let you start at the LSB and work your way to the MSB."
From my understanding of this, I crafted this abomination
public static float decodeSignificantDigits(int value) {
int mask = 0x007FFFFF;
int significantBits = value & mask;
float significantDigits = 0;
for (int i = -23; i <= 0; i++) {
if (i != 0) {
significantDigits += (significantBits >> 1) * Math.pow(2, i);
} else {
significantDigits += 1 * Math.pow(2, i);
}
}
System.out.println(significantDigits);
return significantDigits;
}
I think I am on the right track here but I just can not visualize how to get this to work correctly.
The main idea is to get the following tests to pass:
@Test
void testWhenAllOnes() {
int bits = 0b00000000011111111111111111111111;
int bitsWithExpoentZero = 0b00111111111111111111111111111111;
assertEquals(Float.intBitsToFloat(bitsWithExpoentZero), FloatDecoder.decodeSignificantDigits(bits), TOL);
}
@Test
void testWhenMsbOfSignificantDigitsIsOneRestZeroes() {
int bits = 0b00000000010000000000000000000000;
int bitsWithExpoentZero = 0b00111111110000000000000000000000;
assertEquals(Float.intBitsToFloat(bitsWithExpoentZero), FloatDecoder.decodeSignificantDigits(bits), TOL);
}
@Test
void testWhenLsbOfSignificantDigitsIsOneRestZeroes() {
int bits = 0b00000000000000000000000000000001;
int bitsWithExpoentZero = 0b00111111100000000000000000000001;
assertEquals(Float.intBitsToFloat(bitsWithExpoentZero), FloatDecoder.decodeSignificantDigits(bits), TOL);
}
I was also informed not to use Integer.toBinaryString() for my solution.
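One possible reading of the quoted instructions, given as a sketch rather than the expected solution. It assumes the 23 stored fraction bits are the low bits of the int, that bit i (counting from the LSB) has weight 2^(i - 23), and that the hidden bit contributes 1.0. The class name SignificandSketch is made up for illustration, and the loop counts up from the LSB instead of down.

```java
final class SignificandSketch {
    static float decodeSignificantDigits(int value) {
        int significantBits = value & 0x007FFFFF; // low 23 bits hold the stored fraction
        float result = 0f;
        // Bit i of the fraction (i = 0 is the LSB) has weight 2^(i - 23).
        for (int i = 0; i < 23; i++) {
            int bit = (significantBits >>> i) & 1; // shift bit i into the LSB position
            result += bit * (float) Math.pow(2, i - 23);
        }
        return result + 1f; // the hidden bit contributes 2^0
    }
}
```

Each bit is tested individually, which is the step missing from the loop in the question: there, the whole shifted field is multiplied by every power of two.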
Here's what I'm working with right now:
for (int i = 0, numSamples = soundBytes.length / 2; i < numSamples; i += 2)
{
// Get the samples.
int sample1 = ((soundBytes[i] & 0xFF) << 8) | (soundBytes[i + 1] & 0xFF); // Automatically converts to unsigned int 0...65535
int sample2 = ((outputBytes[i] & 0xFF) << 8) | (outputBytes[i + 1] & 0xFF); // Automatically converts to unsigned int 0...65535
// Normalize for simplicity.
float normalizedSample1 = sample1 / 65535.0f;
float normalizedSample2 = sample2 / 65535.0f;
float normalizedMixedSample = 0.0f;
// Apply the algorithm.
if (normalizedSample1 < 0.5f && normalizedSample2 < 0.5f)
normalizedMixedSample = 2.0f * normalizedSample1 * normalizedSample2;
else
normalizedMixedSample = 2.0f * (normalizedSample1 + normalizedSample2) - (2.0f * normalizedSample1 * normalizedSample2) - 1.0f;
int mixedSample = (int)(normalizedMixedSample * 65535);
// Replace the sample in soundBytes array with this mixed sample.
soundBytes[i] = (byte)((mixedSample >> 8) & 0xFF);
soundBytes[i + 1] = (byte)(mixedSample & 0xFF);
}
As far as I can tell, it's an accurate representation of the algorithm defined on this page: http://www.vttoth.com/CMS/index.php/technical-notes/68
However, just mixing a sound with silence (all 0's) results in a sound that very obviously doesn't sound right. Maybe it's best described as higher-pitched and louder.
Would appreciate help in determining if I'm implementing the algorithm correctly, or if I simply need to go about it a different way (different algorithm/method)?
In the linked article the author assumes A and B represent entire streams of audio. More specifically, X means the maximum absolute value of all the samples in stream X, where X is either A or B. So his algorithm scans the entirety of both streams to compute the maximum absolute sample of each, and then scales things so that the output theoretically peaks at 1.0. You'll need to make multiple passes over the data to implement this algorithm, and if your data is streaming in then it simply will not work.
Here is an example of how I think the algorithm works. It assumes that the samples have already been converted to floating point, to sidestep the issue of your conversion code being wrong. I'll explain what is wrong with it later:
double[] samplesA = ConvertToDouble(samples1);
double[] samplesB = ConvertToDouble(samples2);
double A = ComputeMaxPeak(samplesA);
double B = ComputeMaxPeak(samplesB);
// Z always equals 1 which is an un-useful bit of information.
double Z = A+B-A*B;
// really need to find a value x such that xA+xB=1, which is:
double x = 1 / (A + B);
// Now mix and scale the samples
double[] samples = MixAndScale(samplesA, samplesB, x);
Mixing and scaling:
double[] MixAndScale(double[] samplesA, double[] samplesB, double scalingFactor)
{
double[] result = new double[samplesA.length];
for (int i = 0; i < samplesA.length; i++)
result[i] = scalingFactor * (samplesA[i] + samplesB[i]);
return result;
}
Computing the max peak:
double ComputeMaxPeak(double[] samples)
{
double max = 0;
for (int i = 0; i < samples.length; i++)
{
double x = Math.abs(samples[i]);
if (x > max)
max = x;
}
return max;
}
And conversion. Notice how I'm using short so that the sign bit is properly maintained:
double[] ConvertToDouble(byte[] bytes)
{
double[] samples = new double[bytes.length/2];
for (int i = 0; i < samples.length; i++)
{
short tmp = (short) ((bytes[i * 2] << 8) | (bytes[i * 2 + 1] & 0xFF)); // big-endian; mask the low byte so it is not sign-extended
samples[i] = tmp / 32767.0;
}
return samples;
}
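The pieces above can be put together as a small self-contained sketch. It assumes big-endian signed 16-bit input (the byte order used in the question) and a scale factor of 1 / (A + B), which guarantees that the scaled sum never exceeds 1.0 in magnitude. The class name PeakMixSketch and its method names are illustrative only.

```java
final class PeakMixSketch {
    // Big-endian signed 16-bit PCM bytes -> doubles in [-1, 1).
    static double[] toDoubles(byte[] bytes) {
        double[] samples = new double[bytes.length / 2];
        for (int i = 0; i < samples.length; i++) {
            // The high byte keeps its sign; the low byte is masked to avoid sign extension.
            short s = (short) ((bytes[2 * i] << 8) | (bytes[2 * i + 1] & 0xFF));
            samples[i] = s / 32768.0;
        }
        return samples;
    }

    // Maximum absolute sample value of a stream.
    static double peak(double[] samples) {
        double max = 0;
        for (double s : samples) {
            max = Math.max(max, Math.abs(s));
        }
        return max;
    }

    // Sums the two streams and scales the result by 1 / (peakA + peakB).
    static double[] mix(double[] a, double[] b) {
        double sum = peak(a) + peak(b);
        double scale = sum > 0 ? 1.0 / sum : 0.0; // both silent: output silence
        double[] out = new double[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = scale * (a[i] + b[i]);
        }
        return out;
    }
}
```

Note that this always scales the mix to full scale, so mixing a quiet stream with silence makes it louder. Whether that is desirable depends on the application.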
I have a WAV file (32-bit sample size, 8 bytes per frame, 44100 Hz, PCM_Float) that I need to turn into a sample array. This is the code I have used for a WAV with 16-bit sample size, 4 bytes per frame, 44100 Hz, PCM_Signed.
private float[] getSampleArray(byte[] eightBitByteArray) {
int newArrayLength = eightBitByteArray.length
/ (2 * calculateNumberOfChannels()) + 1;
float[] toReturn = new float[newArrayLength];
int index = 0;
for (int t = 0; t + 4 < eightBitByteArray.length; t += 2) // t+2 -> skip
//2nd channel
{
int low=((int) eightBitByteArray[t++]) & 0x00ff;
int high=((int) eightBitByteArray[t++]) << 8;
double value = Math.pow(low+high, 2);
double dB = 0;
if (value != 0) {
dB = 20.0 * Math.log10(value); // calculate decibel
}
toReturn[index] = getFloatValue(dB); //minorly important conversion
//to normalized values
index++;
}
return toReturn;
}
Obviously this code can't work for the WAV with 32-bit samples, as I have to consider 2 more bytes in the first channel.
Does anybody know how the 2 other bytes have to be added (and shifted) to calculate the amplitude? Unfortunately Google didn't help me at all.
Thanks in advance.
Something like this should do the trick.
for (int t = 0; t + 4 <= eightBitByteArray.length; t += 8) // t += 8 -> skip
//2nd channel (8 bytes per frame, 4 bytes per sample)
{
float value = ByteBuffer.wrap(eightBitByteArray, t, 4).order(ByteOrder.LITTLE_ENDIAN).getFloat();
double dB = 0;
if (value != 0) {
dB = 20.0 * Math.log10(Math.abs(value)); // calculate decibel; log10 of a negative sample would be NaN
}
toReturn[index] = getFloatValue(dB); //minorly important conversion
//to normalized values
index++;
}
On another note - converting instantaneous samples to dB is nonsensical.
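A more meaningful level measure is usually taken over a block of samples rather than a single one. The following is a minimal sketch, assuming float samples in the range [-1, 1] and reporting the RMS level in dB relative to full scale. The class and method names are hypothetical.

```java
final class LevelSketch {
    // RMS level of one block in dB relative to full scale (1.0).
    static double rmsDb(float[] block) {
        if (block.length == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double sumSquares = 0;
        for (float s : block) {
            sumSquares += (double) s * s;
        }
        double rms = Math.sqrt(sumSquares / block.length);
        // Silence has no finite dB value.
        return rms > 0 ? 20.0 * Math.log10(rms) : Double.NEGATIVE_INFINITY;
    }
}
```

The block length (for example a few hundred samples) is a design choice: longer blocks give a smoother but slower-reacting level display.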
I'm currently working in Java for Android. I am trying to implement an FFT in order to build a kind of frequency viewer.
I was actually able to do it, but the display is not fluid at all.
I added some traces to check the processing time of each part of my code, and it turns out that the FFT takes about 300ms to run on my complex array, which has 4096 elements. I need it to take less than 100ms, as my thread (which displays the frequencies) is refreshed every 100ms. I reduced the initial array so that the FFT result has only 1024 elements, and it works, but the result is degraded.
Does anyone have an idea?
I used the default fft.java and Complex.java classes that can be found on the internet.
For information, my code computing the FFT is the following:
int bytesPerSample = 2;
Complex[] x = new Complex[bufferSize/2] ;
for (int index = 0 ; index < bufferReadResult - bytesPerSample + 1; index += bytesPerSample)
{
// 16BITS = 2BYTES
float asFloat = Float.intBitsToFloat(asInt);
double sample = 0;
for (int b = 0; b < bytesPerSample; b++) {
int v = buffer[index + b];
if (b < bytesPerSample - 1 || bytesPerSample == 1) {
v &= 0xFF;
}
sample += v << (b * 8);
}
double sample32 = 100 * (sample / 32768.0); // don't know the use of this compute...
x[index/bytesPerSample] = new Complex(sample32, 0);
}
Complex[] tx = new Complex[1024]; // size = 2048
///// reduce the size of the signal in order to improve the FFT processing time
for (int i = 0; i < x.length/4; i++)
{
tx[i] = new Complex(x[i*4].re(), 0);
}
// Signal retrieval thanks to the FFT
fftRes = FFT.fft(tx);
I don't know Java, but your way of converting between your input data and an array of complex values seems very convoluted. You're building two arrays of complex data where only one is necessary.
Also it smells like your complex real and imaginary values are doubles. That's way over the top for what you need, and ARMs are veeeery slow at double arithmetic anyway. Is there a complex class based on single precision floats?
Thirdly you're performing a complex fft on real data by filling the imaginary part of your complexes with zero. Whilst the result will be correct it is twice as much work straight off (unless the routine is clever enough to spot that, which I doubt). If possible perform a real fft on your data and save half your time.
And then as Simon says there's the whole issue of avoiding garbage collection and memory allocation.
Also it looks like your FFT has no preparatory step. This means that the routine FFT.fft() is calculating the complex exponentials every time. The longest part of the FFT calculation is working out the complex exponentials, which is a shame because for any given FFT length the exponentials are constants. They don't depend on your input data at all. In the real-time world we use FFT routines where we calculate the exponentials once at the start of the program, and then the actual FFT itself takes that constant array as one of its inputs. I don't know if your FFT class can do something similar.
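To illustrate the idea of a preparatory step, here is a hedged sketch in Java of an in-place, iterative radix-2 FFT that computes its cosine and sine tables once in the constructor and reuses them on every call. It works on separate float arrays for the real and imaginary parts and allocates nothing per transform. The class name FloatFft is made up for this example; it is not the fft.java class from the question.

```java
final class FloatFft {
    private final int n;
    private final float[] cos; // cos(-2*pi*k/n) for k = 0 .. n/2 - 1
    private final float[] sin; // sin(-2*pi*k/n) for k = 0 .. n/2 - 1

    // Preparatory step: compute the complex exponentials once per FFT length.
    FloatFft(int n) {
        if (n < 2 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("n must be a power of 2, got " + n);
        }
        this.n = n;
        cos = new float[n / 2];
        sin = new float[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double angle = -2.0 * Math.PI * k / n;
            cos[k] = (float) Math.cos(angle);
            sin[k] = (float) Math.sin(angle);
        }
    }

    // In-place forward FFT; re and im must both have length n.
    void transform(float[] re, float[] im) {
        // Bit-reversal permutation of the input.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                float t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly stages; twiddle factors come from the precomputed tables.
        for (int len = 2; len <= n; len <<= 1) {
            int half = len >> 1;
            int step = n / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < half; k++) {
                    float wr = cos[k * step];
                    float wi = sin[k * step];
                    int a = start + k;
                    int b = a + half;
                    float tr = wr * re[b] - wi * im[b];
                    float ti = wr * im[b] + wi * re[b];
                    re[b] = re[a] - tr;
                    im[b] = im[a] - ti;
                    re[a] += tr;
                    im[a] += ti;
                }
            }
        }
    }
}
```

A typical use would create one FloatFft instance at startup and call transform on the same pair of arrays for each new block of audio. Actual timings on a given device would need to be measured.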
If you do end up going to something like FFTW then you're going to have to get used to calling C code from your Java. Also make sure you get a version that supports (I think) NEON, ARM's answer to SSE, AVX and Altivec. It's worth ploughing through their release notes to check. Also I strongly suspect that FFTW will only be able to offer a significant speed up if you ask it to perform an FFT on single precision floats, not doubles.
Google luck!
--Edit--
I meant of course 'good luck'. Give me a real keyboard quick, these touchscreen ones are unreliable...
First, thanks for all your answers.
I followed them and ran two tests:
First, I replaced the doubles used in my Complex class with floats. The result is slightly better, but not enough.
Then I rewrote the fft method so that it no longer uses Complex, but a two-dimensional float array instead. For each row of this array, the first column contains the real part and the second the imaginary part.
I also changed my code to instantiate the float array only once, in the onCreate method.
And the result... is worse! Now it takes a little more than 500ms instead of 300ms.
I don't know what to do now.
Below you can find the initial fft function, followed by the one I rewrote.
Thanks for your help.
// compute the FFT of x[], assuming its length is a power of 2
public static Complex[] fft(Complex[] x) {
int N = x.length;
// base case
if (N == 1) return new Complex[] { x[0] };
// radix 2 Cooley-Tukey FFT
if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2 : " + N); }
// fft of even terms
Complex[] even = new Complex[N/2];
for (int k = 0; k < N/2; k++) {
even[k] = x[2*k];
}
Complex[] q = fft(even);
// fft of odd terms
Complex[] odd = even; // reuse the array
for (int k = 0; k < N/2; k++) {
odd[k] = x[2*k + 1];
}
Complex[] r = fft(odd);
// combine
Complex[] y = new Complex[N];
for (int k = 0; k < N/2; k++) {
double kth = -2 * k * Math.PI / N;
Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
y[k] = q[k].plus(wk.times(r[k]));
y[k + N/2] = q[k].minus(wk.times(r[k]));
}
return y;
}
public static float[][] fftf(float[][] x) {
/**
* x[][0] = real part
* x[][1] = imaginary part
*/
int N = x.length;
// base case
if (N == 1) return new float[][] { x[0] };
// radix 2 Cooley-Tukey FFT
if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2 : " + N); }
// fft of even terms
float[][] even = new float[N/2][2];
for (int k = 0; k < N/2; k++) {
even[k] = x[2*k];
}
float[][] q = fftf(even);
// fft of odd terms
float[][] odd = even; // reuse the array
for (int k = 0; k < N/2; k++) {
odd[k] = x[2*k + 1];
}
float[][] r = fftf(odd);
// combine
float[][] y = new float[N][2];
double kth, wkcos, wksin ;
for (int k = 0; k < N/2; k++) {
kth = -2 * k * Math.PI / N;
//Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
wkcos = Math.cos(kth) ; // real part
wksin = Math.sin(kth) ; // imaginary part
// y[k] = q[k].plus(wk.times(r[k]));
y[k][0] = (float) (q[k][0] + wkcos * r[k][0] - wksin * r[k][1]);
y[k][1] = (float) (q[k][1] + wkcos * r[k][1] + wksin * r[k][0]);
// y[k + N/2] = q[k].minus(wk.times(r[k]));
y[k + N/2][0] = (float) (q[k][0] - (wkcos * r[k][0] - wksin * r[k][1]));
y[k + N/2][1] = (float) (q[k][1] - (wkcos * r[k][1] + wksin * r[k][0]));
}
return y;
}
Actually, I think I don't understand everything.
First, about Math.cos and Math.sin: how can I avoid computing them each time? Do you mean that I should compute all the values only once (e.g. store them in an array) and reuse them for each computation?
Second, about the N % 2 check: indeed it's not very useful; I could do the test before calling the function.
Third, about Simon's advice: I combined what he said and what you said, which is why I replaced Complex with a two-dimensional float[][]. If that is not what he suggested, then what was it?
Finally, I'm not an FFT expert, so what do you mean by a "real FFT"? Do you mean that my imaginary part is useless? If so, I'm not sure, because later in my code I compute the magnitude of each frequency as sqrt(real[i]*real[i] + imag[i]*imag[i]), and I think my imaginary part is not equal to zero...
Thanks!
As homework, I'm implementing Karatsuba's algorithm and benchmarking it against a primary-school-style O(n^2) multiplication algorithm on large integers.
I guessed my only choice here was to bring the numbers to their byte array representations and then work them from there.
Well, I'm stuck here... when using the * operator, I don't know how I would detect or correct the case where a byte multiplication overflows or produces a carry. Any ideas?
public static BigInteger simpleMultiply(BigInteger x, BigInteger y){
//BigInteger result = x.multiply(y);
byte [] xByteArray = x.toByteArray();
byte [] yByteArray = y.toByteArray();
int resultSize = xByteArray.length*yByteArray.length;
byte [][] rowsAndColumns = new byte[resultSize][resultSize];
for (int i =0; i<xByteArray.length;i++)
for (int j=0; j<yByteArray.length;j++){
rowsAndColumns[i][j] = (byte )(xByteArray[i] * yByteArray[j]);
// how would I detect/handle carry or overflow here?
}
return null;
}
The result of a byte multiplication is 2 bytes. You have to use the low order byte as the result and the high order byte as the carry (overflow).
I would also advise you to be careful of the sign of your bytes. Since bytes in Java are signed, you'll have to either use only the low 7 bits of them or convert them to ints and correct the sign before multiplying them.
You'll want a loop like:
for (int i =0; i<xByteArray.length;i++)
for (int j=0; j<yByteArray.length;j++){
// convert bytes to ints
int xDigit = xByteArray[i], yDigit = yByteArray[j];
// convert signed to unsigned
if (xDigit < 0)
xDigit += 256;
if (yDigit < 0)
yDigit += 256;
// compute result of multiplication
int result = xDigit * yDigit;
// capture low order byte
rowsAndColumns[i][j] = (byte)(result & 0xFF);
// get overflow (high order byte)
int overflow = result >> 8;
// handle overflow here
// ...
}
The best way to avoid overflow is not to let it happen in the first place. Do all your calculations in a wider numeric type to avoid problems.
For example, let's say we have base-256 numbers and each digit is stored as a single unsigned byte.
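int d1 = digits[i] & 0xFF; // widen to int, treating the byte as unsigned
int d2 = digits[j] & 0xFF;
int product = d1 * d2; // at most 255 * 255 = 65025, far below the int limit of 2^31 - 1
int result = product % 256; // low-order digit
int carry = product / 256; // high-order digit, carried into the next position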
You could be fancy and convert the divisions by powers of two into bit operations, but it isn't really necessary.
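Putting the carry handling together, a schoolbook multiplication over the byte arrays might look like the sketch below. It assumes both inputs are non-negative, so that toByteArray() yields plain big-endian base-256 digits (possibly with a leading zero byte). The class name SchoolbookMultiply is illustrative; the single accumulator array replaces the rowsAndColumns matrix from the question.

```java
import java.math.BigInteger;

final class SchoolbookMultiply {
    // Multiplies two non-negative BigIntegers digit by digit in base 256.
    static BigInteger simpleMultiply(BigInteger x, BigInteger y) {
        byte[] a = x.toByteArray(); // big-endian two's complement
        byte[] b = y.toByteArray();
        // Big-endian base-256 digits of the product, kept in ints while accumulating.
        int[] result = new int[a.length + b.length];

        for (int i = a.length - 1; i >= 0; i--) {
            int xDigit = a[i] & 0xFF; // treat the byte as unsigned
            int carry = 0;
            for (int j = b.length - 1; j >= 0; j--) {
                int yDigit = b[j] & 0xFF;
                int pos = i + j + 1;
                // Previous digit + partial product + carry is at most 65535, so it fits an int.
                int sum = result[pos] + xDigit * yDigit + carry;
                result[pos] = sum & 0xFF; // low-order byte stays in this position
                carry = sum >>> 8;        // high-order byte moves one position left
            }
            result[i] += carry; // leftmost position reached by this row
        }

        byte[] magnitude = new byte[result.length];
        for (int k = 0; k < magnitude.length; k++) {
            magnitude[k] = (byte) result[k];
        }
        return new BigInteger(1, magnitude); // signum 1: interpret as a non-negative magnitude
    }
}
```

For benchmarking against Karatsuba, the result can be checked against x.multiply(y) on random inputs before timing anything.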