How do I use audio sample data from Java Sound? - java
This question is usually asked as a part of another question but it turns out that the answer is long. I've decided to answer it here so I can link to it elsewhere.
Although I'm not aware of a way that Java can produce audio samples for us at this time, if that changes in the future, this can be a place for it. I know that JavaFX has some stuff like this, for example AudioSpectrumListener, but still not a way to access samples directly.
I'm using javax.sound.sampled for playback and/or recording but I'd like to do something with the audio.
Perhaps I'd like to display it visually or process it in some way.
How do I access audio sample data to do that with Java Sound?
See also:
Java Sound Tutorials (Official)
Java Sound Resources (Unofficial)
Well, the simplest answer is that at the moment Java can't produce sample data for the programmer.
This quote is from the official tutorial:
There are two ways to apply signal processing:
You can use any processing supported by the mixer or its component lines, by querying for Control objects and then setting the controls as the user desires. Typical controls supported by mixers and lines include gain, pan, and reverberation controls.
If the kind of processing you need isn't provided by the mixer or its lines, your program can operate directly on the audio bytes, manipulating them as desired.
This page discusses the first technique in greater detail, because there is no special API for the second technique.
Playback with javax.sound.sampled largely acts as a bridge between the file and the audio device. The bytes are read in from the file and sent off.
Don't assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they aren't. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them. Using 8-bit is one way to avoid the complexity described here, if you're just playing around.)
So instead, I'll enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer will not cover how to encode them, but it's included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.
This is a long answer but I wanted to give a thorough overview.
A Little About Digital Audio
Generally when digital audio is explained, we're referring to Linear Pulse-Code Modulation (LPCM).
A continuous sound wave is sampled at regular intervals and the amplitudes are quantized to integers of some scale.
Shown here is a sine wave sampled and quantized to 4-bit:
(Notice that in two's complement representation the most positive value is 1 smaller in magnitude than the most negative value; for example, 4-bit samples range from -8 to 7. This is a minor detail to be aware of. For example, if you're clipping audio and forget this, the positive clips will overflow.)
When we have audio on the computer, we have an array of these samples. A sample array is what we want to turn the byte array in to.
To decode PCM samples, we don't care much about the sample rate or number of channels, so I won't be saying much about them here. Channels are usually interleaved, so that if we had an array of them, they'd be stored like this:
Index 0: Sample 0 (Left Channel)
Index 1: Sample 0 (Right Channel)
Index 2: Sample 1 (Left Channel)
Index 3: Sample 1 (Right Channel)
Index 4: Sample 2 (Left Channel)
Index 5: Sample 2 (Right Channel)
...
In other words, for stereo, the samples in the array just alternate between left and right.
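As a rough illustration of that layout, here is a minimal sketch that splits an interleaved float[] into one array per channel. The class and method names (Deinterleave, split) are made up for this example, and it assumes the array length is an exact multiple of the channel count.

```java
final class Deinterleave {
    /**
     * Splits an interleaved sample array into one array per channel.
     * Assumes interleaved.length is a multiple of channels.
     */
    static float[][] split(float[] interleaved, int channels) {
        int frames = interleaved.length / channels;
        float[][] out = new float[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                // frame f, channel c lives at index (f * channels + c)
                out[c][f] = interleaved[f * channels + c];
            }
        }
        return out;
    }
}
```

For stereo, split(samples, 2)[0] would hold the left channel and split(samples, 2)[1] the right channel.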
Some Assumptions
All of the code examples will assume the following declarations:
byte[] bytes; The byte array, read from the AudioInputStream.
float[] samples; The output sample array that we're going to fill.
float sample; The sample we're currently working on.
long temp; An interim value used for general manipulation.
int i; The position in the byte array where the current sample's data starts.
We'll normalize all of the samples in our float[] array to the range of -1f <= sample <= 1f. All of the floating-point audio I've seen comes this way and it's pretty convenient.
If our source audio doesn't already come like that (as is the case for integer samples, for example), we can normalize them ourselves using the following:
sample = sample / fullScale(bitsPerSample);
Where fullScale is 2^(bitsPerSample - 1), i.e. Math.pow(2, bitsPerSample - 1).
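To make the normalization concrete, here is a small sketch. The Normalize class is only a name for this example, and it assumes the value passed in is already a signed integer sample.

```java
final class Normalize {
    /** 2^(bitsPerSample - 1), e.g. 32768.0 for 16-bit audio. */
    static double fullScale(int bitsPerSample) {
        return Math.pow(2.0, bitsPerSample - 1);
    }

    /** Maps a signed integer sample to the range -1f <= sample < 1f. */
    static float normalize(long value, int bitsPerSample) {
        return (float) (value / fullScale(bitsPerSample));
    }
}
```

With 16-bit audio, the most negative sample (-32768) maps to exactly -1f, while the most positive sample (32767) maps to slightly less than 1f.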
How do I coerce the byte array in to meaningful data?
The byte array contains the sample frames split up and all in a line. This is actually very straight-forward except for something called endianness, which is the ordering of the bytes in each sample packet.
Here's a diagram. This sample (packed in to a byte array) holds the decimal value 9999:
24-bit sample as big-endian:
bytes[i] bytes[i + 1] bytes[i + 2]
┌──────┐ ┌──────┐ ┌──────┐
00000000 00100111 00001111
24-bit sample as little-endian:
bytes[i] bytes[i + 1] bytes[i + 2]
┌──────┐ ┌──────┐ ┌──────┐
00001111 00100111 00000000
They hold the same binary values; however, the byte orders are reversed.
In big-endian, the more significant bytes come before the less significant bytes.
In little-endian, the less significant bytes come before the more significant bytes.
WAV files are stored in little-endian order and AIFF files are stored in big-endian order. Endianness can be obtained from AudioFormat.isBigEndian.
To concatenate the bytes and put them in to our long temp variable, we:
Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign-extension when the byte is automatically promoted. (char, byte and short are promoted to int when arithmetic is performed on them.) See also What does value & 0xff do in Java?
Bit shift each byte in to position.
Bitwise OR the bytes together.
Here's a 24-bit example:
long temp;
if (isBigEndian) {
temp = (
((bytes[i ] & 0xffL) << 16)
| ((bytes[i + 1] & 0xffL) << 8)
| (bytes[i + 2] & 0xffL)
);
} else {
temp = (
(bytes[i ] & 0xffL)
| ((bytes[i + 1] & 0xffL) << 8)
| ((bytes[i + 2] & 0xffL) << 16)
);
}
Notice that the shift order is reversed based on endianness.
This can also be generalized to a loop, which can be seen in the full code at the bottom of this answer. (See the unpackAnyBit and packAnyBit methods.)
Now that we have the bytes concatenated together, we can take a few more steps to turn them in to a sample. The next steps depend on the actual encoding.
How do I decode Encoding.PCM_SIGNED?
The two's complement sign must be extended. This means that if the most significant bit (MSB) is set to 1, we fill all the bits above it with 1s. The arithmetic right-shift (>>) will do the filling for us automatically if the sign bit is set, so I usually do it this way:
int bitsToExtend = Long.SIZE - bitsPerSample;
float sample = (temp << bitsToExtend) >> bitsToExtend;
(Where Long.SIZE is 64. If our temp variable wasn't a long, we'd use something else. If we used e.g. int temp instead, we'd use 32.)
To understand how this works, here's a diagram of sign-extending 8-bit to 16-bit:
11111111 is the byte value -1, but the upper bits of the short are 0.
Shift the byte's MSB in to the MSB position of the short.
0000 0000 1111 1111
<< 8
───────────────────
1111 1111 0000 0000
Shift it back and the right-shift fills all the upper bits with 1s.
We now have the short value of -1.
1111 1111 0000 0000
>> 8
───────────────────
1111 1111 1111 1111
Positive values (that had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right-shift.
Then normalize the sample, as described in Some Assumptions.
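Putting the three steps together (concatenate, sign-extend, normalize), a sketch for one 24-bit little-endian signed sample might look like this. The class and method names are invented for the example, and the constant 8388608.0 is just 2^23 written out.

```java
final class SignedPcm {
    /** Decodes one 24-bit little-endian signed sample starting at bytes[i]. */
    static float decode24BitLittleEndian(byte[] bytes, int i) {
        // concatenate the bytes, least significant first
        long temp = (bytes[i] & 0xffL)
                  | ((bytes[i + 1] & 0xffL) << 8)
                  | ((bytes[i + 2] & 0xffL) << 16);
        // sign-extend from 24 bits to 64 bits
        int bitsToExtend = Long.SIZE - 24;
        temp = (temp << bitsToExtend) >> bitsToExtend;
        // normalize by full scale, 2^23
        return (float) (temp / 8388608.0);
    }
}
```

The bytes 0x00 0x00 0x80 are the most negative 24-bit value in little-endian order, so they decode to -1f.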
You might not need to write explicit sign-extension if your code is simple
Java does sign-extension automatically when converting from one integral type to a larger type, for example byte to int. If you know that your input and output format are always signed, you can use the automatic sign-extension while concatenating bytes in the earlier step.
Recall from the section above (How do I coerce the byte array in to meaningful data?) that we used b & 0xFF to prevent sign-extension from occurring. If you just remove the & 0xFF from the highest byte, sign-extension will happen automatically.
For example, the following decodes signed, big-endian, 16-bit samples:
for (int i = 0; i + 1 < bytes.length; i += 2) { // each sample is 2 bytes
int sample = (bytes[i] << 8) // high byte is sign-extended
| (bytes[i + 1] & 0xFF); // low byte is not
// ...
}
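A different route, not used elsewhere in this answer, is to let java.nio.ByteBuffer do the concatenation and sign-extension for 16-bit signed data. This is only a sketch under the assumption that the format is 16-bit PCM_SIGNED; the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ShortBufferDecode {
    /** Decodes the first 'length' bytes as 16-bit signed samples. */
    static float[] decode16Bit(byte[] bytes, int length, boolean isBigEndian) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes, 0, length)
            .order(isBigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        float[] samples = new float[length / 2];
        for (int s = 0; s < samples.length; s++) {
            // getShort() returns a signed short, so no manual sign-extension
            samples[s] = buffer.getShort() / 32768f; // 32768 == 2^15
        }
        return samples;
    }
}
```

The manual bit manipulation is still needed for sizes that have no matching primitive, such as 24-bit.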
How do I decode Encoding.PCM_UNSIGNED?
We turn it in to a signed number. Unsigned samples are simply offset so that, for example:
An unsigned value of 0 corresponds to the most negative signed value.
An unsigned value of 2^(bitsPerSample - 1) corresponds to the signed value of 0.
An unsigned value of 2^bitsPerSample - 1 corresponds to the most positive signed value.
So this turns out to be pretty simple. Just subtract the offset:
float sample = temp - fullScale(bitsPerSample);
Then normalize the sample, as described in Some Assumptions.
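For instance, 8-bit unsigned samples could be decoded along these lines. This is a sketch with a made-up class name; 128 is fullScale(8).

```java
final class UnsignedPcm {
    /** Decodes one 8-bit unsigned sample to the range -1f <= sample < 1f. */
    static float decode8BitUnsigned(byte b) {
        long temp = b & 0xffL; // 0..255, mask avoids sign-extension
        temp -= 128;           // subtract the offset: -128..127
        return temp / 128f;    // normalize by full scale
    }
}
```

An input byte of 0x80 sits in the middle of the unsigned range, so it decodes to silence (0f).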
How do I decode Encoding.PCM_FLOAT?
This is new since Java 7.
In practice, floating-point PCM is typically either IEEE 32-bit or IEEE 64-bit and already normalized to the range of ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.
// IEEE 32-bit
float sample = Float.intBitsToFloat((int) temp);
// IEEE 64-bit
double sampleAsDouble = Double.longBitsToDouble(temp);
float sample = (float) sampleAsDouble; // or just use double for arithmetic
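As a self-contained sketch, here is how one 32-bit little-endian float sample could be read straight from the byte array. The names are invented for the example; since only 32 bits are involved, it concatenates into an int instead of the long temp.

```java
final class FloatPcm {
    /** Decodes one IEEE 32-bit little-endian sample starting at bytes[i]. */
    static float decode32BitFloatLittleEndian(byte[] bytes, int i) {
        int bits = (bytes[i] & 0xff)
                 | ((bytes[i + 1] & 0xff) << 8)
                 | ((bytes[i + 2] & 0xff) << 16)
                 | ((bytes[i + 3] & 0xff) << 24);
        // reinterpret the bit pattern; no scaling is applied
        return Float.intBitsToFloat(bits);
    }
}
```

Note that nothing here clamps the result, so a file with samples outside ±1.0 would pass them through unchanged.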
How do I decode Encoding.ULAW and Encoding.ALAW?
These are companding compression codecs that are more common in telephones and such. They're supported by javax.sound.sampled I assume because they're used by Sun's Au format. (However, it's not limited to just this type of container. For example, WAV can contain these encodings.)
You can conceptualize A-law and μ-law like they're a floating-point format. These are PCM formats but the range of values is non-linear.
There are two ways to decode them. I'll show the way which uses the mathematical formula. You can also decode them by manipulating the binary directly which is described in this blog post but it's more esoteric-looking.
For both, the compressed data is 8-bit. Standardly A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the formula yields a range of ±1.0.
Before you can apply the formula, there are three things to do:
Some of the bits are standardly inverted for storage due to reasons involving data integrity.
They're stored as sign and magnitude (rather than two's complement).
The formula also expects a range of ±1.0, so the 8-bit value has to be scaled.
For μ-law all the bits are inverted, so:
temp ^= 0xffL; // 0xff == 0b1111_1111
(Note that we can't use ~, because we don't want to invert the high bits of the long.)
For A-law, every other bit is inverted, so:
temp ^= 0x55L; // 0x55 == 0b0101_0101
(XOR can be used to do inversion. See How do you set, clear and toggle a bit?)
To convert from sign and magnitude to two's complement, we:
Check to see if the sign bit was set.
If so, clear the sign bit and negate the number.
// 0x80 == 0b1000_0000
if ((temp & 0x80L) != 0) {
temp ^= 0x80L;
temp = -temp;
}
Then scale the encoded numbers, the same way as described in Some Assumptions:
sample = temp / fullScale(8);
Now we can apply the expansion.
The μ-law formula translated to Java is then:
sample = (float) (
signum(sample)
*
(1.0 / 255.0)
*
(pow(256.0, abs(sample)) - 1.0)
);
The A-law formula translated to Java is then:
float signum = signum(sample);
sample = abs(sample);
if (sample < (1.0 / (1.0 + log(87.7)))) {
sample = (float) (
sample * ((1.0 + log(87.7)) / 87.7)
);
} else {
sample = (float) (
exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
);
}
sample = signum * sample;
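Chained together, the μ-law steps above (un-invert, convert from sign and magnitude, scale, expand) could be sketched for a single byte as follows. The class name is made up, and like the rest of this section it follows the formula approach, so the results may differ slightly from a table-based G.711 decoder.

```java
final class MuLawDecode {
    /** Decodes one mu-law byte to a float in the range of about -1f to 1f. */
    static float decode(byte b) {
        long temp = b & 0xffL;
        temp ^= 0xffL;               // undo the bit inversion
        if ((temp & 0x80L) != 0) {   // sign and magnitude -> two's complement
            temp ^= 0x80L;
            temp = -temp;
        }
        double x = temp / 128.0;     // scale by fullScale(8)
        // expansion formula with mu = 255
        return (float) (
            Math.signum(x) * (1.0 / 255.0) * (Math.pow(256.0, Math.abs(x)) - 1.0)
        );
    }
}
```

With this sketch, the bytes 0xFF and 0x7F both decode to silence (0f), and 0x00 decodes to the most negative value.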
Here's the full example code for the SimpleAudioConversion class.
package mcve.audio;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioFormat.Encoding;
import static java.lang.Math.*;
/**
* <p>Performs simple audio format conversion.</p>
*
* <p>Example usage:</p>
*
* <pre>{@code AudioInputStream ais = ... ;
* SourceDataLine line = ... ;
* AudioFormat fmt = ... ;
*
* // do setup
*
* for (int blen = 0; (blen = ais.read(bytes)) > -1;) {
* int slen;
* slen = SimpleAudioConversion.decode(bytes, samples, blen, fmt);
*
* // do something with samples
*
* blen = SimpleAudioConversion.encode(samples, bytes, slen, fmt);
* line.write(bytes, 0, blen);
* }}</pre>
*
* @author Radiodef
* @see Overview on Stack Overflow
*/
public final class SimpleAudioConversion {
private SimpleAudioConversion() {}
/**
* Converts from a byte array to an audio sample float array.
*
* @param bytes the byte array, filled by the AudioInputStream
* @param samples an array to fill up with audio samples
* @param blen the return value of AudioInputStream.read
* @param fmt the source AudioFormat
*
* @return the number of valid audio samples converted
*
* @throws NullPointerException if bytes, samples or fmt is null
* @throws ArrayIndexOutOfBoundsException
* if bytes.length is less than blen or
* if samples.length is less than blen / bytesPerSample(fmt.getSampleSizeInBits())
*/
public static int decode(byte[] bytes,
float[] samples,
int blen,
AudioFormat fmt) {
int bitsPerSample = fmt.getSampleSizeInBits();
int bytesPerSample = bytesPerSample(bitsPerSample);
boolean isBigEndian = fmt.isBigEndian();
Encoding encoding = fmt.getEncoding();
double fullScale = fullScale(bitsPerSample);
int i = 0;
int s = 0;
while (i < blen) {
long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
float sample = 0f;
if (encoding == Encoding.PCM_SIGNED) {
temp = extendSign(temp, bitsPerSample);
sample = (float) (temp / fullScale);
} else if (encoding == Encoding.PCM_UNSIGNED) {
temp = unsignedToSigned(temp, bitsPerSample);
sample = (float) (temp / fullScale);
} else if (encoding == Encoding.PCM_FLOAT) {
if (bitsPerSample == 32) {
sample = Float.intBitsToFloat((int) temp);
} else if (bitsPerSample == 64) {
sample = (float) Double.longBitsToDouble(temp);
}
} else if (encoding == Encoding.ULAW) {
sample = bitsToMuLaw(temp);
} else if (encoding == Encoding.ALAW) {
sample = bitsToALaw(temp);
}
samples[s] = sample;
i += bytesPerSample;
s++;
}
return s;
}
/**
* Converts from an audio sample float array to a byte array.
*
* @param samples an array of audio samples to encode
* @param bytes an array to fill up with bytes
* @param slen the return value of the decode method
* @param fmt the destination AudioFormat
*
* @return the number of valid bytes converted
*
* @throws NullPointerException if samples, bytes or fmt is null
* @throws ArrayIndexOutOfBoundsException
* if samples.length is less than slen or
* if bytes.length is less than slen * bytesPerSample(fmt.getSampleSizeInBits())
*/
public static int encode(float[] samples,
byte[] bytes,
int slen,
AudioFormat fmt) {
int bitsPerSample = fmt.getSampleSizeInBits();
int bytesPerSample = bytesPerSample(bitsPerSample);
boolean isBigEndian = fmt.isBigEndian();
Encoding encoding = fmt.getEncoding();
double fullScale = fullScale(bitsPerSample);
int i = 0;
int s = 0;
while (s < slen) {
float sample = samples[s];
long temp = 0L;
if (encoding == Encoding.PCM_SIGNED) {
temp = (long) (sample * fullScale);
} else if (encoding == Encoding.PCM_UNSIGNED) {
temp = (long) (sample * fullScale);
temp = signedToUnsigned(temp, bitsPerSample);
} else if (encoding == Encoding.PCM_FLOAT) {
if (bitsPerSample == 32) {
temp = Float.floatToRawIntBits(sample);
} else if (bitsPerSample == 64) {
temp = Double.doubleToRawLongBits(sample);
}
} else if (encoding == Encoding.ULAW) {
temp = muLawToBits(sample);
} else if (encoding == Encoding.ALAW) {
temp = aLawToBits(sample);
}
packBits(bytes, i, temp, isBigEndian, bytesPerSample);
i += bytesPerSample;
s++;
}
return i;
}
/**
* Computes the block-aligned bytes per sample of the audio format,
* using Math.ceil(bitsPerSample / 8.0).
* <p>
* Round towards the ceiling because formats that allow bit depths
* in non-integral multiples of 8 typically pad up to the nearest
* integral multiple of 8. So for example, a 31-bit AIFF file will
* actually store 32-bit blocks.
*
* @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
* @return The block-aligned bytes per sample of the audio format.
*/
public static int bytesPerSample(int bitsPerSample) {
return (int) ceil(bitsPerSample / 8.0); // optimization: ((bitsPerSample + 7) >>> 3)
}
/**
* Computes the largest magnitude representable by the audio format,
* using Math.pow(2.0, bitsPerSample - 1). Note that for two's complement
* audio, the largest positive value is one less than the return value of
* this method.
* <p>
* The result is returned as a double because in the case that
* bitsPerSample is 64, a long would overflow.
*
* @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
* @return the largest magnitude representable by the audio format
*/
public static double fullScale(int bitsPerSample) {
return pow(2.0, bitsPerSample - 1); // optimization: (1L << (bitsPerSample - 1))
}
private static long unpackBits(byte[] bytes,
int i,
boolean isBigEndian,
int bytesPerSample) {
switch (bytesPerSample) {
case 1: return unpack8Bit(bytes, i);
case 2: return unpack16Bit(bytes, i, isBigEndian);
case 3: return unpack24Bit(bytes, i, isBigEndian);
default: return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
}
}
private static long unpack8Bit(byte[] bytes, int i) {
return bytes[i] & 0xffL;
}
private static long unpack16Bit(byte[] bytes,
int i,
boolean isBigEndian) {
if (isBigEndian) {
return (
((bytes[i ] & 0xffL) << 8)
| (bytes[i + 1] & 0xffL)
);
} else {
return (
(bytes[i ] & 0xffL)
| ((bytes[i + 1] & 0xffL) << 8)
);
}
}
private static long unpack24Bit(byte[] bytes,
int i,
boolean isBigEndian) {
if (isBigEndian) {
return (
((bytes[i ] & 0xffL) << 16)
| ((bytes[i + 1] & 0xffL) << 8)
| (bytes[i + 2] & 0xffL)
);
} else {
return (
(bytes[i ] & 0xffL)
| ((bytes[i + 1] & 0xffL) << 8)
| ((bytes[i + 2] & 0xffL) << 16)
);
}
}
private static long unpackAnyBit(byte[] bytes,
int i,
boolean isBigEndian,
int bytesPerSample) {
long temp = 0;
if (isBigEndian) {
for (int b = 0; b < bytesPerSample; b++) {
temp |= (bytes[i + b] & 0xffL) << (
8 * (bytesPerSample - b - 1)
);
}
} else {
for (int b = 0; b < bytesPerSample; b++) {
temp |= (bytes[i + b] & 0xffL) << (8 * b);
}
}
return temp;
}
private static void packBits(byte[] bytes,
int i,
long temp,
boolean isBigEndian,
int bytesPerSample) {
switch (bytesPerSample) {
case 1: pack8Bit(bytes, i, temp);
break;
case 2: pack16Bit(bytes, i, temp, isBigEndian);
break;
case 3: pack24Bit(bytes, i, temp, isBigEndian);
break;
default: packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
break;
}
}
private static void pack8Bit(byte[] bytes, int i, long temp) {
bytes[i] = (byte) (temp & 0xffL);
}
private static void pack16Bit(byte[] bytes,
int i,
long temp,
boolean isBigEndian) {
if (isBigEndian) {
bytes[i ] = (byte) ((temp >>> 8) & 0xffL);
bytes[i + 1] = (byte) ( temp & 0xffL);
} else {
bytes[i ] = (byte) ( temp & 0xffL);
bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
}
}
private static void pack24Bit(byte[] bytes,
int i,
long temp,
boolean isBigEndian) {
if (isBigEndian) {
bytes[i ] = (byte) ((temp >>> 16) & 0xffL);
bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
bytes[i + 2] = (byte) ( temp & 0xffL);
} else {
bytes[i ] = (byte) ( temp & 0xffL);
bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
bytes[i + 2] = (byte) ((temp >>> 16) & 0xffL);
}
}
private static void packAnyBit(byte[] bytes,
int i,
long temp,
boolean isBigEndian,
int bytesPerSample) {
if (isBigEndian) {
for (int b = 0; b < bytesPerSample; b++) {
bytes[i + b] = (byte) (
(temp >>> (8 * (bytesPerSample - b - 1))) & 0xffL
);
}
} else {
for (int b = 0; b < bytesPerSample; b++) {
bytes[i + b] = (byte) ((temp >>> (8 * b)) & 0xffL);
}
}
}
private static long extendSign(long temp, int bitsPerSample) {
int bitsToExtend = Long.SIZE - bitsPerSample;
return (temp << bitsToExtend) >> bitsToExtend;
}
private static long unsignedToSigned(long temp, int bitsPerSample) {
return temp - (long) fullScale(bitsPerSample);
}
private static long signedToUnsigned(long temp, int bitsPerSample) {
return temp + (long) fullScale(bitsPerSample);
}
// mu-law constant
private static final double MU = 255.0;
// A-law constant
private static final double A = 87.7;
// natural logarithm of A
private static final double LN_A = log(A);
private static float bitsToMuLaw(long temp) {
temp ^= 0xffL;
if ((temp & 0x80L) != 0) {
temp = -(temp ^ 0x80L);
}
float sample = (float) (temp / fullScale(8));
return (float) (
signum(sample)
*
(1.0 / MU)
*
(pow(1.0 + MU, abs(sample)) - 1.0)
);
}
private static long muLawToBits(float sample) {
double sign = signum(sample);
sample = abs(sample);
sample = (float) (
sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
);
long temp = (long) (sample * fullScale(8));
if (temp < 0) {
temp = -temp ^ 0x80L;
}
return temp ^ 0xffL;
}
private static float bitsToALaw(long temp) {
temp ^= 0x55L;
if ((temp & 0x80L) != 0) {
temp = -(temp ^ 0x80L);
}
float sample = (float) (temp / fullScale(8));
float sign = signum(sample);
sample = abs(sample);
if (sample < (1.0 / (1.0 + LN_A))) {
sample = (float) (sample * ((1.0 + LN_A) / A));
} else {
sample = (float) (exp((sample * (1.0 + LN_A)) - 1.0) / A);
}
return sign * sample;
}
private static long aLawToBits(float sample) {
double sign = signum(sample);
sample = abs(sample);
if (sample < (1.0 / A)) {
sample = (float) ((A * sample) / (1.0 + LN_A));
} else {
sample = (float) ((1.0 + log(A * sample)) / (1.0 + LN_A));
}
sample *= sign;
long temp = (long) (sample * fullScale(8));
if (temp < 0) {
temp = -temp ^ 0x80L;
}
return temp ^ 0x55L;
}
}
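Once the samples are in a normalized float[], ordinary arithmetic applies. As one example of "doing something with the samples", here is a sketch of peak and RMS level measurement, which is the kind of value a simple level meter would display. The Levels class is hypothetical and not part of SimpleAudioConversion; slen is assumed to be the return value of decode.

```java
final class Levels {
    /** Largest absolute sample value among the first slen samples. */
    static float peak(float[] samples, int slen) {
        float peak = 0f;
        for (int s = 0; s < slen; s++) {
            peak = Math.max(peak, Math.abs(samples[s]));
        }
        return peak;
    }

    /** Root mean square of the first slen samples (0 if slen is 0). */
    static float rms(float[] samples, int slen) {
        if (slen == 0) {
            return 0f;
        }
        double sumOfSquares = 0.0;
        for (int s = 0; s < slen; s++) {
            sumOfSquares += (double) samples[s] * samples[s];
        }
        return (float) Math.sqrt(sumOfSquares / slen);
    }
}
```

For interleaved stereo this mixes both channels into one figure; measuring each channel separately would mean stepping through the array by the channel count.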
This is how you get the actual sample data from the currently playing sound. The other excellent answer will tell you what the data means. I haven't tried it on any OS other than my Windows 10 machine, so YMMV. For me it pulls from the current system default recording device. On Windows, set the default recording device to "Stereo Mix" instead of "Microphone" to capture the sound that is playing. You may have to toggle "Show Disabled Devices" to see "Stereo Mix".
import javax.sound.sampled.*;
public class SampleAudio {
private static long extendSign(long temp, int bitsPerSample) {
int extensionBits = 64 - bitsPerSample;
return (temp << extensionBits) >> extensionBits;
}
public static void main(String[] args) throws LineUnavailableException {
float sampleRate = 8000;
int sampleSizeBits = 16;
int numChannels = 1; // Mono
AudioFormat format = new AudioFormat(sampleRate, sampleSizeBits, numChannels, true, true);
TargetDataLine tdl = AudioSystem.getTargetDataLine(format);
tdl.open(format);
tdl.start();
if (!tdl.isOpen()) {
System.exit(1);
}
byte[] data = new byte[(int)sampleRate*10];
int read = tdl.read(data, 0, (int)sampleRate*10);
if (read > 0) {
for (int i = 0; i < read-1; i = i + 2) {
long val = ((data[i] & 0xffL) << 8L) | (data[i + 1] & 0xffL);
long valf = extendSign(val, 16);
System.out.println(i + "\t" + valf);
}
}
tdl.close();
}
}
Related
Java: IEEE Doubles to IBM Float
I am working on a side project at work where I would like to read/write SAS Transport files. The challenge is that numbers are encoded as 64-bit IBM floating-point numbers. While I have been able to find plenty of great resources for reading a byte array (containing an IBM float) into IEEE 32-bit and 64-bit floats, I'm struggling to find code to convert floats/doubles back to IBM floats.
I recently found some code for writing a 32-bit IEEE float back out to a byte array (containing an IBM float). It seems to be working, so I've been trying to translate it to a 64-bit version. I've reverse engineered where most of the magic numbers come from, but I've been stumped for over a week now.
I have also tried to translate the functions listed at the end of the SAS Transport documentation to Java, but I've run into a lot of issues related to endianness, Java's lack of unsigned types, and so on. Can anyone provide the code to convert doubles to IBM floating-point format?
Just to show the progress I've made, here are some shortened versions of the code I've written so far.
This grabs a 32-bit IBM float from a byte array and generates an IEEE float:
public static double fromIBMFloat(byte[] data, int offset) {
    int temp = readIntFromBuffer(data, offset);
    int mantissa = temp & 0x00FFFFFF;
    int exponent = ((temp >> 24) & 0x7F) - 64;
    boolean isNegative = (temp & 0x80000000) != 0;
    double result = mantissa * Math.pow(2, 4 * exponent - 24);
    if (isNegative) {
        result = -result;
    }
    return result;
}
This is the same thing for 64-bit:
public static double fromIBMDouble(byte[] data, int offset) {
    long temp = readLongFromBuffer(data, offset);
    long mantissa = temp & 0x00FFFFFFFFFFFFFFL;
    long exponent = ((temp >> 56) & 0x7F) - 64;
    boolean isNegative = (temp & 0x8000000000000000L) != 0;
    // the 64-bit format has a 56-bit fraction, hence 56 rather than 24
    double result = mantissa * Math.pow(2, 4 * exponent - 56);
    if (isNegative) {
        result = -result;
    }
    return result;
}
Great! These work for going to IEEE floats, but now I need to go the other way.
This simple implementation seems to be working for 32-bit floats:
public static void toIBMFloat(double value, byte[] xport, int offset) {
    if (value == 0.0 || Double.isNaN(value) || Double.isInfinite(value)) {
        writeIntToBuffer(xport, offset, 0);
        return;
    }
    int fconv = Float.floatToIntBits((float)value);
    int fmant = (fconv & 0x007FFFFF) | 0x00800000;
    int temp = (fconv & 0x7F800000) >> 23;
    int t = (temp & 0xFF) - 126;
    while ((t & 0x3) != 0) {
        ++t;
        fmant >>= 1;
    }
    fconv = (fconv & 0x80000000) | (((t >> 2) + 64) << 24) | fmant;
    writeIntToBuffer(xport, offset, fconv);
}
Now, the only thing left is to translate that to work with 64-bit IBM floats. A lot of the magic numbers listed relate to the number of bits in the IEEE 32-bit floating-point exponent (8 bits) and mantissa (23 bits). So for 64-bit, I just need to switch those to use the 11-bit exponent and 52-bit mantissa. But where does that 126 come from? What is the point of the 0x3 in the while loop? Any help breaking down the 32-bit version so I can implement a 64-bit version would be greatly appreciated.
I circled back and took another swing at the C implementations provided at the end of the SAS transport documentation. It turns out the issue wasn't with my implementation; it was an issue with my tests.
TL;DR These are my 64-bit implementations:
public static void writeIBMDouble(double value, byte[] data, int offset) {
    long ieee8 = Double.doubleToLongBits(value);
    long ieee1 = (ieee8 >>> 32) & 0xFFFFFFFFL;
    long ieee2 = ieee8 & 0xFFFFFFFFL;
    writeLong(0L, data, offset);
    long xport1 = ieee1 & 0x000FFFFFL;
    long xport2 = ieee2;
    int ieee_exp = 0;
    if (xport2 != 0 || ieee1 != 0) {
        ieee_exp = (int)(((ieee1 >>> 16) & 0x7FF0) >>> 4) - 1023;
        int shift = ieee_exp & 0x3;
        xport1 |= 0x00100000L;
        if (shift != 0) {
            xport1 <<= shift;
            xport1 |= ((byte)(((ieee2 >>> 24) & 0xE0) >>> (5 + (3 - shift))));
            xport2 <<= shift;
        }
        xport1 |= (((ieee_exp >>> 2) + 65) | ((ieee1 >>> 24) & 0x80)) << 24;
    }
    if (-260 <= ieee_exp && ieee_exp <= 248) {
        long temp = ((xport1 & 0xFFFFFFFFL) << 32) | (xport2 & 0xFFFFFFFFL);
        writeLong(temp, data, offset);
        return;
    }
    writeLong(0xFFFFFFFFFFFFFFFFL, data, offset);
    if (ieee_exp > 248) {
        data[offset] = 0x7F;
    }
}
public static void writeLong(long value, byte[] buffer, int offset) {
    buffer[offset] = (byte)(value >>> 56);
    buffer[offset + 1] = (byte)(value >>> 48);
    buffer[offset + 2] = (byte)(value >>> 40);
    buffer[offset + 3] = (byte)(value >>> 32);
    buffer[offset + 4] = (byte)(value >>> 24);
    buffer[offset + 5] = (byte)(value >>> 16);
    buffer[offset + 6] = (byte)(value >>> 8);
    buffer[offset + 7] = (byte)value;
}
And:
public static double readIBMDouble(byte[] data, int offset) {
    long temp = readLong(data, offset);
    long ieee = 0L;
    long xport1 = temp >>> 32;
    long xport2 = temp & 0x00000000FFFFFFFFL;
    long ieee1 = xport1 & 0x00ffffff;
    long ieee2 = xport2;
    if (ieee2 == 0L && xport1 == 0L) {
        return Double.longBitsToDouble(ieee);
    }
    int shift = 0;
    int nib = (int)xport1;
    if ((nib & 0x00800000) != 0) {
        shift = 3;
    } else if ((nib & 0x00400000) != 0) {
        shift = 2;
    } else if ((nib & 0x00200000) != 0) {
        shift = 1;
    }
    if (shift != 0) {
        ieee1 >>>= shift;
        ieee2 = (xport2 >>> shift) | ((xport1 & 0x00000007) << (29 + (3 - shift)));
    }
    ieee1 &= 0xffefffff;
    ieee1 |= (((((long)(data[offset] & 0x7f) - 65) << 2) + shift + 1023) << 20) | (xport1 & 0x80000000);
    ieee = ieee1 << 32 | ieee2;
    return Double.longBitsToDouble(ieee);
}
public static long readLong(byte[] buffer, int offset) {
    long result = unsignedByteToLong(buffer[offset]) << 56;
    result |= unsignedByteToLong(buffer[offset + 1]) << 48;
    result |= unsignedByteToLong(buffer[offset + 2]) << 40;
    result |= unsignedByteToLong(buffer[offset + 3]) << 32;
    result |= unsignedByteToLong(buffer[offset + 4]) << 24;
    result |= unsignedByteToLong(buffer[offset + 5]) << 16;
    result |= unsignedByteToLong(buffer[offset + 6]) << 8;
    result |= unsignedByteToLong(buffer[offset + 7]);
    return result;
}
private static long unsignedByteToLong(byte value) {
    return (long)value & 0xFF;
}
These are basically a one-to-one translation from what's in the document, except I convert the byte[] into a long up-front and just do bit-twiddling instead of working directly with bytes.
I also realized the code in the documentation had some special cases included for "missing" values that are specific to the SAS transport standard and have nothing to do with IBM hexadecimal floating-point numbers. In fact, the Double.longBitsToDouble method detects the invalid bit-sequence and just sets the value to NaN. I moved this code out since it wasn't going to work anyway.
The good thing is that as part of this exercise I did learn a lot of tricks for bit manipulation in Java. For instance, a lot of the issues I ran into involving sign were resolved by using the >>> operator instead of the >> operator. Other than that, you just need to be careful when upcasting to mask with 0xFF, 0xFFFF, etc. to make sure the sign is ignored.
I also learned about ByteBuffer, which can facilitate converting back and forth between byte[] and primitives/strings; however, that comes with some minor overhead. It would also handle any endianness issues. It turns out endianness wasn't a concern for the code above, though: Java's shift operators work on values rather than on memory layout, so readLong and writeLong produce the same byte order on any machine. It seems reading/writing SAS transport files is a pretty common need, especially in the clinical trials arena, so hopefully anyone working in Java/C# won't have to go through the trouble I did.
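For comparison, here is a sketch of what ByteBuffer-based equivalents of readLong and writeLong might look like. The BufferLongs class is made up for this example, and it assumes the data is stored most significant byte first, which is what the shift-based versions above produce.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferLongs {
    /** Reads 8 bytes starting at offset as a big-endian long. */
    static long readLong(byte[] buffer, int offset) {
        return ByteBuffer.wrap(buffer, offset, 8)
            .order(ByteOrder.BIG_ENDIAN)
            .getLong();
    }

    /** Writes value as 8 big-endian bytes starting at offset. */
    static void writeLong(long value, byte[] buffer, int offset) {
        // wrap() shares the backing array, so this modifies 'buffer' directly
        ByteBuffer.wrap(buffer, offset, 8)
            .order(ByteOrder.BIG_ENDIAN)
            .putLong(value);
    }
}
```

BIG_ENDIAN is already the default order of a new ByteBuffer; it is spelled out here only to make the assumption visible.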
How to convert 18 bits two-complements into float number using java
The data is uploaded by an 18-bit ADC. Each value is split across three bytes and the last 6 bits are unused. The reference voltage is 1 volt, which means 0x1FFFF represents 1 and 0x3FFFF represents -1. How do I convert an 18-bit two's-complement value into a float using Java? I have written a version that works, but I think it is not efficient enough. My Java is terrible.
float data;
int value = ((byte0 & 0xff) << 10) | ((byte1 & 0xff) << 2) | ((byte2 & 0xff) >> 6); // combine 3 bytes into int
int tmp = value & 0x20000; // sign bit (bit 17) decides positive or negative
if (tmp != 0) {
    value = value - 262144 /* 2^18 */;
    data = ((float)value) * 2 / 262143 /* 2^18-1 */;
} else {
    data = ((float)value) * 2 / 262143;
}
You could try
double data = (value << 14) / (double) (0x1FFFF << 14);
This uses a shifted 32-bit two's complement value. NOTE: in two's complement, 0x3FFFF is the negative number closest to zero (-1), and 0x20000 is the most negative number.
Put the sign bit of the ADC in the same place as the sign bit of a 32-bit integer and you can simplify by using the native sign.
public static final float ADC_FULL_SCALE_VOLTS = 1.0f; // From -1V to +1V
public static final int ADC_BITS = 18;
public static final int ADC_RANGE = 1 << (ADC_BITS - 1);
public static final int ADC_MASK = (ADC_RANGE - 1) << (32 - ADC_BITS);
int bits = ((byte0 & 0xFF) << 24) | ((byte1 & 0xFF) << 16) | ((byte2 & 0xC0) << 8); // Combine 3 bytes into int, left-aligned
float value = bits / (float)ADC_MASK * ADC_FULL_SCALE_VOLTS;
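A variation on the same idea is to left-align the bits and then use the arithmetic right shift to sign-extend back down to an ordinary int. This is only a sketch with made-up names. It assumes the 18 data bits occupy the top of the three bytes, and it divides by 2^17 rather than 2^17 - 1, so 0x1FFFF comes out slightly below 1 instead of exactly 1.

```java
final class Adc18 {
    /** Converts a left-aligned 18-bit two's complement reading to a float. */
    static float toFloat(byte byte0, byte byte1, byte byte2) {
        // left-align the 18 bits in a 32-bit int; the low 6 bits of byte2 are dropped
        int bits = ((byte0 & 0xFF) << 24) | ((byte1 & 0xFF) << 16) | ((byte2 & 0xC0) << 8);
        // the arithmetic shift sign-extends: value is in -131072..131071
        int value = bits >> 14;
        return value / 131072f; // 131072 == 2^17
    }
}
```

Whether to divide by 2^17 or 2^17 - 1 depends on how the ADC's full scale is specified, so that part would need checking against the datasheet.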
Bit shift operations on a byte array in Java
How do I shift a byte array n positions to the right? For instance, shifting a 16-byte array right by 29 positions? I read somewhere that it can be done using a long. Would using a long work like this:
Long k1 = byte array from 0 to 7
Long k2 = byte array from 8 to 15
Then right-rotating these two longs using Long.rotateRight(Long x, number of rotations). How would the two longs be joined back into a byte array?
I believe you can do this using java.math.BigInteger, which supports shifts on arbitrarily large numbers. This has the advantage of simplicity, but the disadvantage that the result is not padded to the original byte array size, i.e. the input could be 16 bytes but the output might only be 10, which requires additional logic. BigInteger approach byte [] array = new byte[]{0x7F,0x11,0x22,0x33,0x44,0x55,0x66,0x77}; // create from array BigInteger bigInt = new BigInteger(array); // shift BigInteger shiftInt = bigInt.shiftRight(4); // back to array byte [] shifted = shiftInt.toByteArray(); // print it as hex for (byte b : shifted) { System.out.print(String.format("%x", b)); } Output 7f1122334455667 <== shifted 4 to the right. Looks OK Long manipulation I don't know why you'd want to do this with rotateRight(), as it makes life more difficult: you have to blank out the bits that wrap around to the left-hand side of k1, etc. You'd be better off using a shift, IMO, as described below. I've used a shift of 20 because it is divisible by 4, so it is easier to see the nibbles move in the output. 
1) Use ByteBuffer to form two longs from 16 byte array byte[] array = { 0x00, 0x00, 0x11, 0x11, 0x22, 0x22, 0x33, 0x33, 0x44, 0x44, 0x55, 0x55, 0x66, 0x66, 0x77, 0x77 }; ByteBuffer buffer = ByteBuffer.wrap(array); long k1 = buffer.getLong(); long k2 = buffer.getLong(); 2) Shift each long n bits to the right int n = 20; long k1Shift = k1 >> n; long k2Shift = k2 >> n; System.out.println(String.format("%016x => %016x", k1, k1Shift)); System.out.println(String.format("%016x => %016x", k2, k2Shift)); 0000111122223333 => 0000000001111222 4444555566667777 => 0000044445555666 3) Determine bits from k1 that "got pushed off the edge" long k1CarryBits = (k1 << (64 - n)); System.out.println(String.format("%016x => %016x", k1, k1CarryBits)); 0000111122223333 => 2333300000000000 4) Join the k1 carry bits onto k2 on the left-hand (most significant) side long k2WithCarray = k2Shift | k1CarryBits; System.out.println(String.format("%016x => %016x", k2Shift, k2WithCarray)); 0000044445555666 => 2333344445555666 5) Write the two longs back into a ByteBuffer and extract as a byte array buffer.position(0); buffer.putLong(k1Shift); buffer.putLong(k2WithCarray); for (byte each : buffer.array()) { System.out.print(Long.toHexString(each)); } 000011112222333344445555666
Here is what I came up with to shift a byte array left by an arbitrary number of bits: /** * Shifts input byte array len bits left. This method will alter the input byte array. */ public static byte[] shiftLeft(byte[] data, int len) { int word_size = (len / 8) + 1; int shift = len % 8; byte carry_mask = (byte) ((1 << shift) - 1); int offset = word_size - 1; for (int i = 0; i < data.length; i++) { int src_index = i+offset; if (src_index >= data.length) { data[i] = 0; } else { byte src = data[src_index]; byte dst = (byte) (src << shift); if (src_index+1 < data.length) { dst |= data[src_index+1] >>> (8-shift) & carry_mask; } data[i] = dst; } } return data; }
1. Manually implemented Here are left and right shift implementation without using BigInteger (ie. without creating a copy of the input array) and with unsigned right shift (BigInteger only supports arithmetic shifts of course) Left Shift << /** * Left shift of whole byte array by shiftBitCount bits. * This method will alter the input byte array. */ static byte[] shiftLeft(byte[] byteArray, int shiftBitCount) { final int shiftMod = shiftBitCount % 8; final byte carryMask = (byte) ((1 << shiftMod) - 1); final int offsetBytes = (shiftBitCount / 8); int sourceIndex; for (int i = 0; i < byteArray.length; i++) { sourceIndex = i + offsetBytes; if (sourceIndex >= byteArray.length) { byteArray[i] = 0; } else { byte src = byteArray[sourceIndex]; byte dst = (byte) (src << shiftMod); if (sourceIndex + 1 < byteArray.length) { dst |= byteArray[sourceIndex + 1] >>> (8 - shiftMod) & carryMask; } byteArray[i] = dst; } } return byteArray; } Unsigned Right Shift >>> /** * Unsigned/logical right shift of whole byte array by shiftBitCount bits. * This method will alter the input byte array. */ static byte[] shiftRight(byte[] byteArray, int shiftBitCount) { final int shiftMod = shiftBitCount % 8; final byte carryMask = (byte) (0xFF << (8 - shiftMod)); final int offsetBytes = (shiftBitCount / 8); int sourceIndex; for (int i = byteArray.length - 1; i >= 0; i--) { sourceIndex = i - offsetBytes; if (sourceIndex < 0) { byteArray[i] = 0; } else { byte src = byteArray[sourceIndex]; byte dst = (byte) ((0xff & src) >>> shiftMod); if (sourceIndex - 1 >= 0) { dst |= byteArray[sourceIndex - 1] << (8 - shiftMod) & carryMask; } byteArray[i] = dst; } } return byteArray; } Used in this class by this Project. 2. 
Using BigInteger Be aware that BigInteger internally converts the byte array into an int[] array so this may not be the most optimized solution: Arithmetic Left Shift <<: byte[] result = new BigInteger(byteArray).shiftLeft(3).toByteArray(); Arithmetic Right Shift >>: byte[] result = new BigInteger(byteArray).shiftRight(2).toByteArray(); 3. External Library Using the Bytes java library*: Add to pom.xml: <dependency> <groupId>at.favre.lib</groupId> <artifactId>bytes</artifactId> <version>{latest-version}</version> </dependency> Code example: Bytes b = Bytes.wrap(someByteArray); b.leftShift(3); b.rightShift(3); byte[] result = b.array(); *Full Disclaimer: I am the developer.
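The BigInteger one-liners above return an array whose length can differ from the input, because toByteArray() uses the minimal two's-complement length. A minimal sketch of the extra logic, assuming you want the array treated as an unsigned big-endian number and the result to keep the input length (the class and method names are hypothetical):

```java
// Hypothetical helper: fixed-width logical right shift built on BigInteger.
final class FixedWidthShift {
    /** Shifts {@code in} right by {@code bits} (bits >= 0); the result has the same length as {@code in}. */
    static byte[] shiftRight(byte[] in, int bits) {
        if (bits < 0) {
            throw new IllegalArgumentException("bits must be non-negative");
        }
        // Signum 1 makes BigInteger read the bytes as an unsigned magnitude,
        // so the top bit of in[0] is not mistaken for a sign bit.
        java.math.BigInteger value = new java.math.BigInteger(1, in);
        byte[] raw = value.shiftRight(bits).toByteArray();
        byte[] out = new byte[in.length];
        // raw may be shorter than the input, or carry one extra leading 0x00
        // sign byte. Copy it right-aligned and leave the rest zero.
        int count = Math.min(raw.length, out.length);
        System.arraycopy(raw, raw.length - count, out, out.length - count, count);
        return out;
    }
}
```

This still allocates a copy, so the manual in-place versions remain the better fit when allocation matters.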
This is an old post, but I want to update Adam's answer. The long-based solution works with a few tweaks. First, use >>> instead of >>, because >> pads with the sign bit and so changes the value whenever the top bit is set. Second, the byte-printing loop drops leading zeros when it prints. Use this instead: private String getHexString(byte[] b) { StringBuilder result = new StringBuilder(); for (int i = 0; i < b.length; i++) result.append(Integer.toString((b[i] & 0xff) + 0x100, 16) .substring(1)); return result.toString(); }
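Putting the two-long approach and the >>> tweak together, a logical right shift of a 16-byte array might look like the sketch below. The class name, the method names, and the decision to return a new array instead of modifying the input are assumptions made for illustration.

```java
// Hypothetical helper: logical right shift of a 16-byte big-endian value using two longs.
final class ShiftWithLongs {
    /** Shifts a 16-byte array right by n bits (0 <= n <= 127), filling with zeros. */
    static byte[] shiftRight16(byte[] in, int n) {
        if (in.length != 16 || n < 0 || n > 127) {
            throw new IllegalArgumentException("need 16 bytes and 0 <= n <= 127");
        }
        java.nio.ByteBuffer src = java.nio.ByteBuffer.wrap(in); // big-endian by default
        long hi = src.getLong(); // bytes 0..7
        long lo = src.getLong(); // bytes 8..15
        long newHi;
        long newLo;
        if (n == 0) {
            newHi = hi;
            newLo = lo;
        } else if (n < 64) {
            newHi = hi >>> n;                      // >>> fills with zeros, not the sign bit
            newLo = (lo >>> n) | (hi << (64 - n)); // bits pushed out of hi enter lo from the left
        } else {
            newHi = 0L;
            newLo = hi >>> (n - 64);               // everything that survives comes from hi
        }
        return java.nio.ByteBuffer.allocate(16).putLong(newHi).putLong(newLo).array();
    }

    /** Hex dump that keeps leading zeros (two digits per byte). */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The n == 0 case is handled separately because Java masks long shift distances to six bits, so hi << 64 would be a shift by 0 and would wrongly OR all of hi into lo.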
What kind of output should I see from getFft?
Alright, so I am working on creating an Android audio visualization app. The problem is that what I get from the method getFft() doesn't match what Google says it should produce. I traced the source code all the way back to C++, but I am not familiar enough with C++ or FFT to actually understand what is happening. I will try to include everything needed here: (Java) Visualizer.getFft(byte[] fft) /** * Returns a frequency capture of currently playing audio content. The capture is a 8-bit * magnitude FFT. Note that the size of the FFT is half of the specified capture size but both * sides of the spectrum are returned yielding in a number of bytes equal to the capture size. * {@see #getCaptureSize()}. * <p>This method must be called when the Visualizer is enabled. * @param fft array of bytes where the FFT should be returned * @return {@link #SUCCESS} in case of success, * {@link #ERROR_NO_MEMORY}, {@link #ERROR_INVALID_OPERATION} or {@link #ERROR_DEAD_OBJECT} * in case of failure. * @throws IllegalStateException */ public int getFft(byte[] fft) throws IllegalStateException { synchronized (mStateLock) { if (mState != STATE_ENABLED) { throw(new IllegalStateException("getFft() called in wrong state: "+mState)); } return native_getFft(fft); } } (C++) Visualizer.getFft(uint8_t *fft) status_t Visualizer::getFft(uint8_t *fft) { if (fft == NULL) { return BAD_VALUE; } if (mCaptureSize == 0) { return NO_INIT; } status_t status = NO_ERROR; if (mEnabled) { uint8_t buf[mCaptureSize]; status = getWaveForm(buf); if (status == NO_ERROR) { status = doFft(fft, buf); } } else { memset(fft, 0, mCaptureSize); } return status; } (C++) Visualizer.doFft(uint8_t *fft, uint8_t *waveform) status_t Visualizer::doFft(uint8_t *fft, uint8_t *waveform) { int32_t workspace[mCaptureSize >> 1]; int32_t nonzero = 0; for (uint32_t i = 0; i < mCaptureSize; i += 2) { workspace[i >> 1] = (waveform[i] ^ 0x80) << 23; workspace[i >> 1] |= (waveform[i + 1] ^ 0x80) << 7; nonzero |= workspace[i >> 1]; } if 
(nonzero) { fixed_fft_real(mCaptureSize >> 1, workspace); } for (uint32_t i = 0; i < mCaptureSize; i += 2) { fft[i] = workspace[i >> 1] >> 23; fft[i + 1] = workspace[i >> 1] >> 7; } return NO_ERROR; } (C++) fixedfft.fixed_fft_real(int n, int32_t *v) void fixed_fft_real(int n, int32_t *v) { int scale = LOG_FFT_SIZE, m = n >> 1, i; fixed_fft(n, v); for (i = 1; i <= n; i <<= 1, --scale); v[0] = mult(~v[0], 0x80008000); v[m] = half(v[m]); for (i = 1; i < n >> 1; ++i) { int32_t x = half(v[i]); int32_t z = half(v[n - i]); int32_t y = z - (x ^ 0xFFFF); x = half(x + (z ^ 0xFFFF)); y = mult(y, twiddle[i << scale]); v[i] = x - y; v[n - i] = (x + y) ^ 0xFFFF; } } (C++) fixedfft.fixed_fft(int n, int32_t *v) void fixed_fft(int n, int32_t *v) { int scale = LOG_FFT_SIZE, i, p, r; for (r = 0, i = 1; i < n; ++i) { for (p = n; !(p & r); p >>= 1, r ^= p); if (i < r) { int32_t t = v[i]; v[i] = v[r]; v[r] = t; } } for (p = 1; p < n; p <<= 1) { --scale; for (i = 0; i < n; i += p << 1) { int32_t x = half(v[i]); int32_t y = half(v[i + p]); v[i] = x + y; v[i + p] = x - y; } for (r = 1; r < p; ++r) { int32_t w = MAX_FFT_SIZE / 4 - (r << scale); i = w >> 31; w = twiddle[(w ^ i) - i] ^ (i << 16); for (i = r; i < n; i += p << 1) { int32_t x = half(v[i]); int32_t y = mult(w, v[i + p]); v[i] = x - y; v[i + p] = x + y; } } } } If you made it through all that, you are awesome! So my issue, is when I call the java method getFft() I end up with negative values, which shouldn't exist if the returned array is meant to represent magnitude. So my question is, what do I need to do to make the array represent magnitude? EDIT: It appears my data may actually be the Fourier coefficients. I was poking around the web and found this. The applet "Start Function FFT" displays a graphed representation of coefficients and it is a spitting image of what happens when I graph the data from getFft(). So new question: Is this what my data is? and if so, how do I go from the coefficients to a spectral analysis of it?
An FFT doesn't just produce magnitude; it produces phase as well (the output for each sample is a complex number). If you want magnitude, then you need to explicitly calculate it for each output sample, as sqrt(re*re + im*im), where re and im are the real and imaginary components of each complex number, respectively (you can skip the square root if the squared magnitude is enough). Unfortunately, I can't see anywhere in your code where you're working with complex numbers, so perhaps some rewrite is required. UPDATE If I had to guess (after glancing at the code), I'd say that real components were at even indices, and imaginary components were at odd indices. So to get squared magnitudes, you'd need to do something like: uint32_t mag[N/2]; for (int i = 0; i < N/2; i++) { mag[i] = fft[2*i]*fft[2*i] + fft[2*i+1]*fft[2*i+1]; }
One possible explanation for the negative values: byte is a signed data type in Java. All values greater than or equal to 1000 0000 in binary (0x80) are interpreted as negative integers. If we know that all values are expected to be in the range [0..255], then we have to widen the values to a larger type and mask off the upper bits: byte signedByte = (byte) 0xff; // = -1 int unsignedByte = signedByte & 0xff; // = 255
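Combining the two answers above, a Java version of the magnitude calculation could look like the following sketch. It assumes the layout guessed earlier (real part at even indices, imaginary part at odd indices) and that each byte is a signed 8-bit component, so negative bytes are expected and are not masked to 0..255. Neither assumption is confirmed by the source code quoted in the question, and the class and method names are hypothetical.

```java
// Hypothetical helper; the interleaved (re, im) layout is an assumption.
final class FftMagnitude {
    /** Returns one magnitude per (re, im) byte pair in {@code fft}. */
    static double[] magnitudes(byte[] fft) {
        double[] mag = new double[fft.length / 2];
        for (int k = 0; k < mag.length; k++) {
            int re = fft[2 * k];     // signed component, -128..127
            int im = fft[2 * k + 1]; // signed component, -128..127
            mag[k] = Math.hypot(re, im); // sqrt(re*re + im*im)
        }
        return mag;
    }
}
```

If the first pair turns out to hold something other than an ordinary complex bin (for example, two real-only values), treat it separately and start the loop at k = 1.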
"The capture is a 8-bit magnitude FFT" probably means that the return values have an 8-bit magnitude, not that they are magnitudes themselves. According to Jason For real-valued signals, like the ones you have in audio processing, the negative frequency output will be a mirror image of the positive frequencies. Android 2.3 Visualizer - Trouble understanding getFft()
Understanding big endian, little endian (again)
I need your help. I'd like to use the following code since I'm developing an audio tool (for .wav files only) whose main feature is displaying a signal waveform. Big endian and little endian are things I can't avoid dealing with. Am I right in thinking that the following code tackles the problem like this: if the audio file's sample size is 16 bit or 8 bit, and the data is big endian or little endian, it reassembles the samples into the audioData array? If my reasoning is correct, is the rearrangement always from big endian to little endian? Is the computer architecture taken into account? (I mean, if my computer uses little endian, what happens?) Thank you int[] audioData = null; if (format.getSampleSizeInBits() == 16) { int nlengthInSamples = audioBytes.size() / 2; audioData = new int[nlengthInSamples]; // if Big Endian if (format.isBigEndian()) { for (int i = 0; i < nlengthInSamples; i++) { // MSB calculation - Most Significant Byte int MSB = (int) audioBytes.get(2 * i); // LSB calculation - Least Significant Byte int LSB = (int) audioBytes.get(2 * i + 1); // (MSB << 8) shifts the MSB left by 8 positions // (255 & LSB) masks the LSB so that it is in the range 0..255 // putting both together audioData[i] = MSB << 8 | (255 & LSB); } } else { for (int i = 0; i < nlengthInSamples; i++) { int LSB = (int) audioBytes.get(2 * i); int MSB = (int) audioBytes.get(2 * i + 1); audioData[i] = MSB << 8 | (255 & LSB); } }
Java hides the host CPU's byte order from you (and its own I/O classes, such as DataInputStream and ByteBuffer, default to big endian), so your computer's architecture doesn't matter here. What does matter is the byte order of the audio data itself, which format.isBigEndian() tells you. What you have appears correct (obviously, test it to be sure), but I would recommend the following pattern to avoid code duplication down the road. int nlengthInSamples = audioBytes.size() / 2; audioData = new int[nlengthInSamples]; for (int i = 0; i < nlengthInSamples; i++) { // default BigEndian int MSB = (int) audioBytes.get(2 * i); int LSB = (int) audioBytes.get(2 * i + 1); // swap for Little Endian if (!format.isBigEndian()) { int temp = MSB; MSB = LSB; LSB = temp; } audioData[i] = MSB << 8 | (255 & LSB); }
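As a self-contained illustration of the pattern above, here is a sketch that decodes signed 16-bit PCM from a plain byte[] in either byte order. The class name Pcm16 and the use of a byte[] plus a boolean (instead of the question's audioBytes list and AudioFormat) are assumptions made to keep the example runnable on its own.

```java
// Hypothetical helper: decodes signed 16-bit PCM samples from raw bytes.
final class Pcm16 {
    /** Returns one int per 16-bit sample; a trailing odd byte is ignored. */
    static int[] decode(byte[] audioBytes, boolean bigEndian) {
        int sampleCount = audioBytes.length / 2;
        int[] samples = new int[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            byte first = audioBytes[2 * i];
            byte second = audioBytes[2 * i + 1];
            // Only the order in which the two bytes are stored differs.
            byte msb = bigEndian ? first : second;
            byte lsb = bigEndian ? second : first;
            // The MSB keeps its sign (it carries the sample's sign bit);
            // the LSB is masked so its sign extension cannot corrupt the high bits.
            samples[i] = (msb << 8) | (lsb & 0xFF);
        }
        return samples;
    }
}
```

In a real program the flag would come from format.isBigEndian(). The host machine's byte order never enters into it.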