I got bored and decided to dive into remaking the square root function without referencing any of the Math.java functions. I have gotten to this point:
package sqrt;
public class SquareRoot {
public static void main(String[] args) {
System.out.println(sqrtOf(8));
}
public static double sqrtOf(double n){
double x = log(n,2);
return powerOf(2, x/2);
}
public static double log(double n, double base)
{
return (Math.log(n)/Math.log(base));
}
public static double powerOf(double x, double y) {
return powerOf(e(),y * log(x, e()));
}
public static int factorial(int n){
if(n <= 1){
return 1;
}else{
return n * factorial((n-1));
}
}
public static double e(){
return 1/factorial(1);
}
public static double e(int precision){
return 1/factorial(precision);
}
}
As you may very well see, I reached the point where my powerOf() function calls itself infinitely. I could replace that and use Math.exp(y * log(x, e())), so I dived into the Math source code to see how it handled my problem, resulting in a goose chase.
public static double exp(double a) {
return StrictMath.exp(a); // default impl. delegates to StrictMath
}
which leads to:
public static double exp(double x)
{
if (x != x)
return x;
if (x > EXP_LIMIT_H)
return Double.POSITIVE_INFINITY;
if (x < EXP_LIMIT_L)
return 0;
// Argument reduction.
double hi;
double lo;
int k;
double t = abs(x);
if (t > 0.5 * LN2)
{
if (t < 1.5 * LN2)
{
hi = t - LN2_H;
lo = LN2_L;
k = 1;
}
else
{
k = (int) (INV_LN2 * t + 0.5);
hi = t - k * LN2_H;
lo = k * LN2_L;
}
if (x < 0)
{
hi = -hi;
lo = -lo;
k = -k;
}
x = hi - lo;
}
else if (t < 1 / TWO_28)
return 1;
else
lo = hi = k = 0;
// Now x is in primary range.
t = x * x;
double c = x - t * (P1 + t * (P2 + t * (P3 + t * (P4 + t * P5))));
if (k == 0)
return 1 - (x * c / (c - 2) - x);
double y = 1 - (lo - x * c / (2 - c) - hi);
return scale(y, k);
}
Values that are referenced:
LN2 = 0.6931471805599453, // Long bits 0x3fe62e42fefa39efL.
LN2_H = 0.6931471803691238, // Long bits 0x3fe62e42fee00000L.
LN2_L = 1.9082149292705877e-10, // Long bits 0x3dea39ef35793c76L.
INV_LN2 = 1.4426950408889634, // Long bits 0x3ff71547652b82feL.
INV_LN2_H = 1.4426950216293335, // Long bits 0x3ff7154760000000L.
INV_LN2_L = 1.9259629911266175e-8; // Long bits 0x3e54ae0bf85ddf44L.
P1 = 0.16666666666666602, // Long bits 0x3fc555555555553eL.
P2 = -2.7777777777015593e-3, // Long bits 0xbf66c16c16bebd93L.
P3 = 6.613756321437934e-5, // Long bits 0x3f11566aaf25de2cL.
P4 = -1.6533902205465252e-6, // Long bits 0xbebbbd41c5d26bf1L.
P5 = 4.1381367970572385e-8, // Long bits 0x3e66376972bea4d0L.
TWO_28 = 0x10000000, // Long bits 0x41b0000000000000L
Here is where I'm starting to get lost. I can at least assume that from this point on the answer is an approximation. I then find myself here:
private static double scale(double x, int n)
{
if (Configuration.DEBUG && abs(n) >= 2048)
throw new InternalError("Assertion failure");
if (x == 0 || x == Double.NEGATIVE_INFINITY
|| ! (x < Double.POSITIVE_INFINITY) || n == 0)
return x;
long bits = Double.doubleToLongBits(x);
int exp = (int) (bits >> 52) & 0x7ff;
if (exp == 0) // Subnormal x.
{
x *= TWO_54;
exp = ((int) (Double.doubleToLongBits(x) >> 52) & 0x7ff) - 54;
}
exp += n;
if (exp > 0x7fe) // Overflow.
return Double.POSITIVE_INFINITY * x;
if (exp > 0) // Normal.
return Double.longBitsToDouble((bits & 0x800fffffffffffffL)
| ((long) exp << 52));
if (exp <= -54)
return 0 * x; // Underflow.
exp += 54; // Subnormal result.
x = Double.longBitsToDouble((bits & 0x800fffffffffffffL)
| ((long) exp << 52));
return x * (1 / TWO_54);
}
TWO_54 = 0x40000000000000L
While I would say I understand math and programming well, I have hit a point where I am looking at a Frankenstein's-monster mix of the two. I noticed the switch to working on raw bits (which I have little to no experience with), and I was hoping someone could explain the processes that are occurring "under the hood", so to speak. Specifically, I get lost from "Now x is in primary range" in the exp() method onwards, and I don't know what the referenced values really represent. I am asking for help understanding not only the methods themselves, but also how they arrive at the answer. Feel free to go as in depth as needed.
edit:
If someone could create the tag "strictMath", that would be great. I believe its size, and the fact that the Math library derives from it, justify its existence.
To the exponential function:
What happens is that
exp(x) = 2^k * exp(x-k*log(2))
is exploited for positive x. Some magic is used to get more consistent results for large x where the reduction x-k*log(2) will introduce cancellation errors.
On the reduced x a rational approximation with minimized maximal error over the reduced interval |x| <= 0.5*log(2) is used, see Pade approximations and similar. This is based on the symmetric formula
exp(x) = exp(x/2)/exp(-x/2) = (c(x²)+x)/(c(x²)-x)
(note that the variable c in the code holds x-(c(x²)-2), not c(x²) itself). When using Taylor series, approximations for c(x*x)=x*coth(x/2) are based on
c(u)=2 + 1/6*u - 1/360*u^2 + 1/15120*u^3 - 1/604800*u^4 + 1/23950080*u^5 - 691/653837184000*u^6
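To make the reduction and the symmetric formula concrete, here is a minimal Java sketch. It is not the StrictMath code: it uses the plain Taylor coefficients of c(u) listed above instead of the tuned P1..P5 constants, applies the factor 2^k with a multiplication loop rather than bit manipulation, and the overflow and underflow cut-offs are rough assumptions. The class name ExpSketch is made up for illustration.

```java
final class ExpSketch {
    private static final double LN2 = 0.6931471805599453;

    /** exp(x) = 2^k * exp(r) with r = x - k*ln(2), so that |r| <= ln(2)/2. */
    static double exp(double x) {
        if (x != x) return x;                          // NaN stays NaN
        if (x > 709.0) return Double.POSITIVE_INFINITY; // rough overflow limit (assumption)
        if (x < -745.0) return 0.0;                    // rough underflow limit (assumption)

        // Argument reduction: k is the integer nearest to x / ln(2).
        int k = (int) (x / LN2 + (x < 0 ? -0.5 : 0.5));
        double r = x - k * LN2;

        // c(u) = r*coth(r/2) as a truncated Taylor series in u = r*r.
        double u = r * r;
        double c = 2 + u * (1.0 / 6 + u * (-1.0 / 360 + u * (1.0 / 15120
                 + u * (-1.0 / 604800 + u * (1.0 / 23950080)))));

        // Symmetric formula: exp(r) = (c + r) / (c - r).
        double y = (c + r) / (c - r);

        // Multiply by 2^k one factor at a time (simple, not fast).
        double factor = k < 0 ? 0.5 : 2.0;
        for (int i = k < 0 ? -k : k; i > 0; i--) {
            y *= factor;
        }
        return y;
    }
}
```

Calling ExpSketch.exp(1.0) should agree with Math.E to roughly machine precision, because the first dropped series term is tiny on the reduced interval.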
The scale(x,n) function implements the multiplication x*2^n by directly manipulating the exponent in the bit assembly of the double floating point format.
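As an illustration of that exponent trick, here is a simplified sketch that only handles the case where both the input and the result are normal numbers. The real scale() above also deals with subnormals, overflow and underflow. The class name ScaleSketch and the choice to throw on unsupported cases are my own assumptions.

```java
final class ScaleSketch {
    /** Returns x * 2^n by editing the exponent field; normal numbers only. */
    static double scale(double x, int n) {
        long bits = Double.doubleToLongBits(x);
        int exp = (int) ((bits >> 52) & 0x7ff);   // 11-bit biased exponent
        if (x == 0 || exp == 0x7ff) {
            return x;                             // zero, infinity, NaN are unchanged by scaling
        }
        if (exp == 0) {
            throw new ArithmeticException("subnormal input not handled in this sketch");
        }
        int newExp = exp + n;
        if (newExp <= 0 || newExp >= 0x7ff) {
            throw new ArithmeticException("result outside the normal range");
        }
        // Keep the sign bit (63) and the 52 mantissa bits, replace the exponent.
        long result = (bits & 0x800fffffffffffffL) | ((long) newExp << 52);
        return Double.longBitsToDouble(result);
    }
}
```

For example, ScaleSketch.scale(1.0, 3) only changes the stored exponent from 1023 to 1026 and yields 8.0 without any multiplication.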
Computing square roots
To compute square roots it would be more advantageous to compute them directly. First reduce the interval of approximation arguments via
sqrt(x)=2^k*sqrt(x/4^k)
which can again be done efficiently by directly manipulating the bit format of double.
After x is reduced to the interval 0.5..2.0 one can then employ formulas of the form
u = (x-1)/(x+1)
y = (c(u*u)+u) / (c(u*u)-u)
based on
sqrt(x)=sqrt(1+u)/sqrt(1-u)
and
c(v) = 1+sqrt(1-v) = 2 - 1/2*v - 1/8*v^2 - 1/16*v^3 - 5/128*v^4 - 7/256*v^5 - 21/1024*v^6 - 33/2048*v^7 - ...
In a program without bit manipulations this could look like
double my_sqrt(double x) {
double c,u,v,y,scale=1;
int k=0;
if(x!=x || x<0) return Double.NaN;
// the reduction loops below would never terminate for these two inputs
if(x==0 || x==Double.POSITIVE_INFINITY) return x;
while(x>2 ) { x/=4; scale *=2; k++; }
while(x<0.5) { x*=4; scale /=2; k--; }
// rational approximation of sqrt
u = (x-1)/(x+1);
v = u*u;
c = 2 - v/2*(1 + v/4*(1 + v/2));
y = 1 + 2*u/(c-u); // = (c+u)/(c-u);
// one Halley iteration
y = y/3*(1+8*x/(3*y*y+x)); // = y*(y*y+3*x)/(3*y*y+x)
// reconstruct original scale
return y*scale;
}
One could replace the Halley step with two Newton steps, or
with a better uniform approximation in c one could replace the Halley step with one Newton step, or ...
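As a sketch of the first variant, the same routine with the Halley step swapped for two Newton steps y = (y + x/y)/2 could look like this in Java. The class name NewtonSqrt is made up, and the special-case handling for zero, infinity and NaN is my own addition so that the reduction loops always terminate.

```java
final class NewtonSqrt {
    static double sqrt(double x) {
        if (x != x || x < 0) return Double.NaN;
        if (x == 0 || x == Double.POSITIVE_INFINITY) return x;

        // Reduce x to the interval 0.5..2.0, remembering the factor 2^k.
        double scale = 1;
        while (x > 2)   { x /= 4; scale *= 2; }
        while (x < 0.5) { x *= 4; scale /= 2; }

        // Rational starting value from sqrt(x) = sqrt(1+u)/sqrt(1-u).
        double u = (x - 1) / (x + 1);
        double v = u * u;
        double c = 2 - v / 2 * (1 + v / 4 * (1 + v / 2));
        double y = (c + u) / (c - u);

        // Two Newton steps; each one roughly squares the relative error.
        y = 0.5 * (y + x / y);
        y = 0.5 * (y + x / y);

        return y * scale;
    }
}
```

The starting value is already accurate to about six digits on the reduced interval, so two Newton steps are enough to reach double precision in this sketch.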
How to efficiently save and access a large array of 5 bit numbers in memory?
For example
01100
01101
01110
01111
10000
10001
which I will later convert to a byte to check what number it is?
I was thinking of just using an array of bytes but after a while this will be wasting a lot of memory as this will be a continually growing array. Also I will want to save this array efficiently. I will only be using exactly 5 bits.
This is the code that I use for a bit array implementation in C; in Java it's going to be much the same. I must reconsider what I said about the list; maybe an array is going to be better.
Anyway, you treat the array as one contiguous sequence of bits. These functions set, clear, and test the k-th bit of the array. In this case I'm using an array of integers, so you see '32'; if you use an array of bytes, you'd use '8'.
void set_bit(int a[], int k)
{
int i = k / 32;
int pos = k % 32;
unsigned int flag = 1; // flag = 0000....00001
flag = flag << pos; // flag = 0000...00100..0000
a[i] = a[i] | flag; // set the bit at the k-th position in a[i]
}
void clear_bit(int a[], int k)
{
int i = k / 32;
int pos = k % 32;
unsigned int flag = 1; // flag = 0000....00001
flag = flag << pos; // flag = 0000...00100..0000
flag = ~flag;
a[i] = a[i] & flag; // clear the bit at the k-th position in a[i]
}
int test_bit(int a[], int k)
{
int i = k / 32;
int pos = k % 32;
unsigned int flag = 1; // flag = 0000....00001
flag = flag << pos; // flag = 0000...00100..0000
if (a[i] & flag) // test the k-th bit of a to be 1
return 1;
else
return 0;
}
I don't know how you store the five-bit numbers; you'll have to insert them bit by bit, and also keep track of the last empty position in the bit array.
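Since the question is about Java, here is a rough sketch of that idea: a growable list that packs each 5-bit value bit by bit into a byte array and tracks how many values have been stored. The class name FiveBitList and the growth policy (doubling) are assumptions for illustration, not part of the C code above.

```java
import java.util.Arrays;

final class FiveBitList {
    private byte[] data = new byte[8];
    private int size = 0;              // number of 5-bit values stored

    void add(int value) {
        if (value < 0 || value > 31) {
            throw new IllegalArgumentException("not a 5-bit value: " + value);
        }
        int bitIndex = size * 5;                     // first free bit
        int neededBytes = (bitIndex + 5 + 7) / 8;
        if (neededBytes > data.length) {
            data = Arrays.copyOf(data, Math.max(neededBytes, data.length * 2));
        }
        // Write the five bits, most significant first.
        for (int b = 0; b < 5; b++) {
            if (((value >> (4 - b)) & 1) != 0) {
                int pos = bitIndex + b;
                data[pos / 8] |= (byte) (0x80 >> (pos % 8));
            }
        }
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        int bitIndex = index * 5;
        int value = 0;
        for (int b = 0; b < 5; b++) {
            int pos = bitIndex + b;
            int bit = (data[pos / 8] >> (7 - pos % 8)) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }

    int size() {
        return size;
    }
}
```

Eight values fit in five bytes this way, compared with eight bytes for a plain byte array.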
"I was thinking of just using an array of bytes but after a while this will be wasting a lot of memory as this will be a continually growing array."
I've dealt with a similar problem and decided to write a file-based BitInputStream and a BitOutputStream. Therefore running out of memory was no longer an issue. Please note that the given links are not my work but good examples of how to write a bit input/output stream.
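For readers without those examples at hand, a bit-level writer on top of any OutputStream can be sketched in a few lines. This is my own minimal version, not the linked code; the class name BitWriter and the decision to pad the last byte with zero bits are assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;

final class BitWriter implements AutoCloseable {
    private final OutputStream out;
    private int current = 0;   // bits collected so far, right-aligned
    private int count = 0;     // how many bits are in 'current' (0..7)

    BitWriter(OutputStream out) {
        this.out = out;
    }

    /** Writes the lowest numBits bits of value, most significant bit first. */
    void writeBits(int value, int numBits) throws IOException {
        for (int b = numBits - 1; b >= 0; b--) {
            current = (current << 1) | ((value >> b) & 1);
            if (++count == 8) {
                out.write(current);   // a full byte is ready
                current = 0;
                count = 0;
            }
        }
    }

    /** Pads the last partial byte with zero bits, then closes the stream. */
    @Override
    public void close() throws IOException {
        if (count > 0) {
            out.write(current << (8 - count));
            current = 0;
            count = 0;
        }
        out.close();
    }
}
```

Wrapping a FileOutputStream this way and calling writeBits(value, 5) for each number keeps only one partial byte in memory at a time.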
I wrote an implementation of a 5-bit byte vector on top of an 8-bit byte vector in Javascript some time ago that might be of some help.
const ByteVector = require('bytevector');
class FiveBuffer {
constructor(buffer = [0], bitsAvailable = 8) {
this.buf = new ByteVector(buffer);
this.bitsAvailable = bitsAvailable;
this.size = Math.floor(((this.byteSize() * 8) - this.bitsAvailable) / 5);
}
push(num) {
if (num > 31 || num < 0)
throw new Error(`Only 5-bit unsigned integers (${num} not among them) are accepted`);
var firstShift = 5 - this.bitsAvailable;
var secondShift = this.bitsAvailable + 3;
var firstShifted = shiftRight(num, firstShift);
var backIdx = this.buf.length - 1;
var back = this.buf.get(backIdx);
this.buf.set(backIdx, back | firstShifted);
if (secondShift < 8) {
var secondShifted = num << secondShift;
this.buf.push(secondShifted);
}
this.bitsAvailable = secondShift % 8;
this.size++;
}
get(idx) {
if (idx < 0 || idx >= this.size)
throw new Error(`Index ${idx} is out of bounds for FiveBuffer of size ${this.size}`);
var bitIdx = idx * 5;
var byteIdx = Math.floor(bitIdx / 8);
var byte = this.buf.get(byteIdx);
var bit = bitIdx % 8;
var firstShift = 3 - bit;
var firstShifted = shiftRightDestroy(byte, firstShift);
var final = firstShifted;
var secondShift = 11 - bit;
if (secondShift < 8) {
var secondShifted = this.buf.get(byteIdx + 1) >> secondShift;
final = final | secondShifted;
}
return final;
}
buffer() {
this.buf.shrink_to_fit();
return this.buf.buffer();
}
debug() {
var arr = [];
this.buffer().forEach(x => arr.push(x.toString(2)));
console.log(arr);
}
byteSize() {
return this.buf.size();
}
}
function shiftRightDestroy(num, bits) {
var left = 3 - bits;
var res = (left > 0) ? ((num << left) % 256) >> left : num;
return shiftRight(res, bits);
}
function shiftRight(num, bits) {
return (bits < 0) ?
num << -bits :
num >> bits;
}
module.exports = FiveBuffer;
I am trying to port Microsoft's decompression algorithm to PHP from Java (or maybe it's C++ or C#, since that's Microsoft). This is an algorithm that takes the compressed shape data from their Bing Maps Geodata API results and expands it into lat/lon coordinates. They have posted their algorithm on their site at https://msdn.microsoft.com/en-us/library/dn306801.aspx
I have a list of coordinates stored in my database and I am trying to retrieve the array of coordinates that define a polygon to work with the shape. My results differ. Can anyone point out discrepancies between the two?
EDIT: I believe my problem lies in the fact that PHP does not handle LONG type integers and precision loss occurs when doing bitwise operations. I might need to convert some operations to use BCMath. Help here?
Decompression Algorithm (Microsoft's)
public const string safeCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";
private static bool TryParseEncodedValue(string value, out List<Coordinate> parsedValue)
{
parsedValue = null;
var list = new List<Coordinate>();
int index = 0;
int xsum = 0, ysum = 0;
while (index < value.Length) // While we have more data,
{
long n = 0; // initialize the accumulator
int k = 0; // initialize the count of bits
while (true)
{
if (index >= value.Length) // If we ran out of data mid-number
return false; // indicate failure.
int b = safeCharacters.IndexOf(value[index++]);
if (b == -1) // If the character wasn't on the valid list,
return false; // indicate failure.
n |= ((long)b & 31) << k; // mask off the top bit and append the rest to the accumulator
k += 5; // move to the next position
if (b < 32) break; // If the top bit was not set, we're done with this number.
}
// The resulting number encodes an x, y pair in the following way:
//
// ^ Y
// |
// 14
// 9 13
// 5 8 12
// 2 4 7 11
// 0 1 3 6 10 ---> X
// determine which diagonal it's on
int diagonal = (int)((Math.Sqrt(8 * n + 5) - 1) / 2);
// subtract the total number of points from lower diagonals
n -= diagonal * (diagonal + 1L) / 2;
// get the X and Y from what's left over
int ny = (int)n;
int nx = diagonal - ny;
// undo the sign encoding
nx = (nx >> 1) ^ -(nx & 1);
ny = (ny >> 1) ^ -(ny & 1);
// undo the delta encoding
xsum += nx;
ysum += ny;
// position the decimal point
list.Add(new Coordinate { Latitude = ysum * 0.00001, Longitude = xsum * 0.00001 });
}
parsedValue = list;
return true;
}
My Decompression Algorithm (PHP)
function tryParseEncodedValue($value) {
$value = 'vx1vilihnM6hR7mEl2Q';
var_error_log($value);
$safeCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";
$list = array();
$index = 0;
(int)$xsum = 0;
(int)$ysum = 0;
while ($index < strlen($value)) // While we have more data,
{
$n = 0; // initialize the accumulator
$k = 0; // initialize the count of bits
while (true)
{
if ($index >= strlen($value)) // If we ran out of data mid-number
{
var_error_log('failed: inxed >= strlen($value)');
return false; // indicate failure.
}
(int)$b = strpos($safeCharacters, $value[$index++]);
if (!$b) { // If the character wasn't on the valid list,
var_error_log('failed: character not in valid list');
return false; // indicate failure.
}
$n |= ($b & 31) << $k; // mask off the top bit and append the rest to the accumulator
$k = $k+5; // move to the next position
if ($b < 32) break; // If the top bit was not set, we're done with this number.
}
// The resulting number encodes an x, y pair in the following way:
//
// ^ Y
// |
// 14
// 9 13
// 5 8 12
// 2 4 7 11
// 0 1 3 6 10 ---> X
// determine which diagonal it's on
$diagonal = (int)((sqrt(8 * $n + 5) - 1) / 2);
// subtract the total number of points from lower diagonals
$n -= $diagonal * ($diagonal + (int)1) / 2;
// get the X and Y from what's left over
$ny = (int)$n;
$nx = $diagonal - $ny;
// undo the sign encoding
$nx = pow(($nx >> 1), (-($nx & 1)) );
$ny = pow(($ny >> 1), (-($ny & 1)) );
// undo the delta encoding
$xsum += $nx;
$ysum += $ny;
// position the decimal point
$coordinates = array($ysum * 0.00001, $xsum * 0.00001);
array_push($list, $coordinates);
}
$parsedValue = $list;
var_error_log($parsedValue);
return $parsedValue;
}
Known Input
Microsoft gives an example input and output to validate your algorithms with. https://msdn.microsoft.com/en-us/library/jj158958.aspx#TestingYourAlg
compressed shape = 'vx1vilihnM6hR7mEl2Q'
Expected Output
an array of coordinates
35.894309002906084, -110.72522000409663
35.893930979073048, -110.72577999904752
35.893744984641671, -110.72606003843248
35.893366960808635, -110.72661500424147
My Output
array(4) {
[0]=>
array(2) {
[0]=>
float(1.0E-5)
[1]=>
float(1.0E-5)
}
[1]=>
array(2) {
[0]=>
float(1.027027027027E-5)
[1]=>
float(1.0181818181818E-5)
}
[2]=>
array(2) {
[0]=>
float(1.0825825825826E-5)
[1]=>
float(1.0552188552189E-5)
}
[3]=>
array(2) {
[0]=>
float(1.1103603603604E-5)
[1]=>
float(1.0734006734007E-5)
}
}
So, we can see that the PHP output is not being calculated correctly and I have a feeling it has to do with the differences with casting to Long integers in Java and running bitwise operations on integers. PHP is supposed to handle integers whether they are long or floats or ints, but I have a feeling I am overlooking something.
I bet the problem has to do with this line. Can anyone point out discrepancies?
n |= ((long)b & 31) << k; // mask off the top bit and append the rest to the accumulator
I suspect your issue is when you converted the following C# code:
nx = (nx >> 1) ^ -(nx & 1);
ny = (ny >> 1) ^ -(ny & 1);
In your PHP code you convert this to:
$nx = pow(($nx >> 1), (-($nx & 1)) );
$ny = pow(($ny >> 1), (-($ny & 1)) );
In C#, ^ is the bitwise XOR operator, not a power operator. PHP uses the same symbol for bitwise XOR, so try changing your code to this:
$nx = ($nx >> 1) ^ (-($nx & 1));
$ny = ($ny >> 1) ^ (-($ny & 1));
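To see why the XOR matters, here is the sign decoding on its own as a small Java sketch. The expression maps even inputs to non-negative results and odd inputs to negative ones. The class name ZigZag and the matching encode method are my additions for illustration.

```java
final class ZigZag {
    /** Undo the sign encoding: 0, 1, 2, 3, 4 ... becomes 0, -1, 1, -2, 2 ... */
    static int decode(int n) {
        // -(n & 1) is 0 for even n and -1 (all bits set) for odd n,
        // so the XOR either leaves n >> 1 alone or flips every bit.
        return (n >> 1) ^ -(n & 1);
    }

    /** The inverse mapping, shown only to make the round trip visible. */
    static int encode(int v) {
        return (v << 1) ^ (v >> 31);
    }
}
```

With pow() in place of the XOR, an odd input such as 111 turns into 55 raised to the power -1, which is where the tiny fractional values in the question's output come from.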
I have converted the C# code to PHP. The problem did lie in the fact that precision was lost with large numbers in PHP. Since some of the values exceeded the bounds of 32-bit integers and were being stored as 64-bit integers in C#, these values had to be handled with PHP's GMP functions. GMP supports bitwise operations on large integers.
/*
* Microsoft's decompression algorithm - php version
* returns an array of coordinates (pairs of doubles)
*/
function tryParseEncodedValue($value) {
$safeCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";
$list = array();
$index = 0;
$xsum = 0;
$ysum = 0;
while ($index < strlen($value)) // While we have more data,
{
$n = 0; // initialize the accumulator
$k = 0; // initialize the count of bits
while (true)
{
if ($index >= strlen($value)) // If we ran out of data mid-number
{
var_error_log('failed: index >= strlen($value)');
return false; // indicate failure.
}
$b = strpos($safeCharacters, $value[$index++]);
if ($b === false) { // If the character wasn't on the valid list,
var_error_log('failed: character not in valid list');
return false; // indicate failure.
}
// mask off the top bit and append the rest to the accumulator
// n |= ((long)b & 31) << k;
$bgmp = gmp_init($b); // Here i'm breaking out this function
$bitwiseand = gmp_and($bgmp, 31); // on multiple lines because there's
$shifted = gmp_shiftl($bitwiseand, $k); // so many steps
$n = gmp_or($n, $shifted);
$k += 5;
if (gmp_cmp($bgmp, gmp_init(32)) < 0) break; // gmp compare: b < 32
}
// The resulting number encodes an x, y pair in the following way:
//
// ^ Y
// |
// 14
// 9 13
// 5 8 12
// 2 4 7 11
// 0 1 3 6 10 ---> X
// determine which diagonal it's on
//$diagonal = (int)((sqrt(8 * $n + 5) - 1) / 2);
$diagonal = gmp_intval(gmp_div_q(gmp_sub(gmp_sqrt(gmp_add(gmp_mul($n, 8), 5)), 1), 2));
// subtract the total number of points from lower diagonals
// n -= diagonal * (diagonal + 1L) / 2;
$n = gmp_sub($n, gmp_div_q(gmp_mul($diagonal, gmp_add($diagonal, 1)), 2));
// get the X and Y from what's left over
$ny = gmp_intval($n);
$nx = $diagonal - $ny;
// undo the sign encoding
$nx = ($nx >> 1)^ (-($nx & 1));
$ny = ($ny >> 1)^ (-($ny & 1));
// undo the delta encoding
$xsum += $nx;
$ysum += $ny;
// position the decimal point
$coordinate = array($ysum * 0.00001, $xsum * 0.00001);
array_push($list, $coordinate);
}
return $list;
}
// shift left, $x number to shift, $n shift n times.
function gmp_shiftl($x,$n) {
return(gmp_mul($x,gmp_pow(2,$n)));
}
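For comparison, here is a sketch of the same algorithm as a direct Java port of the C# code quoted in the question. Java's long is always 64 bits, so the accumulator needs no big-integer library here. The class name PointCompression and the choice to return null on malformed input are my own.

```java
import java.util.ArrayList;
import java.util.List;

final class PointCompression {
    private static final String SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";

    /** Returns {latitude, longitude} pairs, or null if the input is malformed. */
    static List<double[]> decode(String value) {
        List<double[]> points = new ArrayList<>();
        int index = 0;
        int xsum = 0, ysum = 0;
        while (index < value.length()) {
            long n = 0;   // accumulator, needs more than 32 bits
            int k = 0;    // number of bits collected so far
            while (true) {
                if (index >= value.length()) return null;   // ran out mid-number
                int b = SAFE.indexOf(value.charAt(index++));
                if (b == -1) return null;                   // invalid character
                n |= ((long) b & 31) << k;                  // append the low 5 bits
                k += 5;
                if (b < 32) break;                          // top bit clear: last chunk
            }
            // Which diagonal of the x/y pairing grid is n on?
            int diagonal = (int) ((Math.sqrt(8 * n + 5) - 1) / 2);
            n -= diagonal * (diagonal + 1L) / 2;
            int ny = (int) n;
            int nx = diagonal - ny;
            // Undo the sign encoding (bitwise XOR, not a power).
            nx = (nx >> 1) ^ -(nx & 1);
            ny = (ny >> 1) ^ -(ny & 1);
            // Undo the delta encoding and place the decimal point.
            xsum += nx;
            ysum += ny;
            points.add(new double[] { ysum * 0.00001, xsum * 0.00001 });
        }
        return points;
    }
}
```

Feeding it the test string from the question, vx1vilihnM6hR7mEl2Q, gives four points, the first being latitude 35.89431 and longitude -110.72522 (the documented values rounded to five decimals).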
Here's what I'm working with right now:
for (int i = 0, numSamples = soundBytes.length / 2; i < numSamples; i += 2)
{
// Get the samples.
int sample1 = ((soundBytes[i] & 0xFF) << 8) | (soundBytes[i + 1] & 0xFF); // Automatically converts to unsigned int 0...65535
int sample2 = ((outputBytes[i] & 0xFF) << 8) | (outputBytes[i + 1] & 0xFF); // Automatically converts to unsigned int 0...65535
// Normalize for simplicity.
float normalizedSample1 = sample1 / 65535.0f;
float normalizedSample2 = sample2 / 65535.0f;
float normalizedMixedSample = 0.0f;
// Apply the algorithm.
if (normalizedSample1 < 0.5f && normalizedSample2 < 0.5f)
normalizedMixedSample = 2.0f * normalizedSample1 * normalizedSample2;
else
normalizedMixedSample = 2.0f * (normalizedSample1 + normalizedSample2) - (2.0f * normalizedSample1 * normalizedSample2) - 1.0f;
int mixedSample = (int)(normalizedMixedSample * 65535);
// Replace the sample in soundBytes array with this mixed sample.
soundBytes[i] = (byte)((mixedSample >> 8) & 0xFF);
soundBytes[i + 1] = (byte)(mixedSample & 0xFF);
}
From as far as I can tell, it's an accurate representation of the algorithm defined on this page: http://www.vttoth.com/CMS/index.php/technical-notes/68
However, just mixing a sound with silence (all 0's) results in a sound that very obviously doesn't sound right, maybe it's best to describe it as higher-pitched and louder.
Would appreciate help in determining if I'm implementing the algorithm correctly, or if I simply need to go about it a different way (different algorithm/method)?
In the linked article the author assumes A and B to represent entire streams of audio. More specifically X means the maximum abs value of all of the samples in stream X - where X is either A or B. So what his algorithm does is scans the entirety of both streams to compute the max abs sample of each and then scales things so that the output theoretically peaks at 1.0. You'll need to make multiple passes over the data in order to implement this algorithm and if your data is streaming in then it simply will not work.
Here is an example of how I think the algorithm works. It assumes that the samples have already been converted to floating point, to sidestep the issue of your conversion code being wrong. I'll explain what is wrong with it later:
double[] samplesA = ConvertToDouble(samples1);
double[] samplesB = ConvertToDouble(samples2);
double A = ComputeMaxPeak(samplesA);
double B = ComputeMaxPeak(samplesB);
// Z always equals 1 which is an un-useful bit of information.
double Z = A+B-A*B;
// really need to find a value x such that xA+xB=1, which is:
double x = 1 / (A + B);
// Now mix and scale the samples
double[] samples = MixAndScale(samplesA, samplesB, x);
Mixing and scaling:
double[] MixAndScale(double[] samplesA, double[] samplesB, double scalingFactor)
{
double[] result = new double[samplesA.length];
for (int i = 0; i < samplesA.length; i++)
result[i] = scalingFactor * (samplesA[i] + samplesB[i]);
return result;
}
Computing the max peak:
double ComputeMaxPeak(double[] samples)
{
double max = 0;
for (int i = 0; i < samples.length; i++)
{
double x = Math.abs(samples[i]);
if (x > max)
max = x;
}
return max;
}
And conversion. Notice how I'm using short so that the sign bit is properly maintained:
double[] ConvertToDouble(byte[] bytes)
{
double[] samples = new double[bytes.length/2];
for (int i = 0; i < samples.length; i++)
{
short tmp = (short) ((bytes[i*2] << 8) | (bytes[i*2+1] & 0xFF)); // big-endian: high byte first
samples[i] = tmp / 32767.0;
}
return samples;
}
This question is usually asked as a part of another question but it turns out that the answer is long. I've decided to answer it here so I can link to it elsewhere.
Although I'm not aware of a way that Java can produce audio samples for us at this time, if that changes in the future, this can be a place for it. I know that JavaFX has some stuff like this, for example AudioSpectrumListener, but still not a way to access samples directly.
I'm using javax.sound.sampled for playback and/or recording but I'd like to do something with the audio.
Perhaps I'd like to display it visually or process it in some way.
How do I access audio sample data to do that with Java Sound?
See also:
Java Sound Tutorials (Official)
Java Sound Resources (Unofficial)
Well, the simplest answer is that at the moment Java can't produce sample data for the programmer.
This quote is from the official tutorial:
There are two ways to apply signal processing:
You can use any processing supported by the mixer or its component lines, by querying for Control objects and then setting the controls as the user desires. Typical controls supported by mixers and lines include gain, pan, and reverberation controls.
If the kind of processing you need isn't provided by the mixer or its lines, your program can operate directly on the audio bytes, manipulating them as desired.
This page discusses the first technique in greater detail, because there is no special API for the second technique.
Playback with javax.sound.sampled largely acts as a bridge between the file and the audio device. The bytes are read in from the file and sent off.
Don't assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they aren't. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them. Using 8-bit is one way to avoid the complexity described here, if you're just playing around.)
So instead, I'll enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer will not cover how to encode them, but it's included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.
This is a long answer but I wanted to give a thorough overview.
A Little About Digital Audio
Generally when digital audio is explained, we're referring to Linear Pulse-Code Modulation (LPCM).
A continuous sound wave is sampled at regular intervals and the amplitudes are quantized to integers of some scale.
Shown here is a sine wave sampled and quantized to 4-bit:
(Notice that in two's complement representation the magnitude of the most positive value is 1 less than that of the most negative value. This is a minor detail to be aware of. For example if you're clipping audio and forget this, the positive clips will overflow.)
When we have audio on the computer, we have an array of these samples. A sample array is what we want to turn the byte array in to.
To decode PCM samples, we don't care much about the sample rate or number of channels, so I won't be saying much about them here. Channels are usually interleaved, so that if we had an array of them, they'd be stored like this:
Index 0: Sample 0 (Left Channel)
Index 1: Sample 0 (Right Channel)
Index 2: Sample 1 (Left Channel)
Index 3: Sample 1 (Right Channel)
Index 4: Sample 2 (Left Channel)
Index 5: Sample 2 (Right Channel)
...
In other words, for stereo, the samples in the array just alternate between left and right.
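If the channels are needed separately, an interleaved array can be split with a simple index calculation. This is a small sketch under the assumption that the samples have already been decoded to float; the class name Deinterleave is made up.

```java
final class Deinterleave {
    /** Splits interleaved samples into one array per channel. */
    static float[][] split(float[] interleaved, int channels) {
        int frames = interleaved.length / channels;
        float[][] out = new float[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int ch = 0; ch < channels; ch++) {
                // Frame f, channel ch lives at index f * channels + ch.
                out[ch][f] = interleaved[f * channels + ch];
            }
        }
        return out;
    }
}
```

For stereo, split(samples, 2)[0] is then the left channel and split(samples, 2)[1] the right one.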
Some Assumptions
All of the code examples will assume the following declarations:
byte[] bytes;    // The byte array, read from the AudioInputStream.
float[] samples; // The output sample array that we're going to fill.
float sample;    // The sample we're currently working on.
long temp;       // An interim value used for general manipulation.
int i;           // The position in the byte array where the current sample's data starts.
We'll normalize all of the samples in our float[] array to the range of -1f <= sample <= 1f. All of the floating-point audio I've seen comes this way and it's pretty convenient.
If our source audio doesn't already come like that (as is for e.g. integer samples), we can normalize them ourselves using the following:
sample = sample / fullScale(bitsPerSample);
Where fullScale is 2^(bitsPerSample - 1), i.e. Math.pow(2, bitsPerSample - 1).
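Written out as a tiny helper, the normalization could look like the following sketch. The wrapper class Normalize and the normalize method are names I made up; only fullScale comes from the text.

```java
final class Normalize {
    /** Magnitude of the most negative sample, e.g. 32768 for 16-bit audio. */
    static double fullScale(int bitsPerSample) {
        return Math.pow(2, bitsPerSample - 1);
    }

    /** Maps a signed integer sample to the range -1f <= sample <= 1f. */
    static float normalize(long value, int bitsPerSample) {
        return (float) (value / fullScale(bitsPerSample));
    }
}
```

So a 16-bit sample of -32768 becomes -1.0f and 16384 becomes 0.5f.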
How do I coerce the byte array in to meaningful data?
The byte array contains the sample frames split up and all in a line. This is actually very straight-forward except for something called endianness, which is the ordering of the bytes in each sample packet.
Here's a diagram. This sample (packed in to a byte array) holds the decimal value 9999:
24-bit sample as big-endian:
bytes[i] bytes[i + 1] bytes[i + 2]
┌──────┐ ┌──────┐ ┌──────┐
00000000 00100111 00001111
24-bit sample as little-endian:
bytes[i] bytes[i + 1] bytes[i + 2]
┌──────┐ ┌──────┐ ┌──────┐
00001111 00100111 00000000
They hold the same binary values; however, the byte orders are reversed.
In big-endian, the more significant bytes come before the less significant bytes.
In little-endian, the less significant bytes come before the more significant bytes.
WAV files are stored in little-endian order and AIFF files are stored in big-endian order. Endianness can be obtained from AudioFormat.isBigEndian.
To concatenate the bytes and put them in to our long temp variable, we:
Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign-extension when the byte is automatically promoted. (char, byte and short are promoted to int when arithmetic is performed on them.) See also What does value & 0xff do in Java?
Bit shift each byte in to position.
Bitwise OR the bytes together.
Here's a 24-bit example:
long temp;
if (isBigEndian) {
temp = (
((bytes[i ] & 0xffL) << 16)
| ((bytes[i + 1] & 0xffL) << 8)
| (bytes[i + 2] & 0xffL)
);
} else {
temp = (
(bytes[i ] & 0xffL)
| ((bytes[i + 1] & 0xffL) << 8)
| ((bytes[i + 2] & 0xffL) << 16)
);
}
Notice that the shift order is reversed based on endianness.
This can also be generalized to a loop, which can be seen in the full code at the bottom of this answer. (See the unpackAnyBit and packAnyBit methods.)
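As a rough idea of what such a loop can look like, here is my own short sketch. It is not the unpackAnyBit method itself, and the name unpackBits and its parameters are assumptions.

```java
final class Unpack {
    /** Concatenates bytesPerSample bytes starting at bytes[i] into a long. */
    static long unpackBits(byte[] bytes, int i, int bytesPerSample, boolean isBigEndian) {
        long temp = 0;
        for (int b = 0; b < bytesPerSample; b++) {
            // Big-endian: the first byte is the most significant one.
            // Little-endian: the first byte is the least significant one.
            int shift = 8 * (isBigEndian ? bytesPerSample - 1 - b : b);
            temp |= (bytes[i + b] & 0xffL) << shift;
        }
        return temp;
    }
}
```

Both byte layouts from the diagram above (the 24-bit value 9999) come out as the same number once the correct endianness flag is passed.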
Now that we have the bytes concatenated together, we can take a few more steps to turn them in to a sample. The next steps depend on the actual encoding.
How do I decode Encoding.PCM_SIGNED?
The two's complement sign must be extended. This means that if the most significant bit (MSB) is set to 1, we fill all the bits above it with 1s. The arithmetic right-shift (>>) will do the filling for us automatically if the sign bit is set, so I usually do it this way:
int bitsToExtend = Long.SIZE - bitsPerSample;
float sample = (temp << bitsToExtend) >> bitsToExtend;
(Where Long.SIZE is 64. If our temp variable wasn't a long, we'd use something else. If we used e.g. int temp instead, we'd use 32.)
To understand how this works, here's a diagram of sign-extending 8-bit to 16-bit:
11111111 is the byte value -1, but the upper bits of the short are 0.
Shift the byte's MSB in to the MSB position of the short.
0000 0000 1111 1111
<< 8
───────────────────
1111 1111 0000 0000
Shift it back and the right-shift fills all the upper bits with 1s.
We now have the short value of -1.
1111 1111 0000 0000
>> 8
───────────────────
1111 1111 1111 1111
Positive values (that had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right-shift.
Then normalize the sample, as described in Some Assumptions.
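Putting the two steps together, a signed sample could be decoded as in this sketch. It assumes temp already holds the concatenated bytes with the upper bits zero; the class name SignedPcm is made up.

```java
final class SignedPcm {
    /** Sign-extends the low bitsPerSample bits of temp and scales to -1..1. */
    static float decode(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        // Move the sample's MSB to bit 63, then let >> copy it back down.
        long extended = (temp << bitsToExtend) >> bitsToExtend;
        return (float) (extended / Math.pow(2, bitsPerSample - 1));
    }
}
```

For 16-bit audio, the bit pattern 0x8000 decodes to -1.0f and 0x4000 to 0.5f.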
You might not need to write explicit sign-extension if your code is simple
Java does sign-extension automatically when converting from one integral type to a larger type, for example byte to int. If you know that your input and output format are always signed, you can use the automatic sign-extension while concatenating bytes in the earlier step.
Recall from the section above (How do I coerce the byte array in to meaningful data?) that we used b & 0xFF to prevent sign-extension from occurring. If you just remove the & 0xFF from the highest byte, sign-extension will happen automatically.
For example, the following decodes signed, big-endian, 16-bit samples:
for (int i = 0; i + 1 < bytes.length; i += 2) {
int sample = (bytes[i] << 8) // high byte is sign-extended
| (bytes[i + 1] & 0xFF); // low byte is not
// ...
}
How do I decode Encoding.PCM_UNSIGNED?
We turn it in to a signed number. Unsigned samples are simply offset so that, for example:
An unsigned value of 0 corresponds to the most negative signed value.
An unsigned value of 2^(bitsPerSample - 1) corresponds to the signed value of 0.
An unsigned value of 2^bitsPerSample - 1 (the largest one that fits) corresponds to the most positive signed value.
So this turns out to be pretty simple. Just subtract the offset:
float sample = temp - fullScale(bitsPerSample);
Then normalize the sample, as described in Some Assumptions.
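As a combined sketch of both steps for unsigned samples, again assuming temp already holds the concatenated bytes (the class name UnsignedPcm is made up):

```java
final class UnsignedPcm {
    /** Removes the unsigned offset from temp and scales to -1..1. */
    static float decode(long temp, int bitsPerSample) {
        double fullScale = Math.pow(2, bitsPerSample - 1);
        // Unsigned 0 is the most negative value, fullScale is the midpoint.
        return (float) ((temp - fullScale) / fullScale);
    }
}
```

For 8-bit audio this maps 0 to -1.0f, 128 to 0.0f and 255 to 127/128.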
How do I decode Encoding.PCM_FLOAT?
This is new since Java 7.
In practice, floating-point PCM is typically either IEEE 32-bit or IEEE 64-bit and already normalized to the range of ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.
// IEEE 32-bit
float sample = Float.intBitsToFloat((int) temp);
// IEEE 64-bit
double sampleAsDouble = Double.longBitsToDouble(temp);
float sample = (float) sampleAsDouble; // or just use double for arithmetic
How do I decode Encoding.ULAW and Encoding.ALAW?
These are companding compression codecs that are more common in telephones and such. They're supported by javax.sound.sampled I assume because they're used by Sun's Au format. (However, it's not limited to just this type of container. For example, WAV can contain these encodings.)
You can conceptualize A-law and μ-law like they're a floating-point format. These are PCM formats but the range of values is non-linear.
There are two ways to decode them. I'll show the way which uses the mathematical formula. You can also decode them by manipulating the binary directly which is described in this blog post but it's more esoteric-looking.
For both, the compressed data is 8-bit. Standardly A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the formula yields a range of ±1.0.
Before you can apply the formula, there are three things to account for:
Some of the bits are standardly inverted for storage due to reasons involving data integrity.
They're stored as sign and magnitude (rather than two's complement).
The formula also expects a range of ±1.0, so the 8-bit value has to be scaled.
For μ-law all the bits are inverted, so:
temp ^= 0xffL; // 0xff == 0b1111_1111
(Note that we can't use ~, because we don't want to invert the high bits of the long.)
For A-law, every other bit is inverted, so:
temp ^= 0x55L; // 0x55 == 0b0101_0101
(XOR can be used to do inversion. See How do you set, clear and toggle a bit?)
To convert from sign and magnitude to two's complement, we:
Check to see if the sign bit was set.
If so, clear the sign bit and negate the number.
// 0x80 == 0b1000_0000
if ((temp & 0x80L) != 0) {
temp ^= 0x80L;
temp = -temp;
}
Then scale the encoded numbers, the same way as described in Some Assumptions:
sample = temp / fullScale(8);
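Putting the three preparation steps together for μ-law, a minimal sketch might look like the following. CompandPrepDemo and prepareMuLaw are hypothetical names, and the constant 128.0 stands in for fullScale(8).

```java
public class CompandPrepDemo {
    // Applies the three preparation steps to one stored mu-law byte
    // (passed in the low 8 bits of a long).
    static float prepareMuLaw(long temp) {
        temp ^= 0xffL;               // 1. undo the bit inversion
        if ((temp & 0x80L) != 0) {   // 2. sign and magnitude -> two's complement
            temp ^= 0x80L;
            temp = -temp;
        }
        return (float) (temp / 128.0); // 3. scale; 128.0 is fullScale(8)
    }

    public static void main(String[] args) {
        System.out.println(prepareMuLaw(0xFFL)); // prints 0.0
        System.out.println(prepareMuLaw(0x80L)); // prints 0.9921875
        System.out.println(prepareMuLaw(0x00L)); // prints -0.9921875
    }
}
```

The result is still companded at this point. It only becomes a linear sample after the expansion formula is applied.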
Now we can apply the expansion.
The μ-law formula translated to Java is then:
sample = (float) (
signum(sample)
*
(1.0 / 255.0)
*
(pow(256.0, abs(sample)) - 1.0)
);
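A quick sanity check of the μ-law expansion, written as a self-contained sketch (MuLawExpandDemo and expand are illustrative names): zero should stay zero, full scale should stay full scale, and a mid-range companded value should expand to a much smaller linear value.

```java
public class MuLawExpandDemo {
    // Same formula as above, with the Math calls spelled out.
    static float expand(float sample) {
        return (float) (
            Math.signum(sample)
            * (1.0 / 255.0)
            * (Math.pow(256.0, Math.abs(sample)) - 1.0)
        );
    }

    public static void main(String[] args) {
        System.out.println(expand(0.0f));  // prints 0.0
        System.out.println(expand(1.0f));  // prints 1.0
        System.out.println(expand(0.5f));  // about 0.0588, i.e. 15/255
        System.out.println(expand(-0.5f)); // about -0.0588
    }
}
```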
The A-law formula translated to Java is then:
float signum = signum(sample);
sample = abs(sample);
if (sample < (1.0 / (1.0 + log(87.7)))) {
sample = (float) (
sample * ((1.0 + log(87.7)) / 87.7)
);
} else {
sample = (float) (
exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
);
}
sample = signum * sample;
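The same kind of check works for A-law. The sketch below wraps the snippet above in a method (ALawExpandDemo and expand are made-up names) and reuses the constant 87.7 from this answer. At full scale the exponential branch reduces to exp(ln A) / A, so the result should come out very close to 1.0.

```java
public class ALawExpandDemo {
    static final double A = 87.7;           // constant used in the snippet above
    static final double LN_A = Math.log(A);

    static float expand(float sample) {
        float sign = Math.signum(sample);
        double y = Math.abs(sample);
        double x;
        if (y < 1.0 / (1.0 + LN_A)) {
            x = y * ((1.0 + LN_A) / A);               // linear segment near zero
        } else {
            x = Math.exp(y * (1.0 + LN_A) - 1.0) / A; // exponential segment
        }
        return (float) (sign * x);
    }

    public static void main(String[] args) {
        System.out.println(expand(0.0f));  // prints 0.0
        System.out.println(expand(1.0f));  // very close to 1.0
        System.out.println(expand(-1.0f)); // very close to -1.0
    }
}
```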
Here's the full example code for the SimpleAudioConversion class.
package mcve.audio;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioFormat.Encoding;
import static java.lang.Math.*;
/**
* <p>Performs simple audio format conversion.</p>
*
* <p>Example usage:</p>
*
* <pre>{@code AudioInputStream ais = ... ;
* SourceDataLine line = ... ;
* AudioFormat fmt = ... ;
*
* // do setup
*
* for (int blen = 0; (blen = ais.read(bytes)) > -1;) {
* int slen;
* slen = SimpleAudioConversion.decode(bytes, samples, blen, fmt);
*
* // do something with samples
*
* blen = SimpleAudioConversion.encode(samples, bytes, slen, fmt);
* line.write(bytes, 0, blen);
* }}</pre>
*
* @author Radiodef
* @see "Overview on Stack Overflow"
*/
public final class SimpleAudioConversion {
private SimpleAudioConversion() {}
/**
* Converts from a byte array to an audio sample float array.
*
* @param bytes the byte array, filled by the AudioInputStream
* @param samples an array to fill up with audio samples
* @param blen the return value of AudioInputStream.read
* @param fmt the source AudioFormat
*
* @return the number of valid audio samples converted
*
* @throws NullPointerException if bytes, samples or fmt is null
* @throws ArrayIndexOutOfBoundsException
* if bytes.length is less than blen or
* if samples.length is less than blen / bytesPerSample(fmt.getSampleSizeInBits())
*/
public static int decode(byte[] bytes,
float[] samples,
int blen,
AudioFormat fmt) {
int bitsPerSample = fmt.getSampleSizeInBits();
int bytesPerSample = bytesPerSample(bitsPerSample);
boolean isBigEndian = fmt.isBigEndian();
Encoding encoding = fmt.getEncoding();
double fullScale = fullScale(bitsPerSample);
int i = 0;
int s = 0;
while (i < blen) {
long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
float sample = 0f;
if (encoding == Encoding.PCM_SIGNED) {
temp = extendSign(temp, bitsPerSample);
sample = (float) (temp / fullScale);
} else if (encoding == Encoding.PCM_UNSIGNED) {
temp = unsignedToSigned(temp, bitsPerSample);
sample = (float) (temp / fullScale);
} else if (encoding == Encoding.PCM_FLOAT) {
if (bitsPerSample == 32) {
sample = Float.intBitsToFloat((int) temp);
} else if (bitsPerSample == 64) {
sample = (float) Double.longBitsToDouble(temp);
}
} else if (encoding == Encoding.ULAW) {
sample = bitsToMuLaw(temp);
} else if (encoding == Encoding.ALAW) {
sample = bitsToALaw(temp);
}
samples[s] = sample;
i += bytesPerSample;
s++;
}
return s;
}
/**
* Converts from an audio sample float array to a byte array.
*
* @param samples an array of audio samples to encode
* @param bytes an array to fill up with bytes
* @param slen the return value of the decode method
* @param fmt the destination AudioFormat
*
* @return the number of valid bytes converted
*
* @throws NullPointerException if samples, bytes or fmt is null
* @throws ArrayIndexOutOfBoundsException
* if samples.length is less than slen or
* if bytes.length is less than slen * bytesPerSample(fmt.getSampleSizeInBits())
*/
public static int encode(float[] samples,
byte[] bytes,
int slen,
AudioFormat fmt) {
int bitsPerSample = fmt.getSampleSizeInBits();
int bytesPerSample = bytesPerSample(bitsPerSample);
boolean isBigEndian = fmt.isBigEndian();
Encoding encoding = fmt.getEncoding();
double fullScale = fullScale(bitsPerSample);
int i = 0;
int s = 0;
while (s < slen) {
float sample = samples[s];
long temp = 0L;
if (encoding == Encoding.PCM_SIGNED) {
temp = (long) (sample * fullScale);
} else if (encoding == Encoding.PCM_UNSIGNED) {
temp = (long) (sample * fullScale);
temp = signedToUnsigned(temp, bitsPerSample);
} else if (encoding == Encoding.PCM_FLOAT) {
if (bitsPerSample == 32) {
temp = Float.floatToRawIntBits(sample);
} else if (bitsPerSample == 64) {
temp = Double.doubleToRawLongBits(sample);
}
} else if (encoding == Encoding.ULAW) {
temp = muLawToBits(sample);
} else if (encoding == Encoding.ALAW) {
temp = aLawToBits(sample);
}
packBits(bytes, i, temp, isBigEndian, bytesPerSample);
i += bytesPerSample;
s++;
}
return i;
}
/**
* Computes the block-aligned bytes per sample of the audio format,
* using Math.ceil(bitsPerSample / 8.0).
* <p>
* Round towards the ceiling because formats that allow bit depths
* in non-integral multiples of 8 typically pad up to the nearest
* integral multiple of 8. So for example, a 31-bit AIFF file will
* actually store 32-bit blocks.
*
* @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
* @return The block-aligned bytes per sample of the audio format.
*/
public static int bytesPerSample(int bitsPerSample) {
return (int) ceil(bitsPerSample / 8.0); // optimization: ((bitsPerSample + 7) >>> 3)
}
/**
* Computes the largest magnitude representable by the audio format,
* using Math.pow(2.0, bitsPerSample - 1). Note that for two's complement
* audio, the largest positive value is one less than the return value of
* this method.
* <p>
* The result is returned as a double because in the case that
* bitsPerSample is 64, a long would overflow.
*
* @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
* @return the largest magnitude representable by the audio format
*/
public static double fullScale(int bitsPerSample) {
return pow(2.0, bitsPerSample - 1); // optimization: (1L << (bitsPerSample - 1))
}
private static long unpackBits(byte[] bytes,
int i,
boolean isBigEndian,
int bytesPerSample) {
switch (bytesPerSample) {
case 1: return unpack8Bit(bytes, i);
case 2: return unpack16Bit(bytes, i, isBigEndian);
case 3: return unpack24Bit(bytes, i, isBigEndian);
default: return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
}
}
private static long unpack8Bit(byte[] bytes, int i) {
return bytes[i] & 0xffL;
}
private static long unpack16Bit(byte[] bytes,
int i,
boolean isBigEndian) {
if (isBigEndian) {
return (
((bytes[i ] & 0xffL) << 8)
| (bytes[i + 1] & 0xffL)
);
} else {
return (
(bytes[i ] & 0xffL)
| ((bytes[i + 1] & 0xffL) << 8)
);
}
}
private static long unpack24Bit(byte[] bytes,
int i,
boolean isBigEndian) {
if (isBigEndian) {
return (
((bytes[i ] & 0xffL) << 16)
| ((bytes[i + 1] & 0xffL) << 8)
| (bytes[i + 2] & 0xffL)
);
} else {
return (
(bytes[i ] & 0xffL)
| ((bytes[i + 1] & 0xffL) << 8)
| ((bytes[i + 2] & 0xffL) << 16)
);
}
}
private static long unpackAnyBit(byte[] bytes,
int i,
boolean isBigEndian,
int bytesPerSample) {
long temp = 0;
if (isBigEndian) {
for (int b = 0; b < bytesPerSample; b++) {
temp |= (bytes[i + b] & 0xffL) << (
8 * (bytesPerSample - b - 1)
);
}
} else {
for (int b = 0; b < bytesPerSample; b++) {
temp |= (bytes[i + b] & 0xffL) << (8 * b);
}
}
return temp;
}
private static void packBits(byte[] bytes,
int i,
long temp,
boolean isBigEndian,
int bytesPerSample) {
switch (bytesPerSample) {
case 1: pack8Bit(bytes, i, temp);
break;
case 2: pack16Bit(bytes, i, temp, isBigEndian);
break;
case 3: pack24Bit(bytes, i, temp, isBigEndian);
break;
default: packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
break;
}
}
private static void pack8Bit(byte[] bytes, int i, long temp) {
bytes[i] = (byte) (temp & 0xffL);
}
private static void pack16Bit(byte[] bytes,
int i,
long temp,
boolean isBigEndian) {
if (isBigEndian) {
bytes[i ] = (byte) ((temp >>> 8) & 0xffL);
bytes[i + 1] = (byte) ( temp & 0xffL);
} else {
bytes[i ] = (byte) ( temp & 0xffL);
bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
}
}
private static void pack24Bit(byte[] bytes,
int i,
long temp,
boolean isBigEndian) {
if (isBigEndian) {
bytes[i ] = (byte) ((temp >>> 16) & 0xffL);
bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
bytes[i + 2] = (byte) ( temp & 0xffL);
} else {
bytes[i ] = (byte) ( temp & 0xffL);
bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
bytes[i + 2] = (byte) ((temp >>> 16) & 0xffL);
}
}
private static void packAnyBit(byte[] bytes,
int i,
long temp,
boolean isBigEndian,
int bytesPerSample) {
if (isBigEndian) {
for (int b = 0; b < bytesPerSample; b++) {
bytes[i + b] = (byte) (
(temp >>> (8 * (bytesPerSample - b - 1))) & 0xffL
);
}
} else {
for (int b = 0; b < bytesPerSample; b++) {
bytes[i + b] = (byte) ((temp >>> (8 * b)) & 0xffL);
}
}
}
private static long extendSign(long temp, int bitsPerSample) {
int bitsToExtend = Long.SIZE - bitsPerSample;
return (temp << bitsToExtend) >> bitsToExtend;
}
private static long unsignedToSigned(long temp, int bitsPerSample) {
return temp - (long) fullScale(bitsPerSample);
}
private static long signedToUnsigned(long temp, int bitsPerSample) {
return temp + (long) fullScale(bitsPerSample);
}
// mu-law constant
private static final double MU = 255.0;
// A-law constant
private static final double A = 87.7;
// natural logarithm of A
private static final double LN_A = log(A);
private static float bitsToMuLaw(long temp) {
temp ^= 0xffL;
if ((temp & 0x80L) != 0) {
temp = -(temp ^ 0x80L);
}
float sample = (float) (temp / fullScale(8));
return (float) (
signum(sample)
*
(1.0 / MU)
*
(pow(1.0 + MU, abs(sample)) - 1.0)
);
}
private static long muLawToBits(float sample) {
double sign = signum(sample);
sample = abs(sample);
sample = (float) (
sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
);
long temp = (long) (sample * fullScale(8));
if (temp < 0) {
temp = -temp ^ 0x80L;
}
return temp ^ 0xffL;
}
private static float bitsToALaw(long temp) {
temp ^= 0x55L;
if ((temp & 0x80L) != 0) {
temp = -(temp ^ 0x80L);
}
float sample = (float) (temp / fullScale(8));
float sign = signum(sample);
sample = abs(sample);
if (sample < (1.0 / (1.0 + LN_A))) {
sample = (float) (sample * ((1.0 + LN_A) / A));
} else {
sample = (float) (exp((sample * (1.0 + LN_A)) - 1.0) / A);
}
return sign * sample;
}
private static long aLawToBits(float sample) {
double sign = signum(sample);
sample = abs(sample);
if (sample < (1.0 / A)) {
sample = (float) ((A * sample) / (1.0 + LN_A));
} else {
sample = (float) ((1.0 + log(A * sample)) / (1.0 + LN_A));
}
sample *= sign;
long temp = (long) (sample * fullScale(8));
if (temp < 0) {
temp = -temp ^ 0x80L;
}
return temp ^ 0x55L;
}
}
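The extendSign helper in the class above relies on a shift trick that may not be obvious: shifting left moves the sample's sign bit into bit 63 of the long, and the arithmetic right shift (>>) then copies that bit back down through all the higher bits. A small standalone sketch of the same idea, with ExtendSignDemo as a made-up class name:

```java
public class ExtendSignDemo {
    // Same technique as extendSign in SimpleAudioConversion.
    static long extendSign(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        // << puts the sample's sign bit at bit 63; >> (arithmetic) smears it back down.
        return (temp << bitsToExtend) >> bitsToExtend;
    }

    public static void main(String[] args) {
        System.out.println(extendSign(0xFFFEL, 16));   // prints -2
        System.out.println(extendSign(0x7FFFL, 16));   // prints 32767
        System.out.println(extendSign(0x800000L, 24)); // prints -8388608
    }
}
```

Using the unsigned shift (>>>) instead would fill the high bits with zeros and leave the value unchanged, so the choice of >> matters here.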
This is how you get the actual sample data from the currently playing sound. The other excellent answer will tell you what the data means. I haven't tried it on any OS other than my Windows 10 machine, so YMMV. For me, it pulls from the current system default recording device. On Windows, set the default recording device to "Stereo Mix" instead of "Microphone" to capture the sound being played. You may have to toggle "Show Disabled Devices" to see "Stereo Mix".
import javax.sound.sampled.*;
public class SampleAudio {
private static long extendSign(long temp, int bitsPerSample) {
int extensionBits = 64 - bitsPerSample;
return (temp << extensionBits) >> extensionBits;
}
public static void main(String[] args) throws LineUnavailableException {
float sampleRate = 8000;
int sampleSizeBits = 16;
int numChannels = 1; // Mono
AudioFormat format = new AudioFormat(sampleRate, sampleSizeBits, numChannels, true, true);
TargetDataLine tdl = AudioSystem.getTargetDataLine(format);
tdl.open(format);
tdl.start();
if (!tdl.isOpen()) {
System.exit(1);
}
byte[] data = new byte[(int)sampleRate*10];
int read = tdl.read(data, 0, (int)sampleRate*10);
if (read > 0) {
for (int i = 0; i < read-1; i = i + 2) {
long val = ((data[i] & 0xffL) << 8L) | (data[i + 1] & 0xffL);
long valf = extendSign(val, 16);
System.out.println(i + "\t" + valf);
}
}
tdl.close();
}
}