I found a nice Java implementation of bit twiddling techniques here, many of which I think are based on that document. For toBitSet and nextPermutation in particular, I wonder: is it possible to make them support datatypes that can go beyond 64 bits (long)? And how?
Would it be possible to define such methods for Java's BigInteger, for example?
Or would iterating over the binary strings of length n with k bits set (where k = number of bits set to 1, n = total number of bits), as nextPermutation does, necessitate a (slower) String implementation (i.e. representing the binary numbers as n-character strings)?
Below is the source of the aforementioned operations. I'm sorry for not being able to add more information; I regret to admit that I have little knowledge of bitwise operations in general.
Any suggestions would be much appreciated.
toBitSet:
/**
 * Converts {@code value} into a {@link BitSet} of size {@code size}.
 */
public static final BitSet toBitSet(final int size, final long value) {
    BitSet bits = new BitSet(size);
    int idx = 0;
    long tmp = value;
    while (tmp != 0L) {
        if (tmp % 2L != 0L) {
            bits.set(idx);
        }
        ++idx;
        tmp = tmp >>> 1;
    }
    return bits;
}
nextPermutation:
/**
 * Compute the lexicographically next bit permutation.
 *
 * Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of
 * N 1 bits in a lexicographical sense. For example, if N is 3 and the bit pattern is 00010011,
 * the next patterns would be 00010101, 00010110, 00011001, 00011010, 00011100, 00100011, and so
 * forth.
 */
public static final long nextPermutation(long val) {
    long tmp = val | (val - 1);
    return (tmp + 1) | (((-tmp & -~tmp) - 1) >> (Long.numberOfTrailingZeros(val) + 1));
}
I would like to convert integers of arbitrary length that are represented in binary format to ASCII form.
One example: for the integer 33023, the hexadecimal bytes are 0x80ff. I would like to represent 0x80ff in the ASCII form of "33023", which has the hexadecimal representation 0x3333303233.
I am working in a Java Card environment which does not recognize the String type, so I would have to do the conversion manually via binary manipulation.
What is the most efficient way to go about solving this, given that a Java Card environment on a 16-bit smart card is very constrained?
This is trickier than you may think, as it requires base conversion, and base conversion is executed over the entire number, using big integer arithmetic.
That of course doesn't mean that we cannot create an efficient implementation of said big integer arithmetic specifically for this purpose. Here is an implementation that left-pads with zeros (which is usually required on Java Card) and uses no additional memory (!). You may have to copy the original value of the big endian number if you want to keep it, though - the input value is overwritten. Putting it in RAM is highly recommended.
This code simply divides the bytes by the new base (10 for decimals), returning the remainder. The remainder is the next lowest digit. As the input value has now been divided, the next remainder is the digit one position more significant than the one before. It keeps dividing and returning the remainder until the value is zero and the calculation is complete.
The tricky part of the algorithm is the inner loop, which divides the value by 10 in place while returning the remainder using tail division over bytes. It produces one remainder / decimal digit per run. This also means that the order of the function is O(n), where n is the number of digits in the result (counting the tail division as a single operation). Note that n can be calculated as ceil(bigNumBytes * log_10(256)); the results of this formula are also present in the precalculated BYTES_TO_DECIMAL_SIZE table below. log_10(256) is of course a constant value, somewhere upwards of 2.408.
Here is the final code with optimizations (see the edit for different versions):
/**
 * Converts an unsigned big endian value within the buffer to the same value
 * stored using ASCII digits. The ASCII digits may be zero padded, depending
 * on the value within the buffer.
 * <p>
 * <strong>Warning:</strong> this method zeros the value in the buffer that
 * contains the original number. It is strongly recommended that the input
 * value is in fast transient memory as it will be overwritten multiple
 * times - until it is all zero.
 * </p>
 * <p>
 * <strong>Warning:</strong> this method fails if not enough bytes are
 * available in the output buffer, while still destroying the input buffer.
 * </p>
 * <p>
 * <strong>Warning:</strong> the big endian number can only occupy 16 bytes
 * or less for this implementation.
 * </p>
 *
 * @param uBigBuf
 *            the buffer containing the unsigned big endian number
 * @param uBigOff
 *            the offset of the unsigned big endian number in the buffer
 * @param uBigLen
 *            the length of the unsigned big endian number in the buffer
 * @param decBuf
 *            the buffer that is to receive the decimal ASCII encoded number
 * @param decOff
 *            the offset in the buffer to receive the decimal ASCII encoded
 *            number
 * @return decLen, the length in the buffer of the received decimal ASCII
 *         encoded number
 */
public static short toDecimalASCII(byte[] uBigBuf, short uBigOff,
        short uBigLen, byte[] decBuf, short decOff) {

    // variables required to perform long division by 10 over bytes
    // possible optimization: reuse remainder for dividend (yuk!)
    short dividend, division, remainder;

    // calculate stuff outside of loop
    final short uBigEnd = (short) (uBigOff + uBigLen);
    final short decDigits = BYTES_TO_DECIMAL_SIZE[uBigLen];

    // --- basically perform division by 10 in a loop, storing the remainder

    // traverse from right (least significant) to the left for the decimals
    for (short decIndex = (short) (decOff + decDigits - 1); decIndex >= decOff; decIndex--) {

        // --- the following code performs tail division by 10 over bytes

        // clear remainder at the start of the division
        remainder = 0;

        // traverse from left (most significant) to the right for the input
        for (short uBigIndex = uBigOff; uBigIndex < uBigEnd; uBigIndex++) {

            // get rest of previous result times 256 (bytes are base 256)
            // ... and add next positive byte value
            // optimization: doing shift by 8 positions instead of mul.
            dividend = (short) ((remainder << 8) + (uBigBuf[uBigIndex] & 0xFF));

            // do the division
            division = (short) (dividend / 10);

            // optimization: perform the modular calculation using
            // ... subtraction and multiplication
            // ... instead of calculating the remainder directly
            remainder = (short) (dividend - division * 10);

            // store the result in place for the next iteration
            uBigBuf[uBigIndex] = (byte) division;
        }

        // the remainder is what we were after
        // add '0' value to create ASCII digits
        decBuf[decIndex] = (byte) (remainder + '0');
    }

    return decDigits;
}
/*
 * pre-calculated array storing the number of decimal digits for a big endian
 * encoded number of len bytes: ceil(len * log_10(256))
 */
private static final byte[] BYTES_TO_DECIMAL_SIZE = { 0, 3, 5, 8, 10, 13,
        15, 17, 20, 22, 25, 27, 29, 32, 34, 37, 39 };
To extend the input size, simply calculate and store the next decimal sizes in the table...
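As a quick sanity check, here is a minimal off-card test harness (plain Java SE, not Java Card; the main method is mine and not part of the original code) that feeds the 0x80FF example through the routine:
public static void main(String[] args) {
    // 0x80FF is the big endian encoding of 33023
    byte[] uBig = { (byte) 0x80, (byte) 0xFF };
    // the table says 2 input bytes need at most 5 decimal digits
    byte[] dec = new byte[BYTES_TO_DECIMAL_SIZE[uBig.length]];
    short len = toDecimalASCII(uBig, (short) 0, (short) uBig.length,
            dec, (short) 0);
    // prints "33023" (0x33 0x33 0x30 0x32 0x33); note uBig is now all zero
    System.out.println(new String(dec, 0, len));
}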
I'm trying to make a program that converts values to bits. Everything worked well till I got to GB (gigabytes). So for instance 1 GB should equal 8 billion bits, but the result is giving me a negative answer. Here is my code; can someone give me some insight?
else if (line.equals("GB")) {
    Scanner num = new Scanner(System.in);
    System.out.println("How many GigaBytes are you transfering to bits?");
    int number = num.nextInt();
    //int ans = number * 8 * 1000000000;
    BigInteger bigAns = BigInteger.valueOf(number * 8 * 1000000000);
    System.out.println(number + " GigaByte(s) equals " + bigAns + " bits.");
}
Here is the output I'm getting: 1 GigaByte(s) equals -589934592 bits.
It wouldn't be a bad thing to use BigInteger throughout your calculations. This way, you don't run the risk of overflow while multiplying these numbers.
BigInteger bigAns = BigInteger.valueOf(number).multiply(BigInteger.valueOf(8))
.multiply(BigInteger.valueOf(1000000000L));
You are getting a negative number because you are exceeding the maximum possible value for a signed 32-bit integer, causing an overflow.
When dealing with large numbers like this, you should use long instead, which is capable of holding much larger values.
To implement this, change int ans = number * 8 * 1000000000 to long ans = number * 8L * 1000000000L. The L suffix makes the literals longs, so the multiplication is carried out in 64-bit arithmetic.
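A minimal sketch of that fix (assuming the variable names from the question):
// Putting the L suffix on the first literal promotes every step of the
// multiplication to 64-bit long before anything can overflow.
long ans = number * 8L * 1000000000L;
System.out.println(number + " GigaByte(s) equals " + ans + " bits.");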
The answer you get is a garbage value.
You can convert an int to a BigInteger like so:
BigInteger bi = BigInteger.valueOf(myInteger.intValue());
And as Bohsulav said:
You can use number * 8 * 1000000000L to prevent the overflow.
Helped? Let me know :)
First convert to BigInteger, then perform the computations. In the line
BigInteger.valueOf(number * 8 * 1000000000);
you perform the computation in int, then convert to BigInteger afterwards, when it is too late.
Use BigInteger.valueOf(number), then call the appropriate methods to perform your computation.
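For example, a hedged sketch using the question's variable names:
// Convert first, then multiply inside BigInteger so no step can overflow;
// the constant 8L * 1000000000L is computed in long and fits easily.
BigInteger bigAns = BigInteger.valueOf(number)
        .multiply(BigInteger.valueOf(8L * 1000000000L));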
I have been thinking about it but have run out of ideas. I have 10 arrays, each of length 18, holding 18 double values. These 18 values are features of an image. Now I have to apply k-means clustering on them.
For implementing k-means clustering I need a unique computational value for each array. Is there any mathematical, statistical, or other logic that would help me create a computational value for each array that is unique to it, based upon the values inside it? Thanks in advance.
Here is an example array; I have 10 of these in total:
[0.07518284315321135
0.002987851573676068
0.002963866526639678
0.002526139418225552
0.07444872939213325
0.0037219653347541617
0.0036979802877177715
0.0017920256571474585
0.07499695903867931
0.003477831820276616
0.003477831820276616
0.002036159171625004
0.07383539747505984
0.004311312204791184
0.0043352972518275745
0.0011786937400740452
0.07353130134299131
0.004339580295941216]
Did you check Arrays.hashCode in Java 7?
/**
 * Returns a hash code based on the contents of the specified array.
 * For any two <tt>double</tt> arrays <tt>a</tt> and <tt>b</tt>
 * such that <tt>Arrays.equals(a, b)</tt>, it is also the case that
 * <tt>Arrays.hashCode(a) == Arrays.hashCode(b)</tt>.
 *
 * <p>The value returned by this method is the same value that would be
 * obtained by invoking the {@link List#hashCode() <tt>hashCode</tt>}
 * method on a {@link List} containing a sequence of {@link Double}
 * instances representing the elements of <tt>a</tt> in the same order.
 * If <tt>a</tt> is <tt>null</tt>, this method returns 0.
 *
 * @param a the array whose hash value to compute
 * @return a content-based hash code for <tt>a</tt>
 * @since 1.5
 */
public static int hashCode(double a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (double element : a) {
        long bits = Double.doubleToLongBits(element);
        result = 31 * result + (int) (bits ^ (bits >>> 32));
    }
    return result;
}
I don't understand why @Marco13 mentioned "this is not returning unique values for arrays".
UPDATE
See @Marco13's comment for the reason why it cannot be unique.
UPDATE
If we draw a graph using your input points, the 18 elements show one spike followed by three low values, and the pattern repeats.
If that is true, you can find the mean of your peaks (elements 1, 4, 8, 12, 16) and the low mean from the remaining values.
So you will have a peak mean and a low mean, and you can find a unique number representing these two (while also preserving the values) using the bijective algorithm described here.
That algorithm also provides formulas for the reverse direction, i.e. recovering the peak and low mean from the unique value.
To find the unique pair: <x, y> = x + (y + (x + 1)/2)^2 (matching the bijective method below).
Also refer to Exercise 1 on page 2 of the PDF for how to reverse x and y.
To find the means and the pairing value:
public static double mean(double[] array) {
    double peakMean = 0;
    double lowMean = 0;
    for (int i = 0; i < array.length; i++) {
        if ((i + 1) % 4 == 0 || i == 0) {
            peakMean = peakMean + array[i];
        } else {
            lowMean = lowMean + array[i];
        }
    }
    peakMean = peakMean / 5;
    lowMean = lowMean / 13;
    return bijective(lowMean, peakMean);
}

public static double bijective(double x, double y) {
    double tmp = (y + ((x + 1) / 2));
    return x + (tmp * tmp);
}
For testing:
public static void main(String[] args) {
    // note: the question's second value (0.002987851573676068) is restored
    // here so the array has all 18 elements that mean() expects
    double[] arrays = { 0.07518284315321135, 0.002987851573676068,
            0.002963866526639678, 0.002526139418225552, 0.07444872939213325,
            0.0037219653347541617, 0.0036979802877177715,
            0.0017920256571474585, 0.07499695903867931, 0.003477831820276616,
            0.003477831820276616, 0.002036159171625004, 0.07383539747505984,
            0.004311312204791184, 0.0043352972518275745,
            0.0011786937400740452, 0.07353130134299131, 0.004339580295941216 };
    System.out.println(mean(arrays));
}
You can use these peak and low values to find similar images.
You can simply sum the values using double precision; the resulting value will be unique most of the time. On the other hand, if the position of each value is relevant, then you can compute a sum that uses the index as a multiplier.
The code could be as simple as:
public static double sum(double[] values) {
    double val = 0.0;
    for (double d : values) {
        val += d;
    }
    return val;
}

public static double hash_w_order(double[] values) {
    double val = 0.0;
    for (int i = 0; i < values.length; i++) {
        val += values[i] * (i + 1);
    }
    return val;
}

public static void main(String[] args) {
    double[] myvals = { 0.07518284315321135, 0.002987851573676068,
            0.002963866526639678, 0.002526139418225552, 0.07444872939213325,
            0.0037219653347541617, 0.0036979802877177715,
            0.0017920256571474585, 0.07499695903867931, 0.003477831820276616,
            0.003477831820276616, 0.002036159171625004, 0.07383539747505984,
            0.004311312204791184, 0.0043352972518275745,
            0.0011786937400740452, 0.07353130134299131, 0.004339580295941216 };
    System.out.println("Computed value based on sum: " + sum(myvals));
    System.out.println("Computed value based on values and its position: "
            + hash_w_order(myvals));
}
The output for that code, using your list of values is:
Computed value based on sum: 0.41284176550504803
Computed value based on values and its position: 3.7396448842464496
Well, here's a method that works for any number of doubles.
public BigInteger uniqueID(double[] array) {
    // 2^64; note that BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE)
    // would only be 2^63, so a shift is used instead
    final BigInteger twoToTheSixtyFour = BigInteger.ONE.shiftLeft(64);
    final BigInteger unsignedMask = twoToTheSixtyFour.subtract(BigInteger.ONE);
    BigInteger count = BigInteger.ZERO;
    for (double d : array) {
        long bitRepresentation = Double.doubleToRawLongBits(d);
        count = count.multiply(twoToTheSixtyFour);
        // treat the 64 bits as an unsigned digit, so that negative long
        // values cannot subtract from the digits accumulated so far
        count = count.add(BigInteger.valueOf(bitRepresentation).and(unsignedMask));
    }
    return count;
}
Explanation
Each double is a 64-bit value, which means there are 2^64 different possible double values. Since a long is easier to work with for this sort of thing, and it's the same number of bits, we can get a 1-to-1 mapping from doubles to longs using Double.doubleToRawLongBits(double).
This is awesome, because now we can treat this like a simple combinations problem. You know how you know that 1234 is a unique number? There's no other number with the same value. This is because we can break it up by its digits like so:
1234 = 1 * 10^3 + 2 * 10^2 + 3 * 10^1 + 4 * 10^0
The powers of 10 would be "basis" elements of the base-10 numbering system, if you know linear algebra. In this way, base-10 numbers are like arrays consisting of only values from 0 to 9 inclusively.
If we want something similar for double arrays, we can discuss the base-(2^64) numbering system. Each double value would be a digit in a base-(2^64) representation of a value. If there are 18 digits, there are (2^64)^18 unique values for a double[] of length 18.
That number is gigantic, so we're going to need to represent it with a BigInteger data-structure instead of a primitive number. How big is that number?
(2^64)^18 = 61172327492847069472032393719205726809135813743440799050195397570919697796091958321786863938157971792315844506873509046544459008355036150650333616890210625686064472971480622053109783197015954399612052812141827922088117778074833698589048132156300022844899841969874763871624802603515651998113045708569927237462546233168834543264678118409417047146496
There are that many unique configurations of 18-length double arrays and this code lets you uniquely describe them.
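As a quick illustration (my own example values, not from the question), two arrays that differ in a single bit of a single element get distinct IDs:
// Math.nextUp(2.0) is the next representable double after 2.0, so the
// two arrays below differ in exactly one bit of one element; calling the
// uniqueID method above on them yields different BigIntegers.
double[] a = { 1.0, 2.0 };
double[] b = { 1.0, Math.nextUp(2.0) };
System.out.println(uniqueID(a).equals(uniqueID(b))); // prints false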
I'm going to suggest three methods, with different pros and cons which I will outline.
Hash Code
This is the obvious "solution", though it has been correctly pointed out that it will not be unique. However, it is very unlikely that any two arrays will hash to the same value.
Weighted Sum
Your elements appear to be bounded; perhaps they range from a minimum of 0 to a maximum of 1. If this is the case, you can multiply the first number by N^0, the second by N^1, the third by N^2 and so on, where N is some large number (ideally the inverse of your precision). This is easily implemented, particularly if you use a matrix package, and very fast. We can make this unique if we choose.
Euclidean Distance from Mean
Subtract the mean of your arrays from each array, square the results, and sum the squares. If you have an expected mean, you can use that. Again, this is not unique; there will be collisions, but you (almost) can't avoid that.
The difficulty of uniqueness
It has already been explained that hashing will not give you a unique solution. A unique number is possible in theory, using the Weighted Sum, but we have to use numbers of a very large size. Let's say your numbers are 64 bits in memory. That means there are 2^64 possible numbers each one can represent (slightly fewer with floating point). Eighteen such numbers in an array could represent 2^(64*18) different arrays. That's huge. If you use anything smaller, you will not be able to guarantee uniqueness, due to the pigeonhole principle.
Let's look at a trivial example. If you have four letters, a, b, c and d, and you have to number them each uniquely using the numbers 1 to 3, you can't. That's the pigeonhole principle. You have 2^(18*64) possible arrays. You can't number them uniquely with fewer than 2^(18*64) numbers, and hashing doesn't give you that.
If you use BigDecimal, you can represent (almost) arbitrarily large numbers. If the largest element you can get is 1 and the smallest 0, then you can set N = 1/precision and apply the Weighted Sum mentioned above. This will guarantee uniqueness. The precision for doubles in Java is Double.MIN_VALUE. Note that the array of weights needs to be stored in BigDecimals!
That satisfies this part of your question:
create a computational value for each array, which is unique to it
based upon values inside it
However, there is a problem:
1 and 2 suck for K Means
I am assuming from your discussion with Marco13 that you are performing the clustering on the single values, not the length-18 arrays. As Marco13 has already mentioned, hashing sucks for k-means. The whole idea of hashing is that the smallest change in the data results in a large change in the hash value. That means that two similar images produce two very similar arrays, which in turn produce two very different "unique" numbers. Similarity is not preserved. The result will be pseudo-random!
Weighted Sums are better, but still bad. They will basically ignore all the elements except the last one, unless the last elements are equal; only then will they look at the next-to-last, and so on. Similarity is not really preserved.
Euclidean distance from the mean (or at least some point) will at least group things together in a somewhat sensible way. Direction will be ignored, but at least things that are far from the mean won't be grouped with things that are close. Similarity of one feature is preserved; the other features are lost.
In summary
1 is very easy, but is not unique and doesn't preserve similarity.
2 is easy, can be unique and doesn't preserve similarity.
3 is easy, but is not unique and preserves some similarity.
Implementation of the Weighted Sum. Not really tested:
import java.math.BigDecimal;

public class Array2UniqueID {

    private final double min;
    private final double max;
    private final double prec;
    private final int length;

    /**
     * Used to provide a {@code BigDecimal} that is unique to the given array.
     * <p>
     * This uses a weighted sum to guarantee that two IDs match if and only if
     * every element of the array also matches. Similarity is not preserved.
     *
     * @param min smallest value an array element can possibly take
     * @param max largest value an array element can possibly take
     * @param prec smallest difference possible between two array elements
     * @param length length of each array
     */
    public Array2UniqueID(double min, double max, double prec, int length) {
        this.min = min;
        this.max = max;
        this.prec = prec;
        this.length = length;
    }

    /**
     * A convenience constructor which assumes the array consists of doubles of
     * full range.
     * <p>
     * This will result in very large IDs being returned.
     *
     * @see Array2UniqueID#Array2UniqueID(double, double, double, int)
     * @param length length of each array
     */
    public Array2UniqueID(int length) {
        this(-Double.MAX_VALUE, Double.MAX_VALUE, Double.MIN_VALUE, length);
    }

    public BigDecimal createUniqueID(double[] array) {
        // Validate the data
        if (array.length != length) {
            throw new IllegalArgumentException("Array length must be "
                    + length + " but was " + array.length);
        }
        for (double d : array) {
            if (d < min || d > max) {
                throw new IllegalArgumentException("Each element of the array"
                        + " must be in the range [" + min + ", " + max + "]");
            }
        }

        double range = max - min;
        /* maxNums is the maximum number of numbers that could possibly exist
         * between max and min.
         * The ID will be in the range 0 to maxNums^length.
         * maxNums = range / prec + 1
         * Stored as a BigDecimal for convenience, but is an integer
         */
        BigDecimal maxNums = BigDecimal.valueOf(range)
                .divide(BigDecimal.valueOf(prec))
                .add(BigDecimal.ONE);
        // For convenience
        BigDecimal id = BigDecimal.valueOf(0);

        // id = sum over i of ((array[i] - min) / prec) * maxNums^i
        for (int i = 0; i < array.length; i++) {
            // offset by min so the digit is non-negative, then weight the
            // digit by maxNums^i (the original .multiply(maxNums).pow(i)
            // would have raised the whole product to the power i)
            BigDecimal num = BigDecimal.valueOf(array[i] - min)
                    .divide(BigDecimal.valueOf(prec))
                    .multiply(maxNums.pow(i));
            id = id.add(num);
        }
        return id;
    }
}
As I understand it, you are going to do k-means clustering based on the double values.
Why not just wrap the double value in an object with array and position identifiers, so that you would know in which cluster it ended up?
Something like:
public class Element {
    public final double value;
    public final int array;
    public final int position;

    public Element(double value, int array, int position) {
        this.value = value;
        this.array = array;
        this.position = position;
    }
}
If you need to cluster each array as a whole,
you can transform the original arrays of length 18 into arrays of length 19, with the last or first element being a unique id that you ignore during clustering but can refer back to after clustering has finished (see the sketch below). This way you have a small memory footprint of 8 additional bytes per array and an easy association with the original value.
If space is absolutely a problem, and all values in an array are less than 1, you can instead add a unique id greater than or equal to 1 to each array and cluster based on the remainder of division by 1: 0.07518284315321135 stays 0.07518284315321135 for the 1st array, and 0.07518284315321135 becomes 1.07518284315321135 for the 2nd, although this increases the complexity of the computation during clustering.
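Here is a minimal sketch of the length-19 variant (the helper name withId is mine, not an established API):
import java.util.Arrays;

// Appends an id as a 19th element; the clustering distance function
// must be written to skip this last element.
static double[] withId(double[] features, int id) {
    double[] tagged = Arrays.copyOf(features, features.length + 1);
    tagged[features.length] = id; // ignored when computing distances
    return tagged;
}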
First of all, let's try to understand what you need mathematically:
Uniquely mapping an array of m real numbers to a single number is in fact a bijection between R^m and R, or at least N.
Since floating point values are in fact rational numbers, your problem is to find a bijection between Q^m and N, which can be transformed into one from N^m to N, because you know your values will always be greater than 0 (just divide your values by the precision to make them integers).
Thus you need to map N^m to N. Take a look at the Cantor pairing function for some ideas.
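For illustration, a small sketch of the Cantor pairing function pi(x, y) = (x + y)(x + y + 1)/2 + y, folded over an array to collapse N^m to N (BigInteger because the values grow quickly):
import java.math.BigInteger;

static BigInteger cantorPair(BigInteger x, BigInteger y) {
    // pi(x, y) = (x + y) * (x + y + 1) / 2 + y; the product of two
    // consecutive integers is even, so the halving is exact
    BigInteger s = x.add(y);
    return s.multiply(s.add(BigInteger.ONE)).shiftRight(1).add(y);
}

// Collapse an array of naturals pairwise: pair(pair(a0, a1), a2), ...
static BigInteger cantorFold(BigInteger[] values) {
    BigInteger acc = values[0];
    for (int i = 1; i < values.length; i++) {
        acc = cantorPair(acc, values[i]);
    }
    return acc;
}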
A guaranteed way to generate a unique result based on the array is to convert it to one big string, and use that for your computational value.
It may be slow, but it will be unique based on the array's values.
Implementation examples:
Best way to convert an ArrayList to a string
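A hedged sketch of such a key (the helper name uniqueKey is mine; Double.toString produces a distinct string for every distinct double value, so the key is collision-free):
// Slow but unique: the comma separator prevents ambiguity between
// adjacent numbers in the concatenation.
static String uniqueKey(double[] array) {
    StringBuilder sb = new StringBuilder();
    for (double d : array) {
        sb.append(d).append(',');
    }
    return sb.toString();
}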
I see an LCG implementation in Java in the Random class, as shown below:
/*
 * This is a linear congruential pseudorandom number generator, as
 * defined by D. H. Lehmer and described by Donald E. Knuth in
 * <i>The Art of Computer Programming,</i> Volume 2:
 * <i>Seminumerical Algorithms</i>, section 3.2.1.
 *
 * @param bits random bits
 * @return the next pseudorandom value from this random number
 *         generator's sequence
 * @since 1.1
 */
protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int) (nextseed >>> (48 - bits));
}
But the link below says that an LCG should be of the form x2 = (a*x1 + b) mod M:
https://math.stackexchange.com/questions/89185/what-does-linear-congruential-mean
The above code does not look like that form. Instead it uses & in place of the modulo operation, in this line:
nextseed = (oldseed * multiplier + addend) & mask;
Can somebody help me understand this approach of using & instead of the modulo operation?
Bitwise-ANDing with a mask of the form 2^n - 1 is the same as computing the number modulo 2^n: any 1s higher up in the number are multiples of 2^n and so can be safely discarded. Note, however, that some multiplier/addend combinations work very poorly if you make the modulus a power of two (rather than a power of two minus one). That code is fine, but make sure it's appropriate for your constants.
This can be used if mask + 1 is a power of 2.
For instance, if you want to compute modulo 4, you can write x & 3 instead of x % 4 to obtain the same result.
Note, however, that this requires x to be a non-negative number, as the sketch below shows.
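A small demonstration of the equivalence, including the negative case where it breaks down:
int x = 13;
System.out.println(x % 4); // 1
System.out.println(x & 3); // 1

int y = -13;
System.out.println(y % 4); // -1 (Java's % keeps the sign of the dividend)
System.out.println(y & 3); // 3  (masking always yields a non-negative value)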