I would like to convert integers of arbitrary length, represented in binary format, to their ASCII form.
One example: for the integer 33023, the hexadecimal byte representation is 0x80FF. I would like to represent 0x80FF in the ASCII form of 33023, which has the hexadecimal representation 0x3333303233.
I am working in a Java Card environment, which does not recognize the String type, so I have to do the conversion manually via binary manipulation.
What is the most efficient way to solve this, given that the Java Card environment on a 16-bit smart card is very constrained?
This is trickier than you may think, as it requires base conversion, and base conversion is executed over the entire number, using big integer arithmetic.
That of course doesn't mean that we cannot create an efficient implementation of said big integer arithmetic specifically for this purpose. Here is an implementation that left-pads with zeros (which is usually required on Java Card) and uses no additional memory (!). You may have to copy the original value of the big endian number if you want to keep it though - the input value is overwritten. Putting it in RAM is highly recommended.
This code simply divides the bytes by the new base (10 for decimals), returning the remainder. The remainder is the next lowest digit. As the input value has now been divided, the next remainder is the digit that is one position more significant than the one before. It keeps dividing and returning the remainder until the value is zero and the calculation is complete.
The tricky part of the algorithm is the inner loop, which divides the value by 10 in place while returning the remainder, using tail division over bytes. It produces one remainder / decimal digit per run. This also means that the order of the function is O(n), where n is the number of digits in the result (defining the tail division as a single operation). Note that n can be calculated as ceil(bigNumBytes * log_10(256)); those results are precalculated in the BYTES_TO_DECIMAL_SIZE table below. log_10(256) is of course a constant, approximately 2.408.
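As a worked example, one pass of the tail division over the two bytes 0x80 0xFF (33023): the first dividend is 0x80 = 128, which gives quotient 12 and remainder 8; the next dividend is 8 * 256 + 0xFF = 2303, which gives quotient 230 and remainder 3. The buffer now holds 0x0C 0xE6 = 3302, and the remainder 3 is the least significant decimal digit, matching 33023 = 3302 * 10 + 3.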
Here is the final code with optimizations (see the edit for different versions):
/**
* Converts an unsigned big endian value within the buffer to the same value
* stored using ASCII digits. The ASCII digits may be zero padded, depending
* on the value within the buffer.
* <p>
* <strong>Warning:</strong> this method zeros the value in the buffer that
* contains the original number. It is strongly recommended that the input
* value is in fast transient memory as it will be overwritten multiple
* times - until it is all zero.
* </p>
* <p>
* <strong>Warning:</strong> this method fails if not enough bytes are
* available in the output buffer, while still destroying the input buffer.
* </p>
* <p>
* <strong>Warning:</strong> the big endian number can only occupy 16 bytes
* or less for this implementation.
* </p>
*
* @param uBigBuf
*            the buffer containing the unsigned big endian number
* @param uBigOff
*            the offset of the unsigned big endian number in the buffer
* @param uBigLen
*            the length of the unsigned big endian number in the buffer
* @param decBuf
*            the buffer that is to receive the ASCII encoded number
* @param decOff
*            the offset in the buffer to receive the ASCII encoded number
* @return decLen, the length in the buffer of the received ASCII encoded
*         number
*/
public static short toDecimalASCII(byte[] uBigBuf, short uBigOff,
short uBigLen, byte[] decBuf, short decOff) {
// variables required to perform long division by 10 over bytes
// possible optimization: reuse remainder for dividend (yuk!)
short dividend, division, remainder;
// calculate stuff outside of loop
final short uBigEnd = (short) (uBigOff + uBigLen);
final short decDigits = BYTES_TO_DECIMAL_SIZE[uBigLen];
// --- basically perform division by 10 in a loop, storing the remainder
// traverse from right (least significant) to the left for the decimals
for (short decIndex = (short) (decOff + decDigits - 1); decIndex >= decOff; decIndex--) {
// --- the following code performs tail division by 10 over bytes
// clear remainder at the start of the division
remainder = 0;
// traverse from left (most significant) to the right for the input
for (short uBigIndex = uBigOff; uBigIndex < uBigEnd; uBigIndex++) {
// get rest of previous result times 256 (bytes are base 256)
// ... and add next positive byte value
// optimization: doing shift by 8 positions instead of mul.
dividend = (short) ((remainder << 8) + (uBigBuf[uBigIndex] & 0xFF));
// do the division
division = (short) (dividend / 10);
// optimization: perform the modular calculation using
// ... subtraction and multiplication
// ... instead of calculating the remainder directly
remainder = (short) (dividend - division * 10);
// store the result in place for the next iteration
uBigBuf[uBigIndex] = (byte) division;
}
// the remainder is what we were after
// add '0' value to create ASCII digits
decBuf[decIndex] = (byte) (remainder + '0');
}
return decDigits;
}
/*
* pre-calculated array storing the number of decimal digits for big endian
* encoded number with len bytes: ceil(len * log_10(256))
*/
private static final byte[] BYTES_TO_DECIMAL_SIZE = { 0, 3, 5, 8, 10, 13,
15, 17, 20, 22, 25, 27, 29, 32, 34, 37, 39 };
To extend the input size simply calculate and store the next decimal sizes in the table...
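For reference, a minimal usage sketch for the method above, using the 0x80FF example from the question (assumes toDecimalASCII and BYTES_TO_DECIMAL_SIZE are in scope):
byte[] uBig = { (byte) 0x80, (byte) 0xFF }; // 33023, big endian
byte[] dec = new byte[5]; // BYTES_TO_DECIMAL_SIZE[2] = 5 digits
short len = toDecimalASCII(uBig, (short) 0, (short) 2, dec, (short) 0);
// dec now holds the ASCII characters '3', '3', '0', '2', '3' and len is 5;
// note that uBig has been zeroed by the division loop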
When a key's hashcode is calculated, the spread() method is called:
static final int spread(int h) {
return (h ^ (h >>> 16)) & HASH_BITS;
}
where HASH_BITS equals 0x7fffffff. So, what is the purpose of HASH_BITS? Some say it forces the sign bit to 0, but I am not sure about that.
The index of a KV Node in the hash buckets is calculated by the following formula:
index = (n - 1) & hash
hash is the result of spread()
n is the length of the hash buckets, whose maximum is 2^30:
private static final int MAXIMUM_CAPACITY = 1 << 30;
So the maximum of n - 1 is 2^30 - 1, which means the top bit of hash will never be used in the index calculation.
But I still don't understand: is it necessary to clear the top bit of hash to 0? It seems that there are more reasons to do so.
/**
* Spreads (XORs) higher bits of hash to lower and also forces top
* bit to 0. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int spread(int h) {
return (h ^ (h >>> 16)) & HASH_BITS;
}
I think it is to avoid collisions with the reserved hash codes MOVED (-1), TREEBIN (-2) and RESERVED (-3), whose sign bits are always 1.
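To make the effect concrete, here is a minimal sketch (spread() and HASH_BITS are copied from the code above; the demo class and sample inputs are mine):
public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }
    public static void main(String[] args) {
        // Without the mask, a hashCode() like 0x80000000 would stay negative
        // and could clash with the sentinel node hashes MOVED (-1),
        // TREEBIN (-2) and RESERVED (-3).
        System.out.println(spread(0x80000000)); // 32768, nonnegative
        System.out.println(spread(-1));         // 2147418112, nonnegative
    }
}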
I want to find the n most significant bits of a BigInteger and return them as a byte.
It's my homework and I know very little about it.
Please help me solve it; it's very important for me.
this is the method that must be implemented:
/**
 * Gets N bits starting from the MOST SIGNIFICANT BIT (inclusive).
 *
 * @param value Source from which bits will be extracted
 * @param n The number of bits taken
 * @return The n most significant bits from value
 */
private byte msb(BigInteger value, int n) {
return 0x000;
}
You can try bitLength() in java.math.BigInteger, which returns the number of bits in the number. You can use it to retrieve the n most significant bits as:
int n = 3;
BigInteger r = BigInteger.valueOf(23);
BigInteger f = r.shiftRight(r.bitLength() - n);
Byte result = Byte.valueOf(f.toString());
System.out.println(result);
This prints 5 as expected.
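Putting that together, a minimal sketch of the requested method (assuming value is positive and n is between 1 and 8, so the result fits in a byte; for n greater than value.bitLength() the shift distance would be negative):
private byte msb(BigInteger value, int n) {
    // Shift out everything below the n most significant bits.
    // Assumes 1 <= n <= min(8, value.bitLength()).
    return value.shiftRight(value.bitLength() - n).byteValue();
}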
I was just trying to convert the following methods, which I wrote in C/C++, to Java. In short, the code provides a very efficient way of calculating the indices of the left-most and right-most bits of a number that are set to one. The two methods are based on code in Knuth's The Art of Computer Programming, Volume 4.
// Returns index of the left-most bit of x that is one in the binary
// expansion of x. Assumes x > 0 since otherwise lambda(x) is undefined.
// Can be used to calculate floor(log(x, 2)), the number of binary digits
// of x, minus one.
int lambda(unsigned long x) {
double y = (double) x;
// Excuse the monstrosity below. I need to have a long that has the raw
// bits of x in data. Simply (long)y would yield x back since C would cast
// the double to a long. So we need to cast it to a (void *) so that C
// "forgets" what kind of data we are dealing with, and then cast it to
// long.
unsigned long xx = *((long *)((void*)&y));
// The low 52 bits are the significand. The rest are the sign and
// exponent. Since the number is assumed to be positive, we don't have to
// worry about the sign bit being 1 and can simply extract the exponent by
// shifting right 52 bits. The exponent is in "excess-1023" format so we
// must subtract 1023 after.
return (int)(xx >> 52) - 1023;
}
// Returns the index of the right-most one bit in the binary expansion of x
int rho(unsigned long x) {
return lambda(x & -x);
}
As you can see, I need to have a long that has the same bits of a double, but without a void* cast, I am not sure how to do this in Java. Any thoughts? Is it even possible?
There's a static function, doubleToLongBits(), to perform the type conversion.
long xx = Double.doubleToLongBits(y);
return (int) (xx >>> 52) - 1023;
Note the >>> treats the long as an unsigned value when shifting right.
Reading the commentary, though, it sounds like what you want is a simple function of the number of leading zeros.
return 63 - Long.numberOfLeadingZeros(x);
I would guess this is more efficient on most current architectures, but you'd have to profile it to be sure. There's a similar "trailing zeros" method to compute your rho() function.
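For completeness, a minimal Java sketch of both functions along those lines (the names lambda and rho are kept from the C code; the trailing-zeros counterpart mentioned above is Long.numberOfTrailingZeros):
static int lambda(long x) {
    // Index of the left-most one bit; undefined for x == 0,
    // matching the assumption x > 0 in the C version.
    return 63 - Long.numberOfLeadingZeros(x);
}

static int rho(long x) {
    // Index of the right-most one bit.
    return Long.numberOfTrailingZeros(x);
}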
I see an LCG implementation in Java under Random class as shown below:
/*
* This is a linear congruential pseudorandom number generator, as
* defined by D. H. Lehmer and described by Donald E. Knuth in
* <i>The Art of Computer Programming,</i> Volume 3:
* <i>Seminumerical Algorithms</i>, section 3.2.1.
*
* @param bits random bits
* @return the next pseudorandom value from this random number
*         generator's sequence
* @since 1.1
*/
protected int next(int bits) {
long oldseed, nextseed;
AtomicLong seed = this.seed;
do {
oldseed = seed.get();
nextseed = (oldseed * multiplier + addend) & mask;
} while (!seed.compareAndSet(oldseed, nextseed));
return (int)(nextseed >>> (48 - bits));
}
But the link below says that an LCG should be of the form x2 = (a*x1 + b) mod M:
https://math.stackexchange.com/questions/89185/what-does-linear-congruential-mean
But the above code does not look like that form. Instead it uses & in place of the modulo operation, as in the line below:
nextseed = (oldseed * multiplier + addend) & mask;
Can somebody help me understand this approach of using & instead of the modulo operation?
Bitwise-ANDing with a mask which is of the form 2^n - 1 is the same as computing the number modulo 2^n: Any 1's higher up in the number are multiples of 2^n and so can be safely discarded. Note, however, that some multiplier/addend combinations work very poorly if you make the modulus a power of two (rather than a power of two minus one). That code is fine, but make sure it's appropriate for your constants.
This can be used if mask + 1 is a power of 2.
For instance, if you want to do modulo 4, you can write x & 3 instead of x % 4 to obtain the same result.
Note however that this requires that x be nonnegative: for a negative x, & keeps the low bits of the two's complement representation, whereas % in Java would return a negative (or zero) remainder.
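As a quick check, here is a minimal sketch using the actual LCG constants from java.util.Random (multiplier 0x5DEECE66DL, addend 0xBL, mask 2^48 - 1); the demo class name is just for illustration:
public class LcgMaskDemo {
    public static void main(String[] args) {
        final long multiplier = 0x5DEECE66DL;
        final long addend = 0xBL;
        final long mask = (1L << 48) - 1; // modulus M = 2^48
        long seed = 42L; // arbitrary nonnegative seed
        long viaAnd = (seed * multiplier + addend) & mask;
        long viaMod = (seed * multiplier + addend) % (1L << 48);
        System.out.println(viaAnd == viaMod); // true: same result
    }
}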
To convert an int into a byte array, I'm using the following code:
int a = 128;
byte[] b = convertIntValueToByteArray(a);
private static byte[] convertIntValueToByteArray(int intValue){
BigInteger bigInteger = BigInteger.valueOf(intValue);
byte[] origByteArray = bigInteger.toByteArray();
byte[] noSignByteArray = new byte[bigInteger.bitLength()/8];
if(bigInteger.bitLength()%8!=0){
noSignByteArray = origByteArray;
}else{
System.arraycopy(origByteArray,1,noSignByteArray,0,noSignByteArray.length);
}
return noSignByteArray;
}
There are two things which I'm attempting to do.
1) I need to know the number of bytes (rounded up to the closest byte) of the original integer. However, I don't need the extra octet that is added to hold the sign bit when I call the toByteArray() method. This is the reason why I have the helper method. So in this example, if I don't have the helper method, when I convert 128 to a byte array I get a length of 2 octets because of the sign bit, but I'm only expecting one octet.
2) I need the positive representation of the number. In this example, if I attempt to print the first element of array b, I get -128. However, the numbers I will be using will be positive only, so what I actually want is 128. I'm limited to using a byte array. Is there a way to accomplish this?
Updated Post
Thank you for the responses. I haven't found the exact answer I was looking for so I'll attempt to give more details. Ultimately, I want to write values of different types over a data output stream. In this post, I'd like to clarify what happens when ints are written to a data output stream. I've come across two scenarios.
1)
DataOutputStream os = new DataOutputStream(this.socket.getOutputStream());
byte[] b = BigInteger.valueOf(128).toByteArray();
os.write(b);
2)
DataOutputStream os = new DataOutputStream(this.socket.getOutputStream());
os.write(128);
In the first scenario, when the bytes are read from a data input stream, the first element in the byte array is a 0 to represent the msb, and the second element contains the number -128. However, since the msb is 0, we would be able to determine that it is intended to be a positive number. In the second scenario, there is no msb, and the only element present in the byte array read from the input stream is -128. I was expecting the write() method of the data output stream to convert the int into a byte array in the same manner as the toByteArray() method does on a BigInteger object. However, this doesn't seem to be the case, as the msb is not present. So my question is: in the second scenario, how are we supposed to know that 128 is supposed to be a positive number and not a negative one if there is no msb?
As you probably already know
In an octet, the pattern 10000000 can be interpreted as either 128 or -128, depending on the, um, outside interpretation
Java's byte type interprets octets as values in -128...127 only.
If you are building an application in which the entire world consists of nonnegative integers only, then you could simply do all of your work under the assumption that the byte value -128 will mean 128 and -127 will mean 129 and ... and -1 will mean 255. This is certainly doable but it takes work.
Dealing with the notion of an "unsigned byte" like this is normally done by expanding the byte into a short or int with the higher order bits all set to zero and then performing arithmetic or displaying your values. You will need to decide whether such an approach is more to your liking than just representing 128 as two octets in your array.
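As a concrete sketch of that widening approach (in-memory streams stand in for the socket here; readUnsignedByte() does the masking for you on the reading side):
import java.io.*;

public class UnsignedByteDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream os = new DataOutputStream(buf);
        os.write(128); // emits the single octet 0x80, no extra sign byte

        byte raw = buf.toByteArray()[0];  // -128 when read as a signed byte
        int widened = raw & 0xFF;         // 128, higher-order bits zeroed

        DataInputStream is = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        int viaStream = is.readUnsignedByte(); // also 128

        System.out.println(widened + " " + viaStream); // 128 128
    }
}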
I think the following code might be sufficient.
In Java an int is a two's complement binary number:
-1 = 111...111
ones' complement = 000...000; + 1 =
1 = 000...001
So I do not understand the concern about the sign bit. That said, you could use Math.abs(n).
A byte ranges from -128 to 127, but the interpretation is a matter of masking, as below.
// Requires: import java.nio.ByteBuffer;
public static void main(String[] args) {
int n = 128;
byte[] bytes = intToFlexBytes(n);
for (byte b: bytes)
System.out.println("byte " + (((int)b) & 0xFF));
}
public static byte[] intToFlexBytes(int n) {
// Convert int to byte[4], via a ByteBuffer:
byte[] bytes = new byte[4];
ByteBuffer bb = ByteBuffer.allocateDirect(4);
bb.asIntBuffer().put(n);
bb.position(0);
bb.get(bytes);
// Leading bytes with 0:
int i = 0;
while (i < 4 && bytes[i] == 0)
++i;
// Shorten bytes array if needed:
if (i != 0) {
byte[] shortenedBytes = new byte[4 - i];
for (int j = i; j < 4; ++j) {
shortenedBytes[j - i] = bytes[j]; // System.arraycopy not needed.
}
bytes = shortenedBytes;
}
return bytes;
}
To answer your first question—how many bytes are required to represent a nonnegative integer using an unsigned representation—consider the following functions I wrote in Common Lisp.
(defconstant +bits-per-byte+ 8)
(defun bit-length (n)
(check-type n (integer 0) "a nonnegative integer")
(if (zerop n)
1
(1+ (floor (log n 2)))))
(defun bytes-for-bits (n)
(check-type n (integer 1) "a positive integer")
(values (ceiling n +bits-per-byte+)))
These highlight the mathematical underpinnings of the problem: the logarithm tells you how many powers of two (as provided by bits) it takes to dominate a given nonnegative integer, adjusted to be a step function with floor; and the number of bytes it takes to hold that number of bits is again a step function, this time adjusted with ceiling.
Note that the number zero is intolerable as input to a logarithm function, so we avoid it explicitly. You may observe that the bit-length function could also be written with a slight transformation of the core expression:
(defun bit-length-alt (n)
(check-type n (integer 0) "a nonnegative integer")
(values (ceiling (log (1+ n) 2))))
Unfortunately, as the logarithm of one is always zero, regardless of the base, this version says that the integer zero can be represented by zero bits, which isn't the answer we want.
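For readers working in Java rather than Lisp, a rough equivalent of the two functions (a sketch, assuming nonnegative input; the method names are mine):
static int bitLength(long n) {
    // Zero still takes one bit to write down, as in the Lisp version.
    return n == 0 ? 1 : 64 - Long.numberOfLeadingZeros(n);
}

static int bytesForBits(int bits) {
    return (bits + 7) / 8; // ceiling division by 8 bits per byte
}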
For your second goal, you can use the functions I've defined above to allocate the required number of bytes, and incrementally set the bits you need, ignoring sign. It's hard to tell if you're having trouble getting the proper bits set in the byte vector, or whether your problem is in interpreting the bits in a way that avoids treating the high bit as a sign bit (that is, two's complement representation). Please elaborate on what kind of push you need to get moving again.