calculating the byte usage of a plain number instance - java

I am writing a custom file format where I want to parse a certain region of the file into a map which takes strings as keys and java.lang.Number instances (e.g. Integer, Long) as values. Is there any way to access the byte usage of the values without having to determine their class type first? As far as I can tell, the primitive wrapper subclasses of Number all have a static constant called "BYTES" holding the number of bytes they use.
I want to do something like this:
byte b = ((Number)map.get("key")).BYTES;

You can get the value of the BYTES constant using reflection:
Number number = map.get("key");
int numberOfBytes = number.getClass().getDeclaredField("BYTES").getInt(number);
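A slightly more defensive variant of the reflection lookup might look like the sketch below. The names NumberBytes and byteSize are made up for illustration. It assumes Java 8 or later, where the primitive wrapper classes declare BYTES. Other Number subclasses such as BigInteger or AtomicInteger declare no such field, so the sketch returns -1 for them instead of throwing.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper, not part of any library.
final class NumberBytes {
    /** Returns the public static int BYTES constant of n's class, or -1 if the class has none. */
    static int byteSize(Number n) {
        try {
            Field f = n.getClass().getField("BYTES");
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == int.class) {
                return f.getInt(null); // static field, so no instance is needed
            }
            return -1;
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return -1; // e.g. BigInteger, BigDecimal, AtomicInteger
        }
    }
}
```

With this, the lookup from the question becomes NumberBytes.byteSize(map.get("key")), at the cost of a reflective call per value.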

One way to do it is the long-hand approach:
byte b = 0;
Number n = (Number)map.get("key");
if (n instanceof Integer) {
b = 4;
}
else if (n instanceof Byte) {
b = 1;
}
else if ...
David SN posted a solution using reflection which may work out better for you
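If the set of value types is known in advance, the instanceof chain can also be written as a lookup table, which avoids both the branching and the reflection. This is only a sketch: the class name NumberSizes and the choice to throw for unknown types are assumptions, not something taken from the answers above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps each supported wrapper class to its size in bytes.
final class NumberSizes {
    private static final Map<Class<? extends Number>, Integer> SIZES = new HashMap<>();
    static {
        SIZES.put(Byte.class, Byte.BYTES);
        SIZES.put(Short.class, Short.BYTES);
        SIZES.put(Integer.class, Integer.BYTES);
        SIZES.put(Long.class, Long.BYTES);
        SIZES.put(Float.class, Float.BYTES);
        SIZES.put(Double.class, Double.BYTES);
    }

    static int sizeOf(Number n) {
        Integer size = SIZES.get(n.getClass());
        if (size == null) {
            throw new IllegalArgumentException("Unsupported Number type: " + n.getClass().getName());
        }
        return size;
    }
}
```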

Related

ByteArray to DoubleArray in Kotlin

I want to extrapolate a byte array into a double array.
I know how to do it in Java. But the AS converter doesn't work for this... :-D
This is the class I want to write in Kotlin:
class ByteArrayToDoubleArrayConverter {
public double[] invoke(byte[] bytes) {
double[] doubles = new double[bytes.length / 2];
int i = 0;
for (int n = 0; n < bytes.length; n = n + 2) {
doubles[i] = (bytes[n] & 0xFF) | (bytes[n + 1] << 8);
i = i + 1;
}
return doubles;
}
}
This would be a typical example of what results are expected:
class ByteArrayToDoubleArrayConverterTest {
@Test
fun `check typical values`() {
val bufferSize = 8
val bytes = ByteArray(bufferSize)
bytes[0] = 1
bytes[1] = 0
bytes[2] = 0
bytes[3] = 1
bytes[4] = 0
bytes[5] = 2
bytes[6] = 1
bytes[7] = 1
val doubles = ByteArrayToDoubleArrayConverter().invoke(bytes)
assertTrue(1.0 == doubles[0])
assertTrue(256.0 == doubles[1])
assertTrue(512.0 == doubles[2])
assertTrue(257.0 == doubles[3])
}
}
Any idea? Thanks!!!
I think this would be clearest with a helper function.  Here's an extension function that uses a lambda to convert pairs of bytes into a DoubleArray:
inline fun ByteArray.mapPairsToDoubles(block: (Byte, Byte) -> Double)
= DoubleArray(size / 2){ i -> block(this[2 * i], this[2 * i + 1]) }
That uses the DoubleArray constructor which takes an initialisation lambda as well as a size, so you don't need to loop through setting values after construction.
The required function then simply needs to know how to convert each pair of bytes into a double.  Though it would be more idiomatic as an extension function rather than a class:
fun ByteArray.toDoubleSamples() = mapPairsToDoubles{ a, b ->
(a.toInt() and 0xFF or (b.toInt() shl 8)).toDouble()
}
You can then call it with e.g.:
bytes.toDoubleSamples()
(.toXxx() is the conventional name for a function which returns a transformed version of an object.  The standard name for this sort of function would be toDoubleArray(), but that normally converts each value to its own double; what you're doing is more specialised, so a more specialised name would avoid confusion.)
The only awkward thing there (and the reason why the direct conversion from Java fails) is that Kotlin is much more fussy about its numeric types, and won't automatically promote them the way Java and C do; it also doesn't have byte overloads for its bitwise operators.  So you need to call toInt() explicitly on each byte before you can call and and shl, and then call toDouble() on the result.
The result is code that is a lot shorter, hopefully much more readable, and also very efficient!  (No intermediate arrays or lists, and — thanks to the inline — not even any unnecessary function calls.)
(It's a bit more awkward than most Kotlin code, as primitive arrays aren't as well-supported as reference-based arrays — which are themselves not as well-supported as lists.  This is mainly for legacy reasons to do with Java compatibility.  But it's a shame that there's no chunked() implementation for ByteArray, which could have avoided the helper function, though at the cost of a temporary list.)
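For comparison, the same pairing of bytes can be expressed on the Java side with a ByteBuffer, which makes the byte order explicit. This is a sketch under the assumption that each pair is a little-endian signed 16-bit sample, which is what the original Java expression (low byte masked, high byte shifted with its sign) computes. The names Samples and toDoubleSamples are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper mirroring the Kotlin extension function above.
final class Samples {
    static double[] toDoubleSamples(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        double[] doubles = new double[bytes.length / 2]; // a trailing odd byte is ignored
        for (int i = 0; i < doubles.length; i++) {
            doubles[i] = buf.getShort(2 * i); // absolute read of one signed 16-bit value
        }
        return doubles;
    }
}
```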

Convert a String to array of bits

I would like to convert a String consisting of 0's and 1's to an array of bits.
The String is of length ~30000 and is sparse (mostly 0s, few 1s)
For example, given a string
"00000000100000000010000100000000001000"
I would like to convert it to an array of bits which will store
[00000000100000000010000100000000001000]
I am thinking of using BitSet or OpenBitSet
Is there a better way? The use case is to perform logical OR efficiently.
I am thinking along these lines
final OpenBitSet logicalOrResult = new OpenBitSet();
for (final String line : lines) {
final OpenBitSet myBitArray = new OpenBitSet();
int pos = 0;
for (final char c : line.toCharArray()) {
// set the bit only where the string has a '1'
if (c == '1') {
myBitArray.set(pos);
}
pos++;
}
logicalOrResult.or(myBitArray);
}
BigInteger can parse it and store it, and do bitwise operations:
BigInteger x = new BigInteger(bitString, 2);
BigInteger y = new BigInteger(otherBitString, 2);
x = x.or(y);
System.out.println(x.toString(2));
A BitSet ranging over values between 0 and 30000 requires a long array of size less than 500, so you can assume that BitSet.or (or the respective OpenBitSet method) will be sufficiently fast, despite the sparsity. It looks like OpenBitSet has better performance than BitSet, but apart from this it doesn't really matter which you use, both will implement or efficiently. However, be sure to pass the length of the String to the (Open)BitSet constructor to avoid reallocations of the internal long array during construction!
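A minimal sketch of that approach with java.util.BitSet follows. The names BitStrings, parse and orAll are made up for illustration, and the sketch assumes that character position i in the string corresponds to bit index i. Because the strings are sparse, it jumps from one '1' to the next with indexOf rather than testing every character.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical helper: parses '0'/'1' strings into BitSets and ORs them together.
final class BitStrings {
    static BitSet parse(String bits) {
        BitSet set = new BitSet(bits.length()); // pre-size to avoid reallocation
        for (int i = bits.indexOf('1'); i >= 0; i = bits.indexOf('1', i + 1)) {
            set.set(i);
        }
        return set;
    }

    static BitSet orAll(List<String> lines) {
        BitSet result = new BitSet();
        for (String line : lines) {
            result.or(parse(line)); // in-place OR, one long word at a time
        }
        return result;
    }
}
```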
If your strings are much longer and your sparsity is extreme, you could also consider storing them as a sorted list of Integers (or ints, if you use a library like Trove), representing the indices which contain a 1. A bitwise or can be implemented in a merge(sort)-like fashion, which is quite efficient (time O(n + m), where n, m are the numbers of ones in each string). I suspect that in your scenario it will be slower than the BitSet approach though.
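The merge-style OR over sorted index lists could be sketched as below. It assumes each input is a strictly increasing int array of the positions that hold a 1. The class name SparseBits is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: bitwise OR of two sparse bit strings stored as sorted index arrays.
final class SparseBits {
    static int[] or(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[k++] = a[i++];
            } else if (a[i] > b[j]) {
                out[k++] = b[j++];
            } else {          // index present in both inputs: emit it once
                out[k++] = a[i++];
                j++;
            }
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return Arrays.copyOf(out, k); // trim unused tail
    }
}
```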
You can iterate through each character:
boolean[] bits = new boolean[str.length()];
for (int i = 0; i < str.length(); i++) {
// charAt returns a primitive char, so compare with == against char literals
if (str.charAt(i) == '1')
bits[i] = true;
else if (str.charAt(i) == '0')
bits[i] = false;
}
If you want to be memory efficient, you could try RLE (Run Length Encoding).

why can't byte array be stored in integer array in java

This code is valid
int h;
byte r;
h=r;
but these are not
int[] h;
byte[] r;
h=r;
or say
int[] h =new byte[4];
I would like to know why?
There's an implicit conversion from byte to int, but not from byte[] to int[]. This makes a lot of sense - the JIT compiler knows that to get to a value in an int[], it just needs to multiply the index by 4 and add that to the start of the data (after validation, and assuming no extra padding, of course). That wouldn't work if you could assign a byte[] reference to an int[] variable - the representations are different.
The language could have been designed to allow that conversion but make it create a new int[] which contained a copy of all the bytes, but that would have been pretty surprising in terms of the design of the rest of Java, where the assignment operator just copies a value from the right hand side of the operator to the variable on the left.
Alternatively, we could have imposed a restriction on the VM that every array access would have to look at the actual type of the array object in question, and work out how to get to the element appropriately... but that would have been horrible (even worse than the current nastiness of reference-type array covariance).
That's the design. Assigning a byte to a wider int is okay. But new byte[4] is a contiguous block of memory which is, roughly speaking, 4 * 8 bits (4 bytes) long. One int is 32 bits, so technically your whole byte array is the size of a single int. In C, where you have direct memory access, you could do some pointer magic and cast your byte pointer to an int pointer. In Java you can't, and that's safe.
Anyway, why do you want that?
Disclaimer: code like the example below is extremely unlikely to be seen anywhere except in the most critical sections of some performance-sensitive libraries or apps. Ideone: http://ideone.com/e14Omr
Comments are explanatory enough, I hope.
import sun.misc.Unsafe;
import java.lang.reflect.Field;
public class Main {
public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException, InstantiationException {
/* Unsafe.getUnsafe() rejects application code, so fetch the singleton via reflection */
Field f = Unsafe.class.getDeclaredField("theUnsafe");
f.setAccessible(true);
Unsafe unsafe = (Unsafe)f.get(null);
byte[] four_bytes = {25, 25, 25, 25};
long base_offset_bytes = unsafe.arrayBaseOffset(byte[].class); // offset of element 0 inside a byte[]
long ints_addr = unsafe.allocateMemory(16); // allocate 4 * 4 bytes off-heap, i.e. room for 4 ints
unsafe.setMemory(ints_addr, 16, (byte) 0); // allocateMemory does not zero the block
/* copy straight out of the heap array, so no raw object address is needed */
unsafe.copyMemory(four_bytes, base_offset_bytes, null, ints_addr, 4); // copy all four bytes
for(int i = 0; i < 4; i++) {
System.out.println(unsafe.getInt(ints_addr + 4L * i)); // read the block as 4 ints (4-byte stride)
}
System.out.println("*****************************");
for(int i = 0; i < 16; i++) {
System.out.println(unsafe.getByte(ints_addr + i)); // read the same block as 16 bytes
}
unsafe.freeMemory(ints_addr); // off-heap memory is not garbage collected
}
}
The difference is firstly due to the difference in behavior between primitive types and reference types.
In case you're not familiar with it, primitive types have "value semantics". This means that when you do a = b; when a and b are a primitive type (byte, short, int, long, float, double, boolean, or char) the numeric/boolean value is copied. For example:
int a = 3;
int b = a; // int value of a is copied to b
a = 5;
System.out.println(b); // outputs: 3
But arrays are objects, and objects have "reference semantics". That means that when you do a = b; where a and b are both declared as an array type, the array object that is referred to becomes shared. In a sense the value is still copied, but here the "value" is just the pointer to the object located elsewhere in memory. For example:
int[] a = new int[] { 3 };
int[] b = a; // pointer value of a is copied to b, so a and b now point at the same array object
a[0] = 5;
System.out.println(b[0]); // outputs: 5
a = null; // note: 'a' now points at no array, although this has no effect on b
System.out.println(b[0]); // outputs: 5
So it is okay to do int = byte because the numeric value is going to be copied (as they are both primitive types) and also because any possible value of type byte can be safely stored in an int (it is a "widening" primitive conversion).
But int[] and byte[] are both object types, so when you do int[] = byte[] you are asking for the object (the array) to be shared (not copied).
Now you have to ask, why can't an int array and a byte array share their array memory? And what would it mean if they did?
Ints are 4 times the size of bytes, so if the int and byte arrays were to have the same number of elements, then this causes all sorts of nonsense. If you tried to implement it in a memory efficient way, then complex (and very slow) run-time logic would be needed when accessing elements of int arrays to see if they were actually byte arrays. Int reads from byte array memory would have to read and widen the byte value, and int stores would have to either lose the upper 3 bytes, or throw an exception saying that there isn't enough space. Or, you could do it in a fast but memory-wasting way, by padding all byte arrays so that there are 3 wasted bytes per element, just in case somebody wants to use the byte array as an int array.
On the other hand, perhaps you want to pack 4 bytes per int (in this case, the shared array won't have the same number of elements depending on the type of the variable you use to access it). Unfortunately this also causes nonsense. The biggest problem is that it is not portable across CPU architectures. On a little-endian PC, b[0] would refer to the low byte of i[0], but on an ARM device b[0] might point at the high byte of i[0] (and it could even change while the program is running as ARM has a switchable endianness). Accessing the array's length property would also become more complicated, and just what should happen if the byte array's length is not divisible by 4?!
You can do this in C, but that's because C arrays don't have a well-defined length property and because C doesn't try to protect you from the other issues. C doesn't care if you go outside the array bounds or muddle up endianness. But Java does care, so it is not feasible to share the array memory in Java. (Java doesn't have unions.)
That's why int[].class and byte[].class both separately extend class Object, but neither class extends the other. You can't store a reference to a byte array in a variable that is declared to point at int arrays, in the same way you can't store a reference to a List in a variable of type String; they're just incompatible classes.
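The endianness problem with the "four bytes per int" layout discussed above can be seen in plain Java, because ByteBuffer forces you to choose the byte order explicitly when you view bytes as ints. The sketch below is illustrative only, and the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo: the same four bytes give different ints depending on byte order.
final class PackedView {
    static int firstInt(byte[] bytes, ByteOrder order) {
        // asIntBuffer() creates a view over the same bytes; nothing is copied
        return ByteBuffer.wrap(bytes).order(order).asIntBuffer().get(0);
    }
}
```

For the bytes {1, 0, 0, 0}, the little-endian view yields 1 while the big-endian view yields 16777216 (0x01000000), which is exactly the ambiguity an implicit byte[] to int[] conversion would have to resolve.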
When you say
int[] arr = new byte[5];
you copy references. On the right hand side is a reference to a byte array. Essentially, this looks like:
|__|__|__|__|__|
0 1 2 3 4 offset of elements, in bytes
^
|
reference to byte array
On the left hand side is a reference to an int array. This, however, is expected to look thus:
|________|________|________|________|________|
0 4 8 12 16
^
|
reference to int array
Hence, simply copying the reference is not possible. For, to get arr[1], the code would look at the starting address+4 (rather than starting address+1).
The only way to achieve what you want is to create an int[] that has the same number of elements and copy the bytes there.
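Such an explicit copy might look like the following sketch. The names Widen and toInts are hypothetical. Each element goes through the same widening conversion that a single byte-to-int assignment uses, so negative bytes stay negative.

```java
// Hypothetical helper: element-wise widening copy from byte[] to int[].
final class Widen {
    static int[] toInts(byte[] bytes) {
        int[] ints = new int[bytes.length]; // new allocation, 4 bytes per element
        for (int i = 0; i < bytes.length; i++) {
            ints[i] = bytes[i]; // implicit widening, sign-extended; use (bytes[i] & 0xFF) for unsigned values
        }
        return ints;
    }
}
```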
The rationale behind not doing that automatically:
Interpreting a single byte as an int comes at essentially no cost; in particular, no memory has to be allocated.
Copying a byte array is completely different. A new int array must be allocated, which is at least 4 times as big as the byte array, and the copy itself can take some time.
Conclusion: In Java, you can always say "I want to treat this special byte as if it were an int." But you can not say: "I want to treat some data structure (like an array, or a class instance) that contains bytes as if it contained ints."
Simply put, the type byte[] does not extend int[].
You can't, because the element sizes differ: an int does not fit into a byte, so an int[] and a byte[] are laid out differently in memory, and it is that memory layout which decides what kinds of assignment are possible.

Current best way to populate mixed type byte array

I'm trying to send and receive a byte stream in which certain ranges of bytes represent different pieces of data. I've found ways to convert single primitive datatypes into bytes, but I'm wondering if there's a straightforward way to place certain pieces of data into specified byte regions.
For example, I might need to produce or read something like the following:
byte 1 - int
byte 2-5 - int
byte 6-13 - double
byte 14-21 - double
byte 25 - int
byte 26-45 - string
Any suggestions would be appreciated.
Try DataOutputStream/DataInputStream or, for arrays, the ByteBuffer class.
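As a rough illustration of the ByteBuffer route, the sketch below writes three fields at fixed offsets and reads them back. The layout (one flag byte, then a 4-byte int, then an 8-byte double) and the names MixedRecord, encode, decodeCount and decodeValue are assumptions made for this example, not the layout from the question. ByteBuffer defaults to big-endian, which both sides of a protocol would need to agree on.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed layout: offset 0 = flag (1 byte), 1..4 = count (int), 5..12 = value (double).
final class MixedRecord {
    static final int SIZE = 1 + Integer.BYTES + Double.BYTES;

    static byte[] encode(int flag, int count, double value) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE); // big-endian by default
        buf.put(0, (byte) flag);   // absolute puts: position is given explicitly
        buf.putInt(1, count);
        buf.putDouble(5, value);
        return buf.array();
    }

    static int decodeCount(byte[] data) {
        return ByteBuffer.wrap(data).getInt(1);
    }

    static double decodeValue(byte[] data) {
        return ByteBuffer.wrap(data).getDouble(5);
    }
}
```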
For storing the integer in X bytes, you may use the following method. If you think it is badly named, you may use the much less descriptive i2os name which is used in several (crypto) algorithm descriptions. Note that the returned octet string uses Big Endian encoding of unsigned ints, which you should specify for your protocol.
public static byte[] positiveIntegerToOctetString(
final long value, final int octets) {
if (value < 0) {
throw new IllegalArgumentException("Cannot encode negative values");
}
if (octets < 1) {
throw new IllegalArgumentException("Cannot encode a number in negative or zero octets");
}
final int longSizeBytes = Long.SIZE / Byte.SIZE;
final int byteBufferSize = Math.max(octets, longSizeBytes);
final ByteBuffer buf = ByteBuffer.allocate(byteBufferSize);
for (int i = 0; i < byteBufferSize - longSizeBytes; i++) {
buf.put((byte) 0x00);
}
buf.mark();
buf.putLong(value);
// more bytes than long encoding
if (octets >= longSizeBytes) {
return buf.array();
}
// less bytes than long encoding (reset to mark first)
buf.reset();
for (int i = 0; i < longSizeBytes - octets; i++) {
if (buf.get() != 0x00) {
throw new IllegalArgumentException("Value does not fit in " + octets + " octet(s)");
}
}
final byte[] result = new byte[octets];
buf.get(result);
return result;
}
EDIT: Before storing the string, think of a padding mechanism (spaces are the most common choice) and a character encoding, e.g. String.getBytes(Charset.forName("ASCII")) or "Latin-1". Those are the most common encodings with a single byte per character. Calculating the size of "UTF-8" is slightly more difficult (encode first, then add 0x20-valued bytes at the end using ByteBuffer).
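The "encode first, then pad" idea for a fixed-width string field could be sketched as follows. The names FixedWidth, encode and decode are hypothetical, and the sketch assumes space (0x20) padding with UTF-8. One consequence of that choice is that trailing spaces in the original string cannot be told apart from padding when decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for a space-padded, fixed-width UTF-8 string field.
final class FixedWidth {
    static byte[] encode(String s, int width) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8); // encode first to learn the real byte length
        if (raw.length > width) {
            throw new IllegalArgumentException("String needs " + raw.length + " bytes, field has " + width);
        }
        byte[] out = new byte[width];
        Arrays.fill(out, (byte) 0x20);                   // pad with spaces
        System.arraycopy(raw, 0, out, 0, raw.length);
        return out;
    }

    static String decode(byte[] field) {
        String s = new String(field, StandardCharsets.UTF_8);
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;                                       // strip the padding again
        }
        return s.substring(0, end);
    }
}
```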
You may want to consider having a constant size for each data type. For example, the 32-bit Java int will take up 4 bytes, a long will take 8, etc. In fact, if you use Java's DataInputStream and DataOutputStream, you'll basically be doing that anyway. They have really nice methods like readInt/writeInt, etc.

How to add two java.lang.Numbers?

I have two Numbers. Eg:
Number a = 2;
Number b = 3;
//Following is an error:
Number c = a + b;
Why are arithmetic operations not supported on Numbers? And how would I add these two numbers in Java? (Of course I'm getting them from somewhere and I don't know if they are Integer or Float etc.)
You say you don't know if your numbers are integer or float... when you use the Number class, the compiler also doesn't know if your numbers are integers, floats or some other thing. As a result, the basic math operators like + and - don't work; the computer wouldn't know how to handle the values.
START EDIT
Based on the discussion, I thought an example might help. Computers store floating point numbers as two parts, a coefficient and an exponent. So, in a theoretical system, 001110 might be broken up as 0011 10, or 3^2 = 9. But positive integers store numbers as binary, so 001110 could also mean 2 + 4 + 8 = 14. When you use the class Number, you're telling the computer you don't know if the number is a float or an int or what, so it knows it has 001110 but it doesn't know if that means 9 or 14 or some other value.
END EDIT
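The same point, that one bit pattern means different values depending on the type used to read it, can be shown with real Java types rather than the theoretical system above. This is only an illustrative sketch with made-up names: it reads the 32-bit pattern 0x3F800000 once as an int and once as an IEEE 754 float.

```java
// Hypothetical demo: one 32-bit pattern, two interpretations.
final class BitsDemo {
    static final int BITS = 0x3F800000;

    static int asInt() {
        return BITS;                          // 1065353216 when read as a two's-complement int
    }

    static float asFloat() {
        return Float.intBitsToFloat(BITS);    // 1.0f when read as an IEEE 754 float
    }
}
```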
What you can do is make a little assumption and convert to one of the types to do the math. So you could have
Number c = a.intValue() + b.intValue();
which you might as well turn into
Integer c = a.intValue() + b.intValue();
if you're willing to suffer some rounding error, or
Float c = a.floatValue() + b.floatValue();
if you suspect that you're not dealing with integers and are okay with possible minor precision issues. Or, if you'd rather take a small performance blow instead of that error,
BigDecimal c = new BigDecimal(a.floatValue()).add(new BigDecimal(b.floatValue()));
It would also work to make a method to handle the adding for you. Now I do not know the performance impact this will cause but I assume it will be less than using BigDecimal.
public static Number addNumbers(Number a, Number b) {
if(a instanceof Double || b instanceof Double) {
return a.doubleValue() + b.doubleValue();
} else if(a instanceof Float || b instanceof Float) {
return a.floatValue() + b.floatValue();
} else if(a instanceof Long || b instanceof Long) {
return a.longValue() + b.longValue();
} else {
return a.intValue() + b.intValue();
}
}
The only way to correctly add any two types of java.lang.Number is:
Number a = 2f; // Float
Number b = 3d; // Double
Number c = new BigDecimal( a.toString() ).add( new BigDecimal( b.toString() ) );
This works even for two arguments of different number types. It should not produce side effects like overflow or loss of precision, as long as the toString() of the number type does not reduce precision.
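Wrapped into a helper, the toString()-based addition might look like the sketch below. The names NumberMath and add are hypothetical. One caveat worth labelling: a Float or Double holding NaN or an infinity has no BigDecimal representation, so the constructor throws NumberFormatException for those values.

```java
import java.math.BigDecimal;

// Hypothetical helper: adds two arbitrary Numbers via their decimal string form.
final class NumberMath {
    static BigDecimal add(Number a, Number b) {
        return toBigDecimal(a).add(toBigDecimal(b));
    }

    private static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) {
            return (BigDecimal) n;            // already exact, no round trip needed
        }
        return new BigDecimal(n.toString());  // throws NumberFormatException for NaN or Infinity
    }
}
```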
java.lang.Number is just the superclass of the wrapper classes of the numeric primitive types (see the Javadoc). Use the appropriate primitive type (double, int, etc.) for your purpose, or the respective wrapper class (Double, Integer, etc.).
Consider this:
Number a = 1.5; // Actually Java creates a double and boxes it into a Double object
Number b = 1; // Same here for int -> Integer boxed
// What should the result be? If Number would do implicit casts,
// it would behave different from what Java usually does.
Number c = a + b; // does not compile
// Now that works, and you know at first glance what that code does.
// Nice explicit casts like you usually use in Java.
// The result is of course again a double that is boxed into a Double object
Number d = a.doubleValue() + (double)b.intValue();
Use the following:
Number c = a.intValue() + b.intValue(); // Number is an object and not a primitive data type.
Or:
int a = 2;
int b = 3;
int c = a + b;
I think there are 2 sides to your question.
Why is operator+ not supported on Number?
Because the Java language specification does not define it, and there is no operator overloading. There is also no natural compile-time way to convert a Number to some fundamental type, and no natural definition of addition for some combinations of types.
Why are basic arithmetic operations not supported on Number?
(Copied from my comment:)
Not all subclasses can implement this in a way you would expect. Especially with the Atomic types it's hard to define a useful contract for e.g. add.
Also, a method add would be trouble if you try to add a Long to a Short.
If you know the Type of one number but not the other it is possible to do something like
public Double add(Double value, Number increment) {
return value + Double.parseDouble(increment.toString());
}
But it can be messy, so be aware of potential loss of accuracy and NumberFormatExceptions
Number is an abstract class which you cannot make an instance of. Provided you have a correct instance of it, you can get number.longValue() or number.intValue() and add them.
First of all, you should be aware that Number is an abstract class. What happens here is that when you create your 2 and 3, they are interpreted as primitives and a subtype is created (I think an Integer) in that case. Because an Integer is a subtype of Number, you can assign the newly created Integer into a Number reference.
However, a number is just an abstraction. It could be integer, it could be floating point, etc., so the semantics of math operations would be ambiguous.
Number does not provide the classic math operations for two reasons:
First, member methods in Java cannot be operators. It's not C++. At best, it could provide an add() method.
Second, figuring out what type of operation to do when you have two inputs (e.g., a division of a float by an int) is quite tricky.
So instead, it is your responsibility to convert back to the specific primitive type you are interested in and apply the mathematical operators.
The best answer would be to write a utility with double dispatch that drills down to the most common known types (take a look at the Smalltalk addition implementation).
