Java/Python MD5 implementation- how to overcome unsigned 32-bit requirement? - java

I'm attempting to implement MD5 (for curiosity's sake) in Python/Java, effectively translating the Wikipedia MD5 page's pseudocode into either language. First I used Java, only to run into frustration with signed integer overflow (unsigned ints aren't an option: every int satisfies -2147483648 <= i <= 2147483647). I then tried Python, figuring it better suited for heavy numerical computation, but realized I couldn't satisfy the unsigned 32-bit integer requirement there either (Python promotes overflowing ints to arbitrary-precision longs rather than wrapping them).
Is there any way to hack around Java/Python's lack of unsigned 32-bit integers, which are required by the aforementioned MD5 pseudocode?

Since all the operations are bitwise, they don't suffer from sign extension (which is what would actually cause you problems), with the one exception of right shift.
Java has the >>> (unsigned right shift) operator for exactly this purpose.
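For illustration, a minimal Java sketch of how this plays out for MD5 (the leftRotate helper follows the Wikipedia pseudocode's leftrotate; Integer.rotateLeft(x, c) is an equivalent built-in):

static int leftRotate(int x, int c) {
    return (x << c) | (x >>> (32 - c)); // >>> avoids the sign extension >> would introduce
}

int a = 0xFFFFFFFF;                            // the "unsigned" 4294967295, stored in an int
System.out.println(a + 1);                     // 0 - addition wraps mod 2^32, as MD5 requires
System.out.println(leftRotate(0x80000000, 1)); // 1
System.out.println(a & 0xFFFFFFFFL);           // 4294967295 - the sign only matters on output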

As a note beforehand - I don't know if this is a good solution, but it appears to give the behaviour you want.
Using the ctypes module, you can access the underlying low-level data-type directly, and hence have an unsigned int in Python.
Specifically, ctypes.c_uint:
>>> import ctypes
>>> i = ctypes.c_uint(0)
>>> i.value -= 1
>>> i
c_uint(4294967295)
>>> i.value += 1
>>> i
c_uint(0)
This is arguably abuse of the module - it's designed for using C code easily from within Python - but as I say, it appears to work. The only real downside I can think of is that ctypes is presumably CPython-specific.

Exact mixed comparison BigInteger and double in Java

Just noticed that Python and JavaScript have exact comparison.
Example in Python:
>>> 2**1023+1 > 8.98846567431158E307
True
>>> 2**1023-1 < 8.98846567431158E307
True
And JavaScript:
> 2n**1023n+1n > 8.98846567431158E307
true
> 2n**1023n-1n < 8.98846567431158E307
true
Anything similar available for Java, except converting both arguments to BigDecimal?
Preliminary answer, i.e. a verbal solution sketch:
I am skeptical about a solution that converts to BigDecimal, since this conversion shifts the base from base=2 to base=10. Whenever the exponent of the Java double value differs from the binary precision, this leads to additional digits and lengthy pow() operations, which one can verify by inspecting an open-source BigDecimal(double) constructor implementation.
One can get the raw bits via Double.doubleToRawLongBits(d). If the Java double value is not subnormal, all that needs to be done to recover the mantissa is (raw & DOUBLE_SNIF_MASK) + (DOUBLE_SNIF_MASK + 1), where DOUBLE_SNIF_MASK = 0x000fffffffffffffL. This means the primitive type long is enough to carry the mantissa. The challenge is then to perform a comparison that also takes the exponent of the double into account.
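As a sketch of that extraction (the constant name DOUBLE_SNIF_MASK is taken from the text above; the value is the standard 52-bit IEEE 754 significand mask):

static final long DOUBLE_SNIF_MASK = 0x000fffffffffffffL; // low 52 bits

// Mantissa of a normal (non-subnormal) double; adding DOUBLE_SNIF_MASK + 1
// restores the implicit leading 1 bit (bit 52).
static long mantissa(double d) {
    long raw = Double.doubleToRawLongBits(d);
    return (raw & DOUBLE_SNIF_MASK) + (DOUBLE_SNIF_MASK + 1);
}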
But I must admit I haven't had time yet to work out the Java code. I also have in mind some optimizations using bitLength() of the other argument, which in this setting is a BigInteger. The use of bitLength() would speed up the comparison: a simple heuristic can implement a fast path on which the mantissa can be ignored entirely, because the exponent of the double and the bitLength() of the BigInteger already give enough information for a comparison result.
As soon as I have time and a prototype running, I might publish a Java code fragment here. But maybe somebody has faced the problem already. My general hypothesis is that a fast or even ultra-fast routine is possible, but I didn't have much time to search the internet for an implementation, which is why I deferred the problem to Stack Overflow. Maybe somebody else has had the same problem and/or can point to a complete solution?
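To make the fast-path idea concrete, here is a minimal sketch (the method name and conventions are mine, and it assumes a positive BigInteger and a finite double >= 1):

// Decide the comparison from the exponent and bitLength() alone whenever the
// two values fall in different powers-of-two ranges; return 0 when the
// mantissa would have to be inspected.
static int compareFastPath(java.math.BigInteger b, double d) {
    int exp = Math.getExponent(d); // unbiased exponent: 2^exp <= d < 2^(exp+1)
    int top = b.bitLength() - 1;   // 2^top <= b < 2^(top+1)
    if (top != exp) return (top > exp) ? 1 : -1;
    return 0; // same range: fall back to a mantissa comparison
}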

operations on unsigned numbers in java using least memory

As we know Java doesn't support unsigned operations because it considers the last bit as a sign bit for all integers.
I know that one way is to use a larger type for the numbers involved in the operations: for example, if you want 32-bit operations, you can do them with long, which is 64-bit, and then take the bottom 32 bits as the result. But the point is that this takes twice the memory.
I read something about the Integer class but I didn't understand it. Do you have any idea how to do unsigned operations with the least memory used?
Thanks!
"signed" and "unsigned" is about the interpretation of bit patterns as numeric values. And most (all?) modern CPUs (and the JVM) use Two's Complement representation for signed numbers, having the interesting property that for addition and subtraction, the bit pattern operations are the same as for unsigned numbers.
E.g. In 16-bit, the pattern 0xfff0 is 65520 when interpreted as an unsigned, and -16 as signed number. Subtracting 16 (0x0010) from that number gives 0xffe0, which represents the correct result both for the signed (being -32) as well as the unsigned case (65504).
So, depending on the operations you are doing on your numbers, there are cases where you can simply ignore the signed/unsigned question - just be careful with input and output.
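A quick Java demonstration of the point, reusing the 16-bit example (char is Java's only unsigned 16-bit type, short its signed counterpart):

char u = 0xfff0;          // interpreted as unsigned: 65520
short s = (short) 0xfff0; // the same bit pattern as signed: -16
u -= 16;
s -= 16;
System.out.println((int) u);                         // 65504
System.out.println(s);                               // -32
System.out.println(Integer.toHexString(u));          // ffe0
System.out.println(Integer.toHexString(s & 0xffff)); // ffe0 - identical pattern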
Regarding the relevance of using the shorter representation, only you can tell. And Java can be quite fast when doing numerics. I remember a coding contest quite some years ago where even Java 1.4 outperformed Microsoft C (both with default settings - with optimization switches C was a bit faster).
As we know Java doesn't support unsigned operations because it considers the last bit as a sign bit for all integers.
This is incorrect.
While Java does not support 32 and 64 bit unsigned integer types, it DOES support unsigned integer operations; see the javadoc for the Integer and Long types, and look for the xxxUnsigned methods.
Note that you won't see methods for add, subtract or multiply because they would return the same representations as the primitive operators do.
In short, you can use the 32-bit signed int type to do 32-bit unsigned arithmetic and comparison operations.
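For example, a quick sketch of those helpers in action (all available since Java 8):

int a = 0xFFFFFFF0; // -16 as signed, 4294967280 as unsigned
int b = 16;
System.out.println(a + b);                         // 0 - add/subtract/multiply need no helper
System.out.println(Integer.toUnsignedString(a));   // 4294967280
System.out.println(Integer.compareUnsigned(a, b)); // positive: a > b as unsigned
System.out.println(Integer.divideUnsigned(a, b));  // 268435455
System.out.println(Integer.toUnsignedLong(a));     // 4294967280, widened into a long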
For more information:
Java8 unsigned arithmetic
How to use the unsigned Integer in Java 8 and Java 9?
Unsigned long in Java
https://www.baeldung.com/java-unsigned-arithmetic

x86 80-bit floating point type in Java

I want to emulate the x86 extended precision type and perform arithmetic operations and casts to other types in Java.
I could try to implement it using BigDecimal, but covering all the special cases around NaNs, infinity, and casts would probably be a tedious task. I am aware of some libraries that provide other floating-point types with higher precision than double, but I want the same precision as the x86 80-bit float.
Is there a Java library that provides such a floating-point type? If not, can you give other hints that would allow me to implement such a data type with less effort than coming up with a custom BigDecimal solution?
If you know that your Java code will actually run on an x86 processor, implement the 80-bit arithmetic in assembly (or C, if the C compiler supports that) and invoke with JNI.
If you are targeting a particular non-x86 platform, look at qemu code. There should be some way to rip out just the part that does 80-bit float operations. (Edit: qemu's implementation is SoftFloat.) Call it with JNI.
If you truly want cross-platform pure-Java 80-bit arithmetic, you could probably still compare it against the C implementation in open-source CPU emulators to make sure you're addressing the right corner cases.
An 80-bit value is probably best held as a combination of a long (for the mantissa) and an int (for the exponent and sign). For many operations, it will be most practical to place the upper and lower halves of the mantissa into separate long values, so the code for adding two numbers with matching signs and exponents would look something like:
long resultLo = (num1.mant & 0xFFFFFFFFL) + (num2.mant & 0xFFFFFFFFL);
long resultHi = (num1.mant >>> 32) + (num2.mant >>> 32) + (resultLo >>> 32);
result.exp = num1.exp; // should match num2.exp
if (resultHi > 0xFFFFFFFFL) {
    // The mantissa sum needs 65 bits: bump the exponent and shift the
    // combined value right one place, rounding on the dropped bit.
    result.exp++;
    long dropped = resultLo & 1;
    result.mant = ((resultHi << 31) | ((resultLo & 0xFFFFFFFFL) >>> 1)) + dropped;
} else {
    result.mant = (resultHi << 32) + (resultLo & 0xFFFFFFFFL);
}
A bit of a nuisance all around, but not completely unworkable. The key is to break numbers into pieces small enough that you can do all your math as type long.
BTW, note that if one of the numbers did not originally have the same exponent, it will be necessary to keep track of whether any bits "fell off the end" when shifting it left or right to match the exponent of the first number, so as to be able to properly round the result afterward.
This is a bit the reverse of the Java strictfp option, which restricts calculations to 8 bytes (64 bits) on hardware that would otherwise carry them out in 80 bits.
So my answer is: run a JVM on a 64-bit machine, maybe inside some hypervisor/OS VM, so you have a development platform.

z3: Modeling Java two's complement overflow and underflow in z3 bit vector addition

I am trying to model Java's 32-bit int arithmetic using z3 bit vectors, with the Z3 Java API from the unstable branch.
However, I do not know how to get the right overflow behaviour from z3. I want to model this behaviour:
0b01111111111111111111111111111111 + 1 == 0b10000000000000000000000000000000
Integer.MAX_VALUE + 1 == Integer.MIN_VALUE
I can create a bit vector with the value 0b10000000000000000000000000000000 but when I use BitVecNum.getInt() I get an exception.
I run the following code:
((BitVecNum)
ctx.mkBVAdd(ctx.mkBV(Integer.MAX_VALUE, 32), ctx.mkBV(1, 32))
.simplify()).getInt()
I get the exception com.microsoft.z3.Z3Exception: Numeral is not an int
If I do the following:
((int)((BitVecNum)
ctx.mkBVAdd(ctx.mkBV(Integer.MAX_VALUE, 32), ctx.mkBV(1, 32))
.simplify()).getLong());
I get the result -2147483648 that I want.
Any advice?
This case is a little bit confusing, because Z3 treats all bit-vectors as unsigned, but the getInt/getLong functions look like they would return a signed value. What happens is that MAX_VALUE+1 is correctly computed, resulting in 2147483648 (which is representable in a 32-bit unsigned int), but when getInt is called, it finds that this unsigned value does not fit into a signed int.
This problem stems from the fact that Java does not support unsigned basic types, so the corresponding Z3 functions (like getUInt and getULong) are not available in the Java API. My suggestion would be to always assume bit-vectors are unsigned in Z3 and to use wider datatypes (like the trick via getLong) to get around this issue. This is essentially also what other Java programmers do/suggest, e.g., here and there.
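As a sketch of that suggestion (the helper name is made up here), read a 32-bit bit-vector numeral through the wider getLong() and narrow it, so that the unsigned 2147483648 comes back as Java's Integer.MIN_VALUE:

static int bvToInt(com.microsoft.z3.BitVecNum bv) {
    // getLong() succeeds because any 32-bit unsigned value fits in a signed
    // long; the cast then reinterprets it as a signed 32-bit int.
    return (int) bv.getLong();
}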

Why would you need unsigned types in Java?

I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know: how is this a problem? I have been programming in Java for 10 years, more or less, and have never had issues with it. Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
When converting the numbers to another bit representation. Between 8-, 16- and 32-bit integer types you can use bitmasks if needed.
When converting numbers to decimal format, usually to Strings.
Interoperating with non-Java systems through APIs or protocols. Again the data is just bits, so I don't see the problem here.
Using the numbers as memory or other offsets. With 32-bit ints this might be a problem for very large offsets.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those. What am I missing? What are the actual benefits of having unsigned types in a programming language and how would having those make Java better?
Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Why not? Is "applying a bitwise AND with 0xFF" actually part of what your code is trying to represent? If not, why should it have to be part of have you write it? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this :(
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it's a shame that unsigned types aren't used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It's very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Java's lack of unsigned data types also stands against it. Yes, you can work around it, but it's not ideal and you'll be using code that doesn't really reflect the underlying data correctly.
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim "the data is just bits, so I don't see the problem here" - but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those.
The only times you would need to consider those operations is when you were naturally working with values of two different types - one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider:
// This should be considered unsigned - so a value of -1 is "really" 65535
short length = /* some value */;
// This is really signed
short foo = /* some value */;
boolean result = foo < length;
Suppose foo is 100 and length is -1. What's the logical result? The value of length represents 65535, so logically foo is smaller than it. But you'd probably go along with the code above and get the wrong result.
Of course they don't even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and wouldn't be a problem if you had unsigned types in the language.
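In current Java the error can at least be patched over explicitly, at the cost of exactly the kind of cruft discussed above (Short.toUnsignedInt is a Java 8 addition):

short length = (short) 0xFFFF; // logically unsigned: 65535
short foo = 100;
System.out.println(foo < length); // false - actually compares 100 < -1
System.out.println(Short.toUnsignedInt(foo) < Short.toUnsignedInt(length)); // true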
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it's gone from java.sun.com now), including:
Ooh, good question... I'm going to say that the strangest thing about the Java platform is that the byte type is signed. I've never heard an explanation for this. It's quite counterintuitive and causes all sorts of errors.
If you like, yes, everything is ones and zeroes. However, your hardware arithmetic and logic unit doesn't work that way. If you want to store your bits in a signed integer value but perform operations that are not natural to signed integers, you will usually waste both storage space and processing time.
An unsigned integer type stores twice as many non-negative values in the same space as the corresponding signed integer type. So if you want to bring into Java any data commonly used in a language with unsigned values, such as a POSIX date value (an unsigned count of seconds) as normally used with C, then in general you will need a wider integer type than C would use. If you are processing many such values, again you will waste both storage space and fetch-execute time.
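For instance, a C uint32_t timestamp has to be widened into a Java long to keep its full range (the value here is arbitrary):

int raw = 0xF0000000;                            // bytes read from a C struct
System.out.println(raw);                         // -268435456: the signed misreading
System.out.println(Integer.toUnsignedLong(raw)); // 4026531840: the intended value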
The times I have used unsigned data types have been when I read in large blocks of data that correspond to images, or worked with openGL. I personally prefer unsigned if I know something will never be negative, as a "safety feature" of sorts.
Unsigned types are useful for bit-by-bit comparisons, and I'm pretty sure they are used extensively in graphics.
