Assume you have two positive long values a and b which are greater than 2^32 (and smaller than 2^63), and a long integer c.
What is the best way, in Java and/or C, to perform operations such as
(a*b)%c
while avoiding arithmetic overflow?
Edit:
c is around 2^34, and sometimes both a and b are just between 2^32 and c...
I finally avoided using BigInteger for the specific situation I was in. It was possible to know one divisor of both a and b (not always the case), so I could use the arithmetic properties of modulo to my advantage.
Assuming everything's positive, then you can use the following mathematical identity:
(a*b)%c == ((a%c) * (b%c)) % c
Of course, this still doesn't eliminate the possibility of overflow.
The easiest way to completely avoid the issue is to use a big-integer library.
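For illustration, here is a minimal Java sketch (the helper name mulMod is mine, and it assumes c > 0) that applies the identity and then falls back on BigInteger for the one multiplication that can still overflow:

import java.math.BigInteger;

// Computes (a * b) % c for positive longs without overflowing.
// Reducing a and b modulo c first keeps the operands small, but their
// product can still exceed 2^63, so that multiply is done in BigInteger.
static long mulMod(long a, long b, long c) {
    return BigInteger.valueOf(a % c)
            .multiply(BigInteger.valueOf(b % c))
            .mod(BigInteger.valueOf(c))
            .longValueExact(); // the result is < c, so it fits in a long
}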
You can go even further than what @Oli Charlesworth suggests in his (good) answer. You can decompose a and b into factors (not necessarily all the prime factors; a partial decomposition may be enough) and apply the modulus to any intermediate result of the multiplication. However, this is likely to be more costly than going for the bignum, as it involves quite a few divisions, and they are expensive.
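A sketch of that idea in Java (the values are hypothetical: it assumes you know a factorization a == a1 * a2 with a1, a2 < 2^29, and that c is around 2^34 as in the question, so every intermediate product stays below 2^63):

// (a * b) % c where a == a1 * a2, without ever overflowing a long.
static long mulModFactored(long a1, long a2, long b, long c) {
    long t = (a2 * (b % c)) % c; // a2 * (b % c) < 2^29 * 2^34 = 2^63
    return (a1 * t) % c;         // a1 * t      < 2^29 * 2^34 = 2^63
}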
In Java I would use BigInteger:
// (2^33 * 2^33) mod (2^34 + 1)
BigInteger bd = BigInteger.valueOf(2).pow(33);
System.out.println(bd.multiply(bd).remainder(BigInteger.valueOf(2).pow(34).add(BigInteger.ONE)));
To the best of my knowledge, there's no way to solve your problem without higher-precision arithmetic, and at least LLVM's optimizer agrees.
If 128-bit math is not available natively, you'll need to use a general-purpose big-integer library, or take the bits you need from a less general implementation like Math128 from GnuCash.
I learned that the Clojure reader interprets a decimal literal with the suffix 'M', like 1.23M, as BigDecimal. I also know that decimal numbers without 'M' become Java doubles.
But I think it would be better if a plain decimal literal were BigDecimal, and the host-dependent decimal carried a suffix, like 1.23H. Then, when a number is corrupted or truncated because of the precision limit of IEEE doubles, we could easily notice that the number is precision-limited. Also, I think the simpler notation should be the host-independent one.
Is there any reason that Clojure interprets a literal decimal as a Java double, other than runtime performance? And I don't think performance alone is the answer, because this isn't C/C++, and the other option, declaring host-dependent decimals with a suffix like '1.23H', could just as well be implemented.
Once upon a time, for integers, Clojure would auto-promote to larger sizes when needed. This was changed so that overflow exceptions are thrown instead. My sense, from afar, was that:
The powers that be meant for Clojure to be a practical language doing practical things in a practical amount of time. They didn't want performance to blow up because number operations were unexpectedly using arbitrary-precision libraries instead of CPU integer operations. Contrast this with Scheme, which seems to prioritize mathematical niceness over practicality.
People did not like being surprised at run time when interop calls failed because the Java library expected a 32-bit integer instead of an arbitrarily sized integer.
So it was decided that the default would be normal integers (Java longs, I think), and arbitrarily large integers would be used only when the programmer called for them, knowingly accepting the performance hit and the interop hit.
My guess is that similar decisions were made for numbers with decimal points.
Performance could be one reason. Perhaps the clojure.core developers could chime in regarding the rationale.
I personally think it is not such a big deal not to have BigDecimal by default, since:
there is a literal for that, as you point out: M
there are operations like +', *', -'... (note the quote) that "support arbitrary precision".
I'm working on a tool that computes numbers that can get close to 1e-25 in the worst cases, and compares them together, in Java. I'm obviously using double precision.
I have read in another answer that I shouldn't expect more than 1e-15 to 1e-17 precision, and another question deals with getting better precision by performing operations in a "better" order.
Which double-precision operations are most prone to losing precision along the way? Should I try to work with numbers as big as possible, or as small as possible? Should I do divisions before multiplications?
I'd rather not use the BigDecimal classes or equivalent, as the code is already slow enough ;) (unless they don't impact speed too much, of course).
Any information will be greatly appreciated!
EDIT: The fact that the numbers are "small" in absolute value (1e-25) does not matter, since a double can go down to about 5e-324 (the smallest subnormal). What matters is that, when they are very similar (both around 1e-25), I have to compare, say, 4.64563824048517606458e-21 to 4.64563824048517606472e-21 (the difference is in the 19th and 20th significant digits). When computing these numbers, the difference is so small that I might hit rounding error, where the trailing digits are essentially noise.
The question is: how should I order the computation so that this loss of precision is minimized? It might be doing divisions before multiplications, or additions first.
If it is important to get the correct answer, you should use BigDecimal. It is slower than double, but for most cases it is fast enough. I can't think of many cases where you do a lot of calculations with such small numbers and it doesn't matter whether the answer is correct - at least in Java.
If this is a super performance-sensitive application, I would consider using a different language.
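For example, the two nearly identical values from the question can be compared exactly (a minimal sketch; BigDecimal's String constructor accepts scientific notation):

import java.math.BigDecimal;

BigDecimal x = new BigDecimal("4.64563824048517606458e-21");
BigDecimal y = new BigDecimal("4.64563824048517606472e-21");
System.out.println(x.compareTo(y)); // -1: the digits a double would lose are kept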
Thanks to @John for pointing out a very complete article about floating-point arithmetic.
It turns out that, when precision is needed, operations should be reordered and formulas adapted to avoid loss of precision, as explained in the chapter on cancellation: when comparing numbers that are very close to each other (which is my case), "catastrophic cancellation" may occur, inducing a huge loss of precision. Often, rewriting the formula, or reordering operations according to a priori knowledge of the operand values, leads to greater accuracy in the calculation.
What I'll remember from this article is:
be careful when subtracting two nearly identical quantities
try to rearrange operations to avoid catastrophic cancellation
For the latter case, remember that computing (x - y) * (x + y) gives more accurate results than x * x - y * y.
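A quick Java illustration of that identity (the values are arbitrary; the point is that x*x and y*y are both huge and nearly equal, so their difference cancels):

double x = 1e8 + 1; // exactly representable as a double
double y = 1e8;
// Exact answer: (x - y) * (x + y) = 1 * 200000001 = 200000001
System.out.println(x * x - y * y);     // 2.0E8: x*x no longer fits exactly, the trailing 1 is lost
System.out.println((x - y) * (x + y)); // 2.00000001E8: exact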
How do algorithms using double values compare to ones using int values? Is there much difference, or is it negligible?
In my case, I have a canvas that uses integers so far. But now that I'm implementing scaling, I'm probably going to switch everything to double. Would this have a big impact on calculations?
If so, would rounding the doubles to only a few fractional digits optimize performance?
Or am I totally on the path of over-optimization and should just use doubles without any headache?
You're in GWT, so ultimately your code will be JavaScript, and JavaScript has a single type for numeric data: Number, which corresponds to Java's double.
Using integers in GWT can mean either that the generated code does more work than with doubles (performing a narrowing conversion of numbers to integer values), or that the code won't change at all (I have no idea what the GWT compiler does exactly; it might also depend on context, such as crossing JSNI boundaries).
All in all, expect the same or slightly better performance using doubles (unless you later have to convert back to integers, of course); but generally speaking, you're over-optimizing (also: optimization needs measurement/metrics; if you don't have them, you're on the "premature optimization" path).
There is a measurable difference between integers and doubles, but generally doubles are also very fast.
Integers remain faster than doubles, because arithmetic operations on integers take very few clock cycles.
Doubles are fast because they are usually supported natively by a floating-point unit, which means the work is done by dedicated hardware. Even so, they typically range from about 2x to 40x slower than integers.
Having said this, the CPU usually spends quite a bit of time on housekeeping like loops and function calls, so if the code is fast enough with integers, most of the time (perhaps even 99% of the time) it will be fast enough with doubles.
The only time floating-point numbers are orders of magnitude slower is when they must be emulated because there is no hardware support. This generally only occurs on embedded platforms, or where uncommon floating-point types are used (e.g. 128-bit floats, or decimal floats).
The results of some benchmarks can be found at:
http://pastebin.com/Kx8WGUfg
https://stackoverflow.com/a/2550851/1578925
but generally:
32-bit platforms show a greater disparity between doubles and integers
integers are always at least twice as fast at addition and subtraction
If you are going to change a type from int to double in your program, you must also rewrite the lines of code that compare two values. For example, if a and b are ints compared with if (a == b), then after changing their type to double you have to change that line too and use a comparison suited to doubles, as sketched below, rather than ==.
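A minimal sketch of such a comparison (the helper name and the tolerance are mine; pick an epsilon that makes sense for your value range):

// Compare two doubles with a tolerance instead of ==, since rounding
// can make mathematically equal values differ in their last bits.
static boolean nearlyEqual(double a, double b, double eps) {
    return Math.abs(a - b) <= eps;
}

// usage: nearlyEqual(0.1 + 0.2, 0.3, 1e-9) is true, but 0.1 + 0.2 == 0.3 is false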
Not knowing the exact needs of your program, my instinct is that you're over-optimizing. When choosing between using ints or doubles, you usually base the decision on what type of value you need over which will run faster. If you need floating point values that allow for (not necessarily precise) decimal values, go for doubles. If you need precise integer values, go for ints.
A couple more points:
Rounding your doubles to certain fractions should have no impact on performance. In fact, the overhead required to round them in the first place would probably have a negative impact.
While I would argue not to worry about the performance differences between int and double, there is a significant difference between int and Integer. While an int is a primitive data type that can be used efficiently, an Integer is an object that essentially just holds an int, and this incurs significant overhead (see the sketch below). Integers are useful in that they can be stored in collections like Vector while ints cannot, but in all other cases it's best to use ints.
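A tiny illustration of that boxing overhead (conceptual; the actual cost varies by JVM):

import java.util.ArrayList;
import java.util.List;

int a = 42;                       // primitive: just a value on the stack
Integer b = Integer.valueOf(42);  // object: header + field + indirection
List<Integer> list = new ArrayList<>();
list.add(a); // autoboxing: allocates (or reuses a cached) Integer object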
In general, maths that naturally fits as an integer will be faster than maths that naturally fits as a double, BUT trying to force double maths to work as an integer is almost always slower; moving back and forth between the two costs more than the speed boost you get.
If you're considering something like:
I only want 1 decimal place within my 'quasi-integer float', so I'll just multiply everything by 10;
5.5*6.5
so 5.5 --> 55 and
so 6.5 --> 65
with a special multiplying function
public int specialIntegerMultiply(int a, int b){
    return a * b / 10; // 55 * 65 / 10 = 357, i.e. 35.7 - the exact 35.75 is already lost
}
Then for the love of god don't: it'll probably be slower with all the extra overhead, and it'll be really confusing to write.
P.S. Rounding the doubles will make no difference at all, as the remaining decimal places will still exist; they'll just all be 0 (in decimal, that is - in binary that won't even be true).
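To see that last point concretely (a one-liner sketch; BigDecimal's double constructor shows the exact binary value stored):

double d = Math.round(3.14159 * 10) / 10.0;      // "rounded" to 3.1
System.out.println(new java.math.BigDecimal(d)); // prints the exact stored value, which is not exactly 3.1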
I looked into this Stack Overflow question relating to BigInteger, and specifically I do not understand this line (the words in italics):
In the BigInteger class, I have no limits and there are some helpful
functions there but it is pretty depressing to convert your beautiful
code to work with the BigInteger class, specially when primitive
operators don't work there and you must use functions from this class.
I don't know what I am missing, but to represent something that has no limit you would require infinite memory? What's the trick here?
There is no theoretical limit. The BigInteger class allocates as much memory as it needs for all the bits of data it is asked to hold.
There are, however, some practical limits, dictated by the memory available. And there are further technical limits, although you're very unlikely to be affected: some methods assume that the bits are addressable by int indexes, so things will start to break when you go above Integer.MAX_VALUE bits.
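As a quick illustration of how far you can go in practice (a sketch; the real ceiling depends on your heap):

// A million-bit number is no problem; the hard limits are available
// memory and the Integer.MAX_VALUE bit-index limit mentioned above.
BigInteger huge = BigInteger.ONE.shiftLeft(1_000_000); // 2^1000000
System.out.println(huge.bitLength()); // prints 1000001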
Graham gave a great answer to this question. I would only like to add that you have to be careful with the valueOf method, because it takes a long parameter, so the maximum value you can pass it is Long.MAX_VALUE.
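For anything larger than a long, the String constructor works (a small sketch):

// valueOf is capped by its long argument; the String constructor is not.
BigInteger fromLong = BigInteger.valueOf(Long.MAX_VALUE);   // 9223372036854775807
BigInteger bigger = new BigInteger("9223372036854775808");  // Long.MAX_VALUE + 1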
Yes, it's used when we need very big numbers with arbitrary precision. It's important to note that "arbitrary" precision or number of digits does not mean "unlimited": it means that the number of digits in a number, or the number of digits of precision in a calculation, is limited by memory and/or by the limits to precision that we specify.
Look at the BigInteger class source code (this can be done with NetBeans) and you will see that a number is represented as an int array. For example, 10113 would be [1, 0, 1, 1, 3] (this is not exactly what the BigInteger class does, just an example of how a big-number module can work). So, technically, its only limit will be your memory.
I need to extract 'thousands' from an integer (e.g. 1345 -> 1; 24378 -> 24) and since my application is going to do this a lot, I need to do this efficiently.
Of course, dividing by 1000 is always an option, but division is an expensive operation, so I'm looking for something more efficient.
The target platform is Android, and as far as I know most Android devices today do not have a math co-processor, so the preferred way would be to do this with bitwise manipulation, though I can't figure out how to do that, or whether it's possible at all...
A math coprocessor is a great help with floating-point operations, but just about every modern CPU has efficient integer division.
Seriously, just do the divide operation; it will almost certainly be the best option. While bit shifting is good for division by powers of two (like 1024), it's not so good otherwise.
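In fact, a JIT or C compiler will usually replace division by a constant with a multiply-and-shift on its own. A sketch of the idea for n / 1000 (the magic constant 274877907 is ceil(2^38 / 1000); shown only to illustrate why hand-optimizing is unnecessary):

// Equivalent to n / 1000 for any non-negative int n.
static int div1000(int n) {
    return (int) ((n * 274877907L) >>> 38);
}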
int division always discards the fractional part, so you can simply use:
int i = 24378 / 1000;
Mathematically, 24378 / 1000 = 24.378,
but the int result will be 24.