What are the best resources for learning 'number crunching' in Java? I am referring to things like correct methods of decimal number processing, best practices, APIs, notable idioms for performance, and common pitfalls (and their solutions) when coding number processing in Java.
This question seems a bit open ended and open to interpretation. As such, I will just give two short things.
1) Decimal precision - never assume that two floating point (float or double) numbers are equal, even if you went through the exact same steps to calculate them both. Due to a number of issues with rounding in various situations, you often cannot be certain that a decimal number is exactly what you expect. If you do double myNumber = calculateMyNumber(), then do a bunch of other things, and later check if(myNumber == calculateMyNumber()), that evaluation could be false even if you have not changed the calculations done in calculateMyNumber().
2) There are limitations in the size and precision of numbers that you can keep track of. If you have int myNumber = 2000000000 and if(myNumber*2 < myNumber), that will actually evaluate to true, as myNumber*2 will result in a number less than myNumber, because the memory allocated for the number isn't big enough to hold a number that large and it will overflow, becoming smaller than it was before. Look into classes that encapsulate large numbers, such as BigInteger and BigDecimal.
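The overflow described in point 2 can be reproduced directly. The sketch below is only an illustration: the class and method names are made up for this example, and Math.multiplyExact assumes Java 8 or later.

```java
import java.math.BigInteger;

class OverflowDemo {
    // Doubles an int the unsafe way: the result silently wraps around on overflow.
    static int doubleWrapping(int n) {
        return n * 2;
    }

    // Doubles an int but throws ArithmeticException instead of wrapping (Java 8+).
    static int doubleChecked(int n) {
        return Math.multiplyExact(n, 2);
    }

    // Doubles using arbitrary-precision arithmetic, which cannot overflow.
    static BigInteger doubleBig(int n) {
        return BigInteger.valueOf(n).multiply(BigInteger.valueOf(2));
    }

    public static void main(String[] args) {
        int myNumber = 2000000000;
        System.out.println(doubleWrapping(myNumber) < myNumber); // true: the product wrapped around
        System.out.println(doubleBig(myNumber));                 // 4000000000
    }
}
```

Whether to fail fast with the checked variant or to switch to BigInteger depends on whether values that large are legitimate in your application.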
You will figure stuff like this out as a side effect if you study the computer representations of numbers, or binary representations of numbers.
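A small example of why the binary representation matters: 0.1, 0.2 and 0.3 have no exact binary form, so an equality test on their sum fails. The helper name and the tolerance of 1e-9 below are assumptions chosen for illustration; a suitable tolerance depends on the magnitude of your values.

```java
class FloatCompare {
    // True when a and b differ by no more than the given absolute tolerance.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                  // false
        System.out.println(nearlyEqual(sum, 0.3, 1e-9)); // true
    }
}
```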
First, you should learn about floating point math. This is not specific to Java, but it will allow you to make informed decisions later about, for example, when it's OK to use Java primitives such as float and double. Relevant topics include (copied from a course that I took on scientific computing):
Sources of error: roundoff, truncation error, incomplete convergence, statistical error, program bug.
Computer floating point arithmetic and the IEEE standard.
Error amplification through cancellation.
Conditioning, condition number, and error amplification.
This leads you to decisions about whether to use Java's BigDecimal, BigInteger, etc. There are lots of questions and answers about this already.
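Of the topics listed above, error amplification through cancellation is easy to demonstrate. The sketch below evaluates (1 - cos x) / x^2, whose true value is close to 0.5 for small x, in two algebraically equivalent ways. The function and the class name are picked purely for illustration.

```java
class CancellationDemo {
    // Direct formula: subtracts two nearly equal numbers when x is small,
    // so most of the significant digits cancel.
    static double naive(double x) {
        return (1.0 - Math.cos(x)) / (x * x);
    }

    // Uses the identity 1 - cos x = 2 sin^2(x / 2), which avoids the subtraction.
    static double stable(double x) {
        double s = Math.sin(x / 2.0);
        return 2.0 * s * s / (x * x);
    }

    public static void main(String[] args) {
        double x = 1e-8;
        System.out.println("naive:  " + naive(x));
        System.out.println("stable: " + stable(x));
    }
}
```

The two results differ noticeably for x = 1e-8, even though both methods use only double arithmetic; the only difference is how the formula is arranged.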
Next, you're going to hit performance, including both CPU and memory. You probably will find various rules of thumb, such as "autoboxing a lot is a serious performance problem." But as always, the best thing to do is profile your own application. For example, let's say somebody insists that some optimization is important, even though it affects legibility. If that optimization doesn't matter for your application, then don't do it!
Finally, where possible, it often pays to use a library for numerical stuff. Your first attempt probably will be slower and more buggy than existing libraries. For example, for goodness sake, don't implement your own linear programming routine.
I'm about to start optimizations on an enormous piece of code and I need to know exactly which operations are performed when the modulus operator is used. I have been searching for quite a while, but I can't find anything about the machine code behind it. Any ideas?
If you need to know "exactly which operations are performed when the modulus operator is used", then I would suggest you are "doing it wrong".
Modulus may be different depending on OS and underlying architecture. It may vary or it may not, but if you need to rely on the implementation, it is likely that your time could best be spent elsewhere. The implementation is not guaranteed to stay the same, or to be consistent across different machines.
Why do you believe modulus to be a major source of computation? Regardless of its implementation, the operation is very likely to take constant time, i.e., if it is used within an algorithm whose big-O is greater than constant time, optimize the algorithm first.
Ask yourself why you need to optimize. Is the computation taking (significantly) longer than expected?
Then ask yourself where 90 - 99% of the computation is being spent. Try using a profiler to get numbers, even if you think you know where time is being spent. It may give you a clue or shed light on a bug.
The modulus operator on integers is built in on most platforms. An instruction with the timing comparable to that of division is performed, producing the modulus.
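The compiler can perform an optimization for divisors that are powers of two: instead of performing a modulo such as x % 512, the compiler can use a potentially faster x & 0x01FF. Note that the two are only equivalent when x is non-negative; for signed values that may be negative, the compiler has to add a small correction, because in Java the result of % takes the sign of the dividend.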
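To see where the mask and the remainder agree and where they do not, here is a small sketch. The class and method names are invented for this example, and Math.floorMod assumes Java 8 or later.

```java
class PowerOfTwoMod {
    // Remainder as defined by Java: the result has the sign of the dividend.
    static int remainder(int x) {
        return x % 512;
    }

    // Bit mask keeping the low 9 bits: always in the range 0..511.
    static int masked(int x) {
        return x & 0x1FF;
    }

    // Floor modulus: always non-negative for a positive divisor.
    static int floorMod(int x) {
        return Math.floorMod(x, 512);
    }

    public static void main(String[] args) {
        System.out.println(remainder(1000)); // 488
        System.out.println(masked(1000));    // 488
        System.out.println(remainder(-5));   // -5
        System.out.println(masked(-5));      // 507
        System.out.println(floorMod(-5));    // 507
    }
}
```

In other words, the mask matches Math.floorMod for every int, but matches % only for non-negative inputs.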
Any ideas?
Yes, don't waste your time on it. There are going to be other bits of code you can improve far more than by trying to beat the compiler at its own job.
I'm trying to write a program that computes the reduced row echelon form of a given matrix. Basically, I'm writing a program that solves systems of equations. However, because some divisions produce repeating digits (such as 2/3, which is .66666...) and Java rounds off at a certain digit, there are times when a pivot that should be 0 (meaning no pivot) is something like .0000001, and it messes up my whole program.
My first question is if I were to have some sort of if statement, what is the best way to write something like "if this number is less than .00001 away from being an integer, then round to that closest integer".
My second question is does anyone have any ideas on more optimal ways of handling this situation rather than just put if statements rounding numbers all over the place.
Thank you very much.
You say that you are writing a program that solves systems of equations. This is quite a complicated problem. If you only want to use such a program, you are better off using a library written by somebody else. I will assume that you really want to write the program yourself, for fun and/or education.
You identified the main problem: using floating point numbers leads to rounding and thus to inexact results. There are two solutions for this.
The first solution is not to use floating point numbers. Use only integers and reduce the matrix to row echelon form (not reduced); this can be done without divisions. Since all computations with integers are exact, a pivot that should be 0 will be exactly 0 (actually, there may be a problem with overflow). Of course, this will only work if the matrix you start with consists of integers. You can generalize this approach by working with fractions instead of integers.
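A minimal sketch of the integer-only approach, assuming the matrix fits in long values. Rows are combined by cross-multiplication instead of division, so every entry stays an exact integer. The class name is hypothetical, and Math.multiplyExact / Math.subtractExact (Java 8+) are used here so that the overflow mentioned above raises an exception instead of passing silently.

```java
class IntegerElimination {
    // Reduces m in place to (non-reduced) row echelon form using only
    // integer multiplication and subtraction, so zeros are exactly zero.
    static void rowEchelon(long[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        int pivotRow = 0;
        for (int col = 0; col < cols && pivotRow < rows; col++) {
            // Find a row at or below pivotRow with a nonzero entry in this column.
            int selected = -1;
            for (int r = pivotRow; r < rows; r++) {
                if (m[r][col] != 0) { selected = r; break; }
            }
            if (selected < 0) continue; // no pivot in this column

            long[] tmp = m[selected];
            m[selected] = m[pivotRow];
            m[pivotRow] = tmp;

            long pivot = m[pivotRow][col];
            for (int r = pivotRow + 1; r < rows; r++) {
                long factor = m[r][col];
                if (factor == 0) continue;
                // row_r = pivot * row_r - factor * pivotRow (no division needed)
                for (int c = col; c < cols; c++) {
                    m[r][c] = Math.subtractExact(
                            Math.multiplyExact(pivot, m[r][c]),
                            Math.multiplyExact(factor, m[pivotRow][c]));
                }
            }
            pivotRow++;
        }
    }
}
```

For example, the matrix {{1, 2, 3}, {2, 4, 6}, {1, 1, 1}} reduces to {{1, 2, 3}, {0, -1, -2}, {0, 0, 0}}, with the dependent row becoming exactly zero. Entries can grow quickly with this scheme; switching the element type to BigInteger would be one way around that.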
The second solution is to use floating point numbers and be very careful. This is a topic of a whole branch of mathematics / computer science called numerical analysis. It is too complicated to explain in an answer here, so you have to get a book on numerical analysis. In simple terms, what you want to do is to say that if Math.abs(pivot) < some small value, then you assume that the pivot should be zero, but that it is something like .0000000001 because of rounding errors, so you just act as if the pivot is zero. The problem is finding out what "some small value" is.
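The tolerance test, and the "round to the nearest integer if close enough" check from the question, might look like the sketch below. The names are hypothetical and the tolerance is deliberately a parameter, since choosing it is the hard part.

```java
class PivotTolerance {
    // Treats a pivot as zero when its magnitude is below the tolerance.
    static boolean isEffectivelyZero(double value, double tolerance) {
        return Math.abs(value) < tolerance;
    }

    // Returns the nearest integer (as a double) when value is within the
    // tolerance of it; otherwise returns value unchanged.
    static double snapToInteger(double value, double tolerance) {
        double nearest = Math.rint(value);
        return Math.abs(value - nearest) < tolerance ? nearest : value;
    }

    public static void main(String[] args) {
        System.out.println(isEffectivelyZero(0.0000001, 0.00001)); // true
        System.out.println(snapToInteger(2.0000001, 0.00001));     // 2.0
    }
}
```

Keeping both checks in one place at least avoids scattering if statements all over the elimination code.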
This may seem like a fairly basic question, for which I apologise in advance.
I'm writing an Android app that uses a set of predefined numbers. At the moment I'm working with int values, but at some point I will probably need to use float and double values, too.
The numbers are used for two things. First, I need to display them to the user, for which I need a String (I'm creating a custom View and drawing the String on a Canvas). Second, I will be using them in a sort of calculator, for which they obviously need to be int (or float/double).
Since the numbers are the same whether they are used as String or int, I only want to store them once (this will also reduce errors if I need to change any of them; I'll only need to change them in the one place).
My question is: should I store them as String or as int? Is it faster to write an int as a String, or to parse an int from a String? My gut tells me that parsing would take more time/resources, so I should store them as ints. Am I right?
Actually, your gut may be wrong (and I emphasise may, see my comments below on measuring). To convert a string to an integer requires a series of multiply/add operations. To convert an integer to a string requires division/modulo. It may well be that the former is faster than the latter.
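To make the multiply/add versus division/modulo point concrete, here is a simplified sketch of both directions for non-negative values only. It is not how Integer.parseInt or Integer.toString are actually implemented (those handle signs, overflow and invalid input); the class and method names are made up for illustration.

```java
class DecimalConversion {
    // String -> int: one multiply and one add per digit.
    // Assumes the input contains only the characters '0'..'9'.
    static int parse(String digits) {
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            value = value * 10 + (digits.charAt(i) - '0');
        }
        return value;
    }

    // int -> String: one remainder and one division per digit.
    // Assumes value is non-negative.
    static String format(int value) {
        if (value == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append((char) ('0' + value % 10)); // digits come out last-first
            value /= 10;
        }
        return sb.reverse().toString();
    }
}
```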
But I'd like to point out that you should measure, not guess! The landscape is littered with the corpses of algorithms that relied on incorrect assumptions.
I would also like to point out that, unless your calculator is expected to do huge numbers of calculations each second (and I'm talking millions if not billions), the difference will almost certainly be irrelevant.
In the vast majority of user-interactive applications, 99% of all computer time is spent waiting for the user to do something.
My advice is to do whatever makes your life easier as a developer and worry about performance if (and only if) it becomes an issue. And, just to clarify, I would suggest that storing them in native form (not as strings) would be easiest for a calculator.
I ran a test on arrays of 1 000 000 int and String values. I only timed the conversions, and the results were:
Case 1, from int to String: 1 000 000 in an average of 344 ms
Case 2, from String to int: 1 000 000 in an average of 140 ms
Conclusion: your gut was wrong :)
And I join the others in saying that this is not what is going to make your application slow. Better to concentrate on making it simpler and safer.
I'd say that's not really relevant. What should matter more is type safety: since you have numbers, int (or float and double) would force you to use numbers and not store "arbitrary" data (which String would allow to some extent).
The best is to do a bench test. Write two loops
one that converts 100000 units from numeric to String
one that converts 100000 units from String to numeric
And measure the time elapsed by getting System.currentTimeMillis() before and after each loop.
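The two loops could be sketched as follows. This version uses System.nanoTime() rather than System.currentTimeMillis() because of its finer resolution; that substitution, the class name and the volatile sink field are choices made for this example. Treat the numbers it prints as rough indications only, since JIT warm-up and garbage collection easily distort such a simple test.

```java
class ConversionBenchmark {
    // Consumes the loop results so the JIT cannot discard the work.
    static volatile long sink;

    // Times numeric -> String conversion; returns elapsed nanoseconds.
    static long timeIntToString(int[] numbers) {
        long start = System.nanoTime();
        long total = 0;
        for (int n : numbers) {
            total += Integer.toString(n).length();
        }
        sink = total;
        return System.nanoTime() - start;
    }

    // Times String -> numeric conversion; returns elapsed nanoseconds.
    static long timeStringToInt(String[] strings) {
        long start = System.nanoTime();
        long total = 0;
        for (String s : strings) {
            total += Integer.parseInt(s);
        }
        sink = total;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int count = 100000;
        int[] numbers = new int[count];
        String[] strings = new String[count];
        for (int i = 0; i < count; i++) {
            numbers[i] = i;
            strings[i] = Integer.toString(i);
        }
        // Repeat a few times: the early rounds mostly measure JIT warm-up.
        for (int round = 0; round < 5; round++) {
            System.out.println("int -> String: " + timeIntToString(numbers) / 1000
                    + " us, String -> int: " + timeStringToInt(strings) / 1000 + " us");
        }
    }
}
```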
But personally, if I needed to do calculations on these numbers, I would store them in their native format (int or float) and only convert them to String for display. This is more a question of design and maintainability than of execution speed. Focusing on execution speed is sometimes counterproductive: gaining a few microseconds that nobody will notice is not worth sacrificing the design and the robustness (of course, some compromise may be needed when it is a question of saving a lot of CPU time). This reading may interest you.
A human who is using the calculator will not notice a performance difference, but as others have said, using strings as your internal representation is a bad idea since you don't get type safety in that case.
You will most likely get into maintenance problems later on if you decide to use strings.
It's better design practice to have the view displayed to the user being derived from the underlying data, rather than the other way around - at some point you might decide to render the calculator using your own drawing functions or fixed images, and having your data as strings would be a pain here.
That being said, neither of these operations is particularly time consuming on modern hardware.
Parsing is a slow thing, printing a number is not. The internal representation as a number allows you to compute, which is probably what you intend to do with your numbers. Storing numbers as, well, numbers (ints, floats, decimals) also takes up less space than their string representations, so … you'll probably want to go with storing them as ints, floats, or whatever they are.
You are writing an application for mobile devices, where memory consumption is a huge deal.
Storing an int is cheap, storing a String is expensive. Go for int.
Edit: more explanation. Storing an int between -2^31 and 2^31-1 costs 32 bits, no matter what the number is. Storing it in a String costs 16 bits per digit of its base 10 representation, plus the overhead of the String object itself.
I'm designing a tool using Java 6, that will read data from Medical Devices.
Each Medical Device manufacturer implements its own Firmware/Protocol. Vendors (like me) write their own interface that uses the manufacturer's firmware commands to acquire data from the Medical Device. Most firmwares output data in a cryptic fashion, so the vendor receiving it is supposed to scale it by doing some calculations on it in order to figure out the true value.
It's safe to assume that medical data precision is as important as financial data precision, etc.
I've come to the conclusion that I should use BigDecimal to do all numerical calculations and to store the final value. I'll be receiving a new set of data almost every second, which means I'll be doing calculations and updating the same set of values every second. Example: data coming across from a ventilator for each breath.
Since BigDecimal is immutable, I'm worried about the number of objects generated in the heap every second, especially since the tool will have to scale up to read data from, let's say, 50 devices at the same time.
I can increase the heap size and all that, but still, here are my questions.
Questions
Is there any mutable cousin of BigDecimal I could use?
Is there any existing opensource framework to do something like this?
Is Java the right language for this kind of functionality?
Should I look into Apfloat? But Apfloat is immutable too. How about JScience?
Is there any Math library for Java that I can use for high precision?
I'm aiming for a precision of up to 10 digits only. I don't need more than that. So what's the best library or course of action for this type of precision?
I would recommend that, before you jump to the conclusion that BigDecimal doesn't suit your needs, you actually profile your scenario. It is not a foregone conclusion that its immutable nature is going to have a significant impact on your scenario. A modern JVM is very good at allocating and destroying large quantities of objects.
The double primitive type offers approximately 16 digits of decimal precision, why not just use that? Then you won't be touching the heap at all.
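A rough sketch of that idea: do the per-sample arithmetic in double, and round to 10 significant digits only when a value is stored or displayed. The linear scaling (raw * factor + offset) and all names here are hypothetical; the real scaling depends on the device protocol.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class TenDigitReading {
    // 10 significant decimal digits, with MathContext's default HALF_UP rounding.
    static final MathContext TEN_DIGITS = new MathContext(10);

    // Scales a raw device value using primitive double arithmetic
    // (hypothetical linear scaling; no objects are allocated here).
    static double scale(double raw, double factor, double offset) {
        return raw * factor + offset;
    }

    // Rounds a computed value to 10 significant digits for storage or display.
    static BigDecimal toTenDigits(double value) {
        return BigDecimal.valueOf(value).round(TEN_DIGITS);
    }

    public static void main(String[] args) {
        double scaled = scale(1234, 0.001, 0.0);
        System.out.println(toTenDigits(scaled));
    }
}
```

With this split, a BigDecimal is created only for the values that are actually kept, not for every intermediate result.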
The garbage collector should do a decent enough job of cleaning the objects up. However, if you still want mutable numbers, you can always access the backing data of BigDecimal using reflection; that way you can create a wrapper class for BigDecimal that does what you want.
If you only need 10 digits of precision you can simply use a double, which has 16 digits of precision.
You say specifically that you need only 10 significant digits. You don't say whether you're considering binary or decimal digits, but standard 64-bit IEEE floating point (Java double) offers 53 binary digits of precision (52 of them stored explicitly), or approximately 15 to 16 decimal digits, which sounds like it more than meets your needs.
However, I do recommend that you put some thought into the numerical stability of whatever operations you apply to the input numbers. For example, Math.log() and Math.exp() can have unexpected effects depending on the range of the inputs (in some cases, you might find Math.log1p() and Math.expm1() to be more appropriate, but again, that depends on the specific operations you're performing).
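As an illustration of the log1p/expm1 point, the sketch below compares the direct and the specialised forms for a small argument. For x near zero, both log(1 + x) and exp(x) - 1 are very close to x, so the relative difference from x gives a rough idea of the accuracy. The class and method names are made up for this example.

```java
class SmallArgumentFunctions {
    // Direct forms: 1.0 + x and exp(x) - 1.0 lose digits when x is tiny.
    static double naiveLog(double x) {
        return Math.log(1.0 + x);
    }

    static double naiveExpMinusOne(double x) {
        return Math.exp(x) - 1.0;
    }

    // Library forms designed to stay accurate near zero.
    static double accurateLog(double x) {
        return Math.log1p(x);
    }

    static double accurateExpMinusOne(double x) {
        return Math.expm1(x);
    }

    // Relative difference between a computed value and a reference value.
    static double relativeDifference(double computed, double reference) {
        return Math.abs(computed - reference) / Math.abs(reference);
    }

    public static void main(String[] args) {
        double x = 1e-10;
        System.out.println("log(1 + x): " + naiveLog(x));
        System.out.println("log1p(x):   " + accurateLog(x));
        System.out.println("exp(x) - 1: " + naiveExpMinusOne(x));
        System.out.println("expm1(x):   " + accurateExpMinusOne(x));
    }
}
```

For x = 1e-10 the direct forms are off from x by far more than the specialised ones, which is comfortably visible at a 10-digit precision target.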
I am using floats for some Android Java game graphics, but the Math library trig functions all return double, so I have to explicitly cast them.
I understand that floats are quicker to process than doubles, and I do not need high precision answers.
e.g. which is better:
screenXf = (float) (shipXf + offsetXf * Math.sin(headingf) - screenMinXf);
or
screenXf = shipXf + offsetXf * (float) (Math.sin(headingf)) - screenMinXf;
I suppose other questions would be 'how can I test this on an emulator without other factors (e.g. PC services) confusing the issue?' and 'Is it going to be different on different hardware anyway?'
Oh dear, that's three questions. Life is never simple :-(
-Frink
Consider using FloatMath.sin() instead.
FloatMath
Math routines similar to those found in Math. Performs computations on float values directly without incurring the overhead of conversions to and from double.
But note this blurb in the android docs:
http://developer.android.com/guide/practices/design/performance.html#avoidfloat
Designing for Performance
...
In speed terms, there's no difference between float and double on the more modern hardware. Space-wise, double is 2x larger. As with desktop machines, assuming space isn't an issue, you should prefer double to float.
Although this guy #fadden, purportedly one of the guys who wrote the VM, says:
Why are there so many floats in the Android API?
On devices without an FPU, the single-precision floating point ops are much faster than the double-precision equivalents. Because of this, the Android framework provides a FloatMath class that replicates some java.lang.Math functions, but with float arguments instead of double.
On recent Android devices with an FPU, the time required for single- and double-precision operations is about the same, and is significantly faster than the software implementation. (The "Designing for Performance" page was written for the G1, and needs to be updated to reflect various changes.)
His last sentence ("page ... need to be updated") refers to the page I referenced above, so I wonder if he is referring to that sentence about "no difference" that I quoted above.
This is definitely dependent on the hardware. I know nothing about the target platforms, but on a current PC float and double operations take the same amount of time, whereas floats were about twice as fast as doubles on an i386.
Unless your emulator can report the cycle count, you can't find out, as the hardware of your PC has little in common with the hardware of the target platform. If the target platform were your PC, then I'd recommend http://code.google.com/p/caliper/ for this microbenchmark.