Consider the case where you want to test every possible input value. Iterating over all possible int values is fairly easy: you can just increment the value by 1 and repeat.
How would you go about doing this same idea for all the possible double values?
You can iterate over all possible long values and then use Double.longBitsToDouble() to get a double for each possible 64-bit combination.
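A minimal sketch of that enumeration (process is a hypothetical stand-in for whatever per-value check you run):

// Reinterpret every 64-bit pattern as a double via Double.longBitsToDouble.
for (long bits = Long.MIN_VALUE; ; bits++) {
    double d = Double.longBitsToDouble(bits);
    process(d); // hypothetical test hook; many NaN bit patterns appear here
    if (bits == Long.MAX_VALUE) break; // a <= condition would overflow and loop forever
}
// The float analogue walks all int bit patterns with Float.intBitsToFloat.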
Note, however, that this will take a while. If each double value requires 100 nanoseconds of processing, the full sweep takes roughly 2^64 * 1e-7 / 86400 / 365 ≈ 58,000 years on a single CPU (slightly less work in practice, since not all bit combinations are different double numbers, e.g. the many NaN patterns, but that does not change the order of magnitude). Unless you have a datacenter to do the computation, it is a better idea to go over the range of plausible input values, sampling the interval at a configurable number of points.
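A sketch of that sampling approach (the bounds and sample count are assumptions you would configure):

// Sample the input interval at a configurable number of points instead of
// enumerating every representable double.
long samples = 1_000_000L;          // assumed; tune to your time budget
double lo = -1e6, hi = 1e6;         // assumed input range
double step = (hi - lo) / (samples - 1);
for (long i = 0; i < samples; i++) {
    process(lo + i * step);         // same hypothetical test hook as above
}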
The analogous feat for float is still slow but doable: assuming you need 10 milliseconds of processing for each input value, you need roughly 2^32 * 1e-2 / 86400 = 497.1 days on a single CPU. You would use Float.intBitsToFloat() in this case.
Java's Double class lets you construct and take apart Double values into their constituent pieces. This, plus an understanding of the double representation, will allow you at least conceptually to enumerate all possible doubles. You will likely find that there are too many, though.
You can do a loop like:

for (double v = Double.MIN_VALUE; v <= Double.MAX_VALUE; v = Math.nextUp(v)) {
    // ...
}

but, as already explained in Adam's answer, it will take a long time to run.

(This will create neither NaN nor Infinity. Note also that Double.MIN_VALUE is the smallest positive double, so the loop above covers only the positive values.)
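If you also need the negative half of the range, a sketch (my variant, not part of the original answer) is to start from -Double.MAX_VALUE, the most negative finite double:

// Walks every finite double from -Double.MAX_VALUE up to Double.MAX_VALUE.
// Still skips NaN and the infinities; the +0.0 bit pattern is also skipped,
// because Math.nextUp(-0.0) returns Double.MIN_VALUE.
for (double v = -Double.MAX_VALUE; v <= Double.MAX_VALUE; v = Math.nextUp(v)) {
    // ...
}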
There is a very famous and simple problem on HackerRank that goes as follows:
Given five positive integers, find the minimum and maximum values that can be calculated by summing exactly four of the five integers. Then print the respective minimum and maximum values as a single line of two space-separated long integers.
Example: arr = [1,3,5,7,9]
The minimum sum is 1+3+5+7=16 and the maximum sum is 3+5+7+9=24.
Now, I solved this problem as follows:
long max = Collections.max(intList);
long min = Collections.min(intList);
long sum = intList.stream().mapToInt(Integer::intValue).sum();
System.out.println((sum-max)+" "+(sum-min));
It works, but it is failing 3 test cases. Any suggestions or improvements that can be made? I am trying to improve my programming skills, and this is something I don't want to let go of until I completely understand it.
Thanks!
EDIT
Here is the improved code and the answer to anyone who is looking :
long max = Collections.max(arr);
long min = Collections.min(arr);
long sum = arr.stream().mapToLong(Integer::longValue).sum();
System.out.println((sum-max)+" "+(sum-min));
The only problem I see is that you are expected to produce a long result but are calculating intermediary values (e.g. the total sum) in int. This can result in integer overflow.

Basically, substitute mapToInt with mapToLong and use longValue instead.

PS: Otherwise I like your solution, in the sense that it is concise and uses the APIs well. If you are after pixel-perfect performance you might want to spare the unnecessary passes over the list (see the sketch below), but this is really extreme optimization (your solution is also linear in complexity) and I doubt it will ever make a difference.
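A single-pass sketch of that micro-optimization (arr is the List<Integer> from the question):

// Compute min, max and sum in one traversal instead of three.
long min = Long.MAX_VALUE, max = Long.MIN_VALUE, sum = 0;
for (int v : arr) {
    min = Math.min(min, v);
    max = Math.max(max, v);
    sum += v;               // accumulate in long to avoid int overflow
}
System.out.println((sum - max) + " " + (sum - min));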
I have to find the log, and later, after a few computations, the antilog of many BigDecimal numbers. Since log and antilog are not supported for BigDecimal, I used the Apfloat library and its pow method, which can take both arguments as Apfloat values, like below:
ApfloatMath.pow(Constants.BASE_OF_LOG, apFloatNum);
The problem is that I am using it in a loop, and the loop is big. Apfloat's pow takes a lot of time to compute the power, so the whole thing takes more than an hour. To avoid this, I thought of converting the Apfloat into a double and then using Math.pow, which runs fast but gives me infinity for a few values.
What should I do? Does anyone know ApfloatMath.pow alternative?
You said you are using Math.pow() now and that some of the calls return an infinite value.
If you can live with using (far less accurate) doubles instead of BigDecimals, then you should think of the fact that mathematically,
double p = Math.pow(a, x);
is equivalent to
double p = Math.pow(a, x - y) * Math.pow(a, y);
Say you have a big value; let's call it big. Then instead of doing:
// pow(a, big) may return infinite
BigDecimal n = BigDecimal.valueOf(Math.pow(a, big));
you can just as well do:
// do this once, outside the loop
BigDecimal large = BigDecimal.valueOf(a).pow(100);
...
// do this inside the loop
// pow(a, big - 100) should not return infinite
BigDecimal n = BigDecimal.valueOf(Math.pow(a, big - 100)).multiply(large);
Instead of 100, you may want to pick another constant that better suits the values you are using. But something like the above could be a simple solution, and much faster than what you describe.
Note
Perhaps ApfloatMath.pow() is only slow for large values. If that is the case, you may be able to apply the principle above to ApfloatMath.pow() as well. You would have to do the following only once, outside the loop:
Apfloat large = ApfloatMath.pow(Constants.BASE_OF_LOG, 100);
and then, inside the loop, you could use:

x = ApfloatMath.pow(Constants.BASE_OF_LOG, apFloatNum.subtract(new Apfloat(100))).multiply(large); // subtract 100 from the Apfloat exponent
But you'll have to test if that makes things faster. I could imagine that ApfloatMath.pow() can be much faster for an integer exponent.
Since I don't know more about your data, and because I don't have Apfloat installed, I can't test this, so you should see if the above solution is good enough for you (especially if it is accurate enough for you), and if it is actually better/faster than what you have.
(EDIT: Looking at where this question started, it really ended up in a much better place. It wound up being a nice resource on the limits of RDD sizes in Spark when set through SparkContext.parallelize() vs. the actual size limits of RDDs. Also uncovered some arguments to parallelize() not found in user docs. Look especially at zero323's comments and his accepted answer.)
Nothing new under the sun but I can't find this question already asked ... the question is about how wrong/inadvisable/improper it might be to run a cast inside a large for loop in Java.
I want to run a for loop to initialize an Arraylist before passing it to a SparkContext.parallelize() method. I have found passing an uninitialized array to Spark can cause an empty collection error.
I have seen many posts about how floats and doubles are bad ideas as counters; I get that. It just seems like this is a bad idea too? Like there must be a better way?
numListLen will be 10^6 * 10^3 for now, maybe as large as 10^12 at some point.
List<Double> numList = new ArrayList<Double>(numListLen);
for (long i = 0; i < numListLen; i++) {
    numList.add((double) i);
}
I would love to hear where specifically this code falls down and can be improved. I'm a junior-level CS student so I haven't seen all the angles yet haha. Here's a CMU page seemingly approving this approach in C using implicit casting.
Just for background, numList is going to be passed to Spark to tell it how many times to run a simulation and create a RDD with the results, like this:
JavaRDD dataSet = jsc.parallelize(numList,SLICES_AKA_PARTITIONS);
// the function will be applied to each member of dataSet
Double count = dataSet.map(new Function<Double, Double>() {...
(Actually I'd love to run this ArrayList creation through Spark, but it doesn't seem to take enough time to warrant that, 5 seconds on my i5 dual-core, but if boosted to 10^12 then ... longer)
davidstenberg and Konstantinos Chalkias already covered the problems related to using Doubles as counters, and radiodef pointed out an issue with creating objects in the loop, but at the end of the day you simply cannot allocate an ArrayList larger than Integer.MAX_VALUE. On top of that, even with 2^31 elements this is a pretty large object, and serialization and network traffic can add a substantial overhead to your job.

There are a few ways you can handle this:
using the SparkContext.range method:

range(start: Long, end: Long,
      step: Long = 1, numSlices: Int = defaultParallelism)
initializing the RDD using a range object. In PySpark you can use range (xrange in Python 2), in Scala a Range:

val rdd = sc.parallelize(1L to Long.MaxValue)

This requires constant memory on the driver and constant network traffic per executor (all you have to transfer is the beginning and the end).
In Java 8, LongStream.range could work the same way, but it looks like JavaSparkContext doesn't provide the required constructors yet. If you're brave enough to deal with all the singletons and implicits you can use Scala Range directly, and if not you can simply write a Java-friendly wrapper.
initializing the RDD using the emptyRDD method, or with a small number of seeds, and populating it using mapPartitions(WithIndex) / flatMap (a sketch follows this list). See for example Creating array per Executor in Spark and combine into RDD.

With a little bit of creativity you can actually generate an infinite number of elements this way (Spark FlatMap function for huge lists).
given your particular use case, you should also take a look at mllib.random.RandomRDDs. It provides a number of useful generators for different distributions.
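A sketch of the seed-based approach mentioned above (Spark 2.x Java API assumed; SLICES, numListLen and jsc are taken from the question):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

// Ship one small seed per partition from the driver; each executor expands
// its seed into its share of the values, so the driver never materializes
// the full list.
List<Integer> seeds = new ArrayList<>();
for (int i = 0; i < SLICES; i++) seeds.add(i);

long perSlice = numListLen / SLICES; // assumes numListLen % SLICES == 0
JavaRDD<Double> dataSet = jsc.parallelize(seeds, SLICES).flatMap(seed -> {
    List<Double> chunk = new ArrayList<>();
    long base = (long) seed * perSlice;
    for (long j = 0; j < perSlice; j++) {
        chunk.add((double) (base + j));
    }
    return chunk.iterator(); // Spark 2.x flatMap expects an Iterator
});

For the last bullet, RandomRDDs.uniformJavaRDD(jsc, numListLen, SLICES) would give you uniformly distributed doubles with a single call.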
The problem is using a double or float as the loop counter. In your case the loop counter is a long and does not suffer the same problem.
One problem with a double or float as a loop counter is that floating-point precision leaves gaps in the series of numbers that can be represented. It is possible to get to a place within the valid range of a floating-point number where adding 1 falls below the precision of the number being represented (for example, when the value requires 16 significant digits but the format only supports 15). If your loop passed through such a point in normal execution, it would stop incrementing and spin in an infinite loop.

The other problem with doubles as loop counters is the difficulty of comparing two floating-point values. Rounding means that, to compare the variables reliably, you need to check whether they fall within a range rather than test for exact equality. While you might consider 1.0000000 and 0.999999999 equal, your computer would not, so rounding might also make you miss the loop-termination condition.
Neither of these problems occurs with your long as the loop counter. So enjoy having done it right.
Although I don't recommend the use of floating-point values (either single or double precision) as for-loop counters, in your case, where the step is not a fractional number (you use 1 as the step), everything depends on your largest expected number vs. the fraction part of the double representation (52 bits).

Doubles from 2^52 to 2^53 still represent the integer part correctly, but after 2^53 you cannot always achieve integer-part precision (the representable values are spaced more than 1 apart).

In practice, and because your loop step is 1, you would not experience any problems up to 9,007,199,254,740,992 (2^53) if you used double as the counter, thus avoiding casting (you can't avoid the boxing from double to Double, though).

Perform a simple increment test; you will see that 9,007,199,254,740,995 is the first false positive!

FYI: for float numbers, you are safe incrementing up to 2^24 = 16777216 (the article you linked uses the number 100000001.0f > 16777216 to present the problem).
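A quick demonstration of those boundaries:

public class PrecisionBoundaries {
    public static void main(String[] args) {
        double d = 9007199254740992d;             // 2^53
        System.out.println(d == d + 1);           // true: the +1 is absorbed
        System.out.println(d - 1 == (d - 1) + 1); // false: still exact below 2^53
        float f = 16777216f;                      // 2^24
        System.out.println(f == f + 1f);          // true: same effect for float
    }
}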
I'm trying to implement basic 2D vector math functions for a game, in Java. They will be intensively used by the game, so I want them to be as fast as possible.
I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines).
Using doubles, there are rounding errors. I could simply ignore them and use something like

Math.abs(d1 - d2) <= 0.0001

to compare the values, but I assume that with further calculations the error could accumulate until it becomes significant. So I thought I could round them after every possibly imprecise operation, but that turned out to produce much worse results, presumably because the program also rounds inexact values (e.g. 0.33333333... -> 0.33333000...).
Using BigDecimal would be far too slow.
What is the best way to solve this problem?
Inaccurate Method
When you are working with numbers that require precise calculations, you need to be sure that you aren't chaining conversions, i.e. rounding a value, calculating with the rounded result, then rounding again (this is what it seems like you are currently doing).

That pattern accumulates rounding errors as the process continues, giving you extremely inaccurate data in the long term: the starting value is effectively rounded off again at every step, each time becoming more and more inaccurate!
Accurate Method
A better and more accurate way of obtaining results is to derive each one directly from the original, full-precision value, rounding only at the end.

This helps you avoid the accumulation of rounding errors, because each calculation is based on only one conversion, and the results of that conversion are not compounded into the next calculation.
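A small sketch contrasting the two patterns (the values and the two-decimal rounding are made up for illustration):

double original = 1.0 / 3.0; // keep the full-precision source value

// Inaccurate: each step rounds, and the next step builds on the rounded
// result, so the errors compound.
double r1 = Math.round(original * 100.0) / 100.0;   // 0.33
double r2 = Math.round(r1 * 7.0 * 100.0) / 100.0;   // rounds a rounded value

// Accurate: derive the result from the untouched original and round once.
double a2 = Math.round(original * 7.0 * 100.0) / 100.0;

System.out.println(r2 + " vs " + a2); // 2.31 vs 2.33: the compounded error shows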
The best method of attack is to start at the highest precision that is necessary, then convert on an as-needed basis, but leave the original intact. I would suggest you follow the second, accurate process described above.
"I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines)."
It's important to note that you should not attempt to perform any rounding of your values if there is no noticeable impact on your end result; you would simply be doing more work for little to no gain, and may even suffer a performance decrease if you do it often enough.
This is a minor addition to the prior answer. When converting a floating-point value to an integer, it is important to round rather than just cast. In the following program, d is the largest double that is strictly less than 1.0. It could easily arise as the result of a calculation whose result would be exactly 1.0 in infinitely precise real-number arithmetic.
The simple cast gets result 0. Rounding first gets result 1.
public class Test {
    public static void main(String[] args) {
        double d = Math.nextDown(1.0);
        System.out.println(d);
        System.out.println((int) d);
        System.out.println((int) Math.round(d));
    }
}
Output:
0.9999999999999999
0
1
So I know I can convert a string to a hash code simply by calling .hashCode(), but is there a way (or some other function out there) that, instead of returning an integer, returns a double between 0 and 1? I was thinking of just dividing the number by the maximum possible integer, but wasn't sure if there was a better way.
*Edit (more information about why I'm trying to do this): I'm doing a mathematical operation, and I'm trying to group different objects; each group performs the same mathematical operation but passes a different parameter into the function. Each member has a list of characteristics that "group" them... so I was thinking of putting the characteristics into a string, hashing it, and deriving the group's value from that.
You couldn't just divide by Integer.MAX_VALUE, as that wouldn't deal with negative numbers. You could use:
private static double INTEGER_RANGE = 1L << 32;
...
// First need to put it in the range [0, INTEGER_RANGE)
double doubleHash = ((long) text.hashCode() - Integer.MIN_VALUE) / INTEGER_RANGE;
That should be okay, as far as I'm aware... but I'm not going to make any claims about the distribution. There may well be a fairly simple way of using the 32 bits to make a unique double (per unique hash code) in the right range, but if you don't care too much about that, this will be simpler.
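A self-contained version of that idea (the class and method names are mine, not a standard API):

public class HashToUnit {
    private static final double INTEGER_RANGE = 1L << 32; // 2^32

    // Maps any int hash code onto [0, 1): shift the int range up to
    // [0, 2^32) and divide by its width.
    static double hashToUnitInterval(String text) {
        return ((long) text.hashCode() - Integer.MIN_VALUE) / INTEGER_RANGE;
    }

    public static void main(String[] args) {
        System.out.println(hashToUnitInterval("hello"));
        System.out.println(hashToUnitInterval("world"));
    }
}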
Dividing should be OK, but you might lose some precision due to the rounding problems doubles can have.

In general, a hash is used to identify something while trying to guarantee that it is unique; losing precision could cause problems with that.
You could write your own hashCodeDouble() utility returning the desired number, perhaps using a common hash algorithm (say, MD5) and adapting it to your required response range.

Example: take the MD5 hash of the String, treat its digits as a decimal fraction, and simply put 0. in front of them...

Remember that .hashCode() is used by lots of functions in Java; you can't simply override it to return a double.
This smells bad but might do what you want:

int iHash = "123".hashCode() & 0x7fffffff; // mask the sign bit; a negative hash would yield an unparseable "0.-..." string
String sHash = "0." + iHash;
double dHash = Double.valueOf(sHash);