java.util.Random.nextDouble() is slow for me and I need something really fast.
I did some searching on Google and found only integer-based fast random generators. Is there anything for real numbers from the interval [0, 1)?
If you need something fast and have access to Java 8, I can recommend java.util.SplittableRandom. It is faster (roughly twice as fast) and has a better statistical distribution.
If you need an even faster or better algorithm, I can recommend one of these specialized XorShift variants:
XorShift128PlusRandom (faster & better)
XorShift1024StarPhiRandom (similar speed, even longer period)
Information on these algorithms and their quality can be found in this big PRNG comparison.
I made an independent performance comparison; you can find the detailed results and the code here:
Furthermore, Apache Commons RNG has a performance test of all their implemented algorithms.
Never use java.util.Random, use java.util.SplittableRandom.
If you need faster or better PRNG use a XorShift variant.
You could modify an integer based RNG to output doubles in the interval [0,1) in the following way:
double randDouble = randInt()/(RAND_INT_MAX + 1.0)
However, if randInt() generates a 32-bit integer, this won't fill all the bits of the double, because a double has 53 mantissa bits. You could generate two random integers to fill all the mantissa bits. Or you could take a look at the source code of the Random.nextDouble() implementation. It almost surely uses an integer RNG and simply converts the output to a double.
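The conversion described above can be sketched as follows. This is a minimal illustration, not the JDK's actual code: the class and method names (`UnitDoubles`, `fromLong`, `fromTwoInts`) are made up for this example, and it assumes the integer generator produces uniformly distributed bits.

```java
// Hypothetical helper: turns raw integer RNG output into a double in [0, 1).
final class UnitDoubles {
    // 2^-53, the spacing between adjacent doubles produced here.
    private static final double TWO_POW_MINUS_53 = 0x1.0p-53;

    // Keep the top 53 bits of a 64-bit value and scale them into [0, 1).
    static double fromLong(long bits) {
        return (bits >>> 11) * TWO_POW_MINUS_53;
    }

    // Combine two 32-bit outputs into 64 bits so that all 53 mantissa bits are filled.
    static double fromTwoInts(int hi, int lo) {
        long bits = ((long) hi << 32) | (lo & 0xFFFFFFFFL);
        return fromLong(bits);
    }
}
```

With a 64-bit generator, `fromLong(rng.nextLong())` would be enough; `fromTwoInts` is only needed when the underlying generator returns 32 bits per call.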
As for performance, the best-performing random number generators are linear congruential generators. Of these, I recommend using the Numerical Recipes generator. You can see more information about LCGs from Wikipedia:
However, if you want good randomness and performance is not that important, I think Mersenne Twister is the best choice. It also has a Wikipedia page:
There is a recent random number generator called PCG. It is essentially a post-processing step for LCG that improves the randomness of the LCG output. Note that PCG is slower than LCG because it adds that post-processing step on top of the LCG. Thus, if performance is very important and randomness quality is not that important, you want to use LCG instead of PCG.
Note that none of the generators I mentioned are cryptographically secure. If you need to use the values for cryptographic applications, you should be using a cryptographically secure algorithm. However, I don't really believe that doubles would be used for cryptography.
Note that all these solutions miss a fundamental fact (that I wasn't aware of up to a few weeks ago): passing from 64 bits to a double using a multiplication is a major loss of time. The implementation of xorshift128+ and xorshift1024+ in the DSI utilities uses direct bit manipulation and the results are impressive.
See the benchmarks for nextDouble() at
and the quality reported at
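One common form of such bit manipulation is sketched below: the random bits are placed directly into the mantissa of a double in [1, 2), and 1.0 is subtracted. Whether this matches the DSI utilities' code exactly is an assumption; the class name `BitsToDouble` is made up, and this variant yields 52 random bits rather than 53.

```java
// Hypothetical helper: builds a double from raw bits without an int-to-double conversion.
final class BitsToDouble {
    // Bit pattern of 1.0: sign 0, biased exponent 0x3FF, mantissa 0.
    private static final long ONE_BITS = 0x3FFL << 52;

    static double next(long randomBits) {
        // The top 52 random bits become the mantissa, giving a value in [1, 2).
        double inOneTwo = Double.longBitsToDouble(ONE_BITS | (randomBits >>> 12));
        // Shift the range down to [0, 1).
        return inOneTwo - 1.0;
    }
}
```

Whether this is actually faster than the multiplication depends on the JVM and the hardware, so it is worth benchmarking on the target machine.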
Imho you should just accept juhist's answer - here's why.
nextDouble is slow because it makes two calls to next() - it's written right there in the documentation.
So your best options are:
use a fast 64 bit generator, convert that to double (MT, PCG, xorshift*, ISAAC64, ...)
generate doubles directly
Here's an overly long benchmark with java's Random, an LCG (as bad as java.util.Random), and Marsaglia's universal generator (the version generating doubles).
import java.util.*;

public class d01 {
    private static long sec(double x) {
        return (long) (x * (1000L*1000*1000));
    }

    // ns/op: nanoseconds to generate a double
    // loop until it takes a second.
    public static double ns_op(Random r) {
        long nanos = -1;
        int n;
        for(n = 1; n < 0x12345678; n *= 2) {
            long t0 = System.nanoTime();
            for(int i = 0; i < n; i++)
                r.nextDouble();
            nanos = System.nanoTime() - t0;
            if(nanos >= sec(1))
                break;
            if(nanos < sec(0.1))
                n *= 4;
        }
        return nanos / (double)n;
    }

    public static void bench(Random r) {
        System.out.println(ns_op(r) + " " + r.toString());
    }

    public static void main(String[] args) {
        for(int i = 0; i < 3; i++) {
            bench(new Random());
            bench(new LCG64(new Random().nextLong()));
            bench(new UNI_double(new Random().nextLong()));
        }
    }
}
// straight from wikipedia
class LCG64 extends java.util.Random {
    private long x;
    public LCG64(long seed) {
        this.x = seed;
    }
    public long nextLong() {
        x = x * 6364136223846793005L + 1442695040888963407L;
        return x;
    }
    public double nextDouble(){
        return (nextLong() >>> 11) * (1.0/9007199254740992.0);
    }
    protected int next(int nbits) {
        throw new RuntimeException("TODO");
    }
}
class UNI_double extends java.util.Random {
    // Marsaglia's UNIversal random generator extended to double precision
    // G. Marsaglia, W.W. Tsang / Statistics & Probability Letters 66 (2004) 183 – 187
    private final double[] U = new double[98];
    static final double r=9007199254740881.0/9007199254740992.;
    static final double d=362436069876.0/9007199254740992.0;
    private double c=0.;
    private int i=97,j=33;
    public double nextDouble(){
        double x;
        x=U[i]- U[j];
        if(x<0.0) x=x+1.0;
        U[i]=x;
        if(--i==0) i=97;
        if(--j==0) j=97;
        c=c-d;
        if(c<0.0) c=c+r;
        x=x-c;
        if(x<0.)
            return x+1.;
        return x;
    }
    //A two-seed function for filling the static array U[98] one bit at a time
    void fillU(int seed1, int seed2){
        double s,t;
        int x,y,i,j;
        x=seed1;
        y=seed2;
        for (i=1; i<98; i++){
            s= 0.0;
            t=0.5;
            for (j=1; j<54; j++){
                x=(6969*x) % 65543;
                // typo in the paper:
                //y=(8888*x) % 65579;
                //used for the demo in the last page of the paper.
                y=(8888*y) % 65579;
                if(((x^y)& 32)>0)
                    s=s+t;
                t=.5*t;
            }
            if(x == 0)
                throw new IllegalArgumentException("x");
            if(y == 0)
                throw new IllegalArgumentException("y");
            U[i]=s;
        }
    }
    // Marsaglia's test code is useless because of a typo in fillU():
    // x=(6969*x)%65543;
    // y=(8888*x)% 65579;
    public UNI_double(long seed) {
        Random r = new Random(seed);
        for(;;) {
            try {
                fillU(r.nextInt(), r.nextInt());
                break;
            } catch(Exception e) {
                // loop again
            }
        }
    }
    protected int next(int nbits) {
        throw new RuntimeException("TODO");
    }
}
You could create an array of random doubles when you initialize your program and then just cycle through it. This is much faster, but the random values repeat themselves.
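A minimal sketch of that idea, assuming a power-of-two table size so the index can wrap with a bit mask. The class name `CyclingDoubles` and its constructor parameters are made up for this example; the sequence repeats exactly after `size` calls, so it is only suitable where that is acceptable.

```java
import java.util.SplittableRandom;

// Hypothetical helper: serves precomputed random doubles in a repeating cycle.
final class CyclingDoubles {
    private final double[] table;
    private int index = 0;

    // size must be a power of two so the index can wrap with a bit mask.
    CyclingDoubles(int size, long seed) {
        if (size <= 0 || (size & (size - 1)) != 0) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        SplittableRandom rng = new SplittableRandom(seed);
        table = new double[size];
        for (int i = 0; i < size; i++) {
            table[i] = rng.nextDouble();
        }
    }

    // Returns the next precomputed value; the sequence repeats after table.length calls.
    double next() {
        double value = table[index];
        index = (index + 1) & (table.length - 1);
        return value;
    }
}
```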
I have a program that takes in anywhere from 20,000 to 500,000 velocity vectors and must output these vectors multiplied by some scalar. The program allows the user to set a variable accuracy, which is basically just how many decimal places to truncate to in the calculations. The program is quite slow at the moment, and I discovered that it's not because of multiplying a lot of numbers, it's because of the method I'm using to truncate floating point values.
I've already looked at several solutions on here for truncating decimals, like this one, and they mostly recommend DecimalFormat. This works great for formatting decimals once or twice to print nice user output, but is far too slow for hundreds of thousands of truncations that need to happen in a few seconds.
What is the most efficient way to truncate a floating-point value to n number of places, keeping execution time at utmost priority? I do not care whatsoever about resource usage, convention, or use of external libraries. Just whatever gets the job done the fastest.
EDIT: Sorry, I guess I should have been more clear. Here's a very simplified version of what I'm trying to illustrate:
import java.util.*;
import java.lang.*;
import java.text.DecimalFormat;
import java.math.RoundingMode;

public class MyClass {
    static class Vector{
        float x, y, z;
        public String toString(){
            return "[" + x + ", " + y + ", " + z + "]";
        }
    }

    public static ArrayList<Vector> generateRandomVecs(){
        ArrayList<Vector> vecs = new ArrayList<>();
        Random rand = new Random();
        for(int i = 0; i < 500000; i++){
            Vector v = new Vector();
            v.x = rand.nextFloat() * 10;
            v.y = rand.nextFloat() * 10;
            v.z = rand.nextFloat() * 10;
            vecs.add(v);
        }
        return vecs;
    }

    public static void main(String args[]) {
        int precision = 2;
        float scalarToMultiplyBy = 4.0f;
        ArrayList<Vector> velocities = generateRandomVecs();
        System.out.println("First 10 raw vectors:");
        for(int i = 0; i < 10; i++){
            System.out.print(velocities.get(i) + " ");
        }
        // This is the code that I am concerned about
        DecimalFormat df = new DecimalFormat("##.##");
        long start = System.currentTimeMillis();
        for(Vector v : velocities){
            /* Highly inefficient way of truncating*/
            v.x = Float.parseFloat(df.format(v.x * scalarToMultiplyBy));
            v.y = Float.parseFloat(df.format(v.y * scalarToMultiplyBy));
            v.z = Float.parseFloat(df.format(v.z * scalarToMultiplyBy));
        }
        long finish = System.currentTimeMillis();
        long timeElapsed = finish - start;
        System.out.println("Runtime: " + timeElapsed + " ms");
        System.out.println("First 10 multiplied and truncated vectors:");
        for(int i = 0; i < 10; i++){
            System.out.print(velocities.get(i) + " ");
        }
    }
}
The reason it is very important to do this is because a different part of the program will store trigonometric values in a lookup table. The lookup table will be generated to n places beforehand, so any velocity vector that has a float value to 7 places (i.e. 5.2387471) must be truncated to n places before lookup. Truncation is needed instead of rounding because in the context of this program, it is OK if a vector is slightly less than its true value, but not greater.
Lookup table for 2 decimal places:
8.03 -> -0.17511085919
8.04 -> -0.18494742685
8.05 -> -0.19476549993
8.06 -> -0.20456409661
8.07 -> -0.21434223706
Say I wanted to look up the cosines of each element in the vector {8.040844, 8.05813164, 8.065688} in the table above. Obviously, I can't look up these values directly, but I can look up {8.04, 8.05, 8.06} in the table.
What I need is a very fast method to go from {8.040844, 8.05813164, 8.065688} to {8.04, 8.05, 8.06}
The fastest way, which will introduce rounding error, is going to be to multiply by 10^n, call Math.rint, and to divide by 10^n.
That's...not really all that helpful, though, considering the introduced error, and -- more importantly -- that it doesn't actually buy anything. Why drop decimal points if it doesn't improve efficiency or anything? If it's about making the values shorter for display or the like, truncate then, but until then, your program will run as fast as possible if you just use full float precision.
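For the lookup-table use case in the question, where truncation rather than rounding is wanted, the multiply/divide idea can be sketched with a cast instead of Math.rint. This is an illustration under assumptions: the class and method names are made up, at most 6 decimal places are supported, and the cast truncates toward zero (for negative inputs Math.floor would be needed to guarantee a result that is not greater than the input).

```java
// Hypothetical helper: truncates a value to a fixed number of decimal places.
final class Truncate {
    // Powers of ten precomputed once; the index is the number of decimal places.
    private static final double[] POW10 = {1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6};

    // Truncates toward zero to `places` decimals (0..6). The cast to long drops the fraction.
    static double truncate(double value, int places) {
        double scale = POW10[places];
        return (long) (value * scale) / scale;
    }

    // Same idea, but returns the scaled integer, which can serve directly as a table index.
    static long scaledKey(double value, int places) {
        return (long) (value * POW10[places]);
    }
}
```

Because binary floating point cannot represent most decimal fractions exactly, a value that prints as exactly 8.05 may truncate to 8.04. Using the integer returned by `scaledKey` as the lookup key avoids comparing floating-point keys at all.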
To find the nth Fibonacci number using memoization, I found some code which uses map in C++.
I have tried to convert this code to Java, but it fails.
code in c++:
#include <bits/stdc++.h>
typedef long long int ll;
map<ll, ll> mp;
ll M = 1000000007;
long long fibonacci(long long n) {
if (mp.count(n))return mp[n];
long long k=n/2;
if (n%2==0) {
return mp[n] = fibonacci(k)*(fibonacci(k+1)+fibonacci(k-1)) % M;
} else {
return mp[n] = (fibonacci(k+1)*fibonacci(k+1) + fibonacci(k)*fibonacci(k)) % M;
int main()
ll t;
I have tried same code in java using HashMap.
code in java:
static HashMap<Long,Long> hm=new HashMap<Long,Long>();
static long f(long n) {
if (hm.containsKey(n)) return hm.get(n);
long k=n/2;
if (n%2==0) {
return hm.put(n,f(k)*(f(k+1)+f(k-1)) % M);
} else {
return hm.put(n, (f(k+1)*f(k+1) + f(k)*f(k)) % M);
public static void main(String[] args) throws IOException {
long b=f(2L);
But this Java code gives a StackOverflowError.
I have tried this code using LinkedHashMap and TreeMap in Java; both give the same error.
Which class should I use that works the same as map in C++?
Could someone please explain how map works in C++?
look at the output of code in java and c++
c++: c++ code
java: java code
To memoize all the possible Fibonacci numbers which fit into a long you can use a simple array.
static final int[] FIB = new int[100_000_000];
static final int M = 1000000007;
static {
long start = System.currentTimeMillis();
FIB[1] = FIB[2] = 1;
for (int i = 3; i < FIB.length; i++) {
int l = FIB[i - 1] + FIB[i - 2];
while (l >= M)
l -= M;
FIB[i] = l;
long time = System.currentTimeMillis() - start;
System.out.printf("Took %.3f seconds to build table of %,d fibonacci values%n", time/1e3, FIB.length);
public static long fibonacci(int n) {
return FIB[n];
public static void main(String[] args) {
Took 0.648 seconds to build table of 100,000,000 fibonacci values
This would use 400 MB of memory for the array which is more efficient than any map implementation.
A StackOverflowError happens when you have too many methods calls stacked, it's thrown by the virtual machine. It's not a HashMap problem at all.
From the docs:
Thrown when a stack overflow occurs because an application recurses
too deeply.
You could either increase the stack size of your JVM by using the -Xss flag, or you could try to use a better algorithm, or review this one to check if it's really equivalent to the c++ version... But either way, I think you're just overcomplicating this, there are simpler ways of getting the same result.
You can also check this question on how a recursive fibonacci method looks like.
EDIT: Check this link, it shows how you can get the nth number using memoization and Java.
Also, check this question, there are lots of answers with different methods on how to get a large nth fibonacci number.
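For reference, here is a sketch of how the Java translation from the question could be made to terminate. Two things are assumed to be the problem: the memo has no base cases, so the recursion never bottoms out, and `HashMap.put` returns the previous value (null for a new key) rather than the value just stored, unlike `mp[n] = ...` in C++. The base values F(0)=0, F(1)=1, F(2)=1 are an assumption, since the question does not show how the C++ map was seeded, and the class name `FastFib` is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical rewrite of the question's f(n), with base cases and a separate put/return.
final class FastFib {
    private static final long M = 1_000_000_007L;
    private static final Map<Long, Long> MEMO = new HashMap<>();

    static {
        // Base cases (assumed: F(0)=0, F(1)=1, F(2)=1). Without them the recursion never ends.
        MEMO.put(0L, 0L);
        MEMO.put(1L, 1L);
        MEMO.put(2L, 1L);
    }

    static long fib(long n) {
        Long cached = MEMO.get(n);
        if (cached != null) {
            return cached;
        }
        long k = n / 2;
        long result;
        if (n % 2 == 0) {
            // F(2k) = F(k) * (F(k+1) + F(k-1))
            result = fib(k) * (fib(k + 1) + fib(k - 1)) % M;
        } else {
            // F(2k+1) = F(k+1)^2 + F(k)^2
            result = (fib(k + 1) * fib(k + 1) + fib(k) * fib(k)) % M;
        }
        // Map.put returns the previous value (null here), so store first and return separately.
        MEMO.put(n, result);
        return result;
    }
}
```

The recursion depth grows only with log n, so no larger stack size is needed.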
Another way of doing this
Use a Map<Integer, Integer> as a cache.
private static Map<Integer, Integer> cache = new HashMap<Integer, Integer>();

/**
 * Java Program to calculate Fibonacci numbers with memoization.
 * This is quite fast as compared to previous Fibonacci function,
 * especially for calculating Fibonacci numbers for large inputs.
 */
public static int improvedFibo(int number){
    Integer fibonacci = cache.get(number);
    if(fibonacci != null){
        return fibonacci; //fibonacci number from cache
    }
    //fibonacci number not in cache, calculating it
    fibonacci = fibonacci2(number);
    //putting fibonacci number in cache for future request
    cache.put(number, fibonacci);
    return fibonacci;
}
Taken from here.
You can also check this question for another example.
I want to create a unique number of type Long in Java. I have seen a few examples, but they use a timestamp. Can I create a unique Long wrapper object without using a timestamp? Please suggest. Thanks.
Generate each digit by calling random.nextInt. For uniqueness, you can keep track of the random numbers you have used so far by keeping them in a set and checking if the set contains the number you generate each time.
public static long generateRandom(int length) {
    Random random = new Random();
    char[] digits = new char[length];
    digits[0] = (char) (random.nextInt(9) + '1');
    for (int i = 1; i < length; i++) {
        digits[i] = (char) (random.nextInt(10) + '0');
    }
    return Long.parseLong(new String(digits));
}
Without using timestamp, you have these options:
Keep a record of all previously generated numbers -- of course you have to store them somewhere, which is unwieldy
Store the previous number, and increment each time.
Simply assume that the PRNG will never come up with the same number twice. Since there are 2^64 == 1.8 * 10^19 possible values, this is a very safe bet.
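The second option (store the previous number and increment) could look like the sketch below. It assumes a single JVM and does not persist the counter across restarts; the class name `SequenceIds` is made up for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out increasing Long values within one JVM run.
final class SequenceIds {
    // In a real system the last issued value would be persisted and restored on startup (not shown).
    private static final AtomicLong LAST = new AtomicLong(0L);

    // Thread-safe: each call returns a value one greater than the previous one.
    static Long next() {
        return LAST.incrementAndGet();
    }
}
```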
Many of the answers suggest using Math.random() to generate the unique id. Now Math.random() is actually not random at all, and does not in itself add anything unique. The seeming uniqueness comes from the default seeding in Math.random(), which is based on System.currentTimeMillis(), with the following code:
/**
 * Construct a random generator with the current time of day in milliseconds
 * as the initial state.
 * @see #setSeed
 */
public Random() {
    setSeed(System.currentTimeMillis() + hashCode());
}
So why not just remove Math.random() from the equation and use System.currentTimeMillis() directly in the counter?
Time based unique numbers:
The following code implements a unique number generator based solely on time. The benefit of this is that you don't need to store any counters etc. The numbers generated will be unique under the following condition: the code only runs in one JVM at any time period. This is important, as the timestamp is part of the key.
public class UniqueNumber {
    private static UniqueNumber instance = null;
    private long currentCounter;

    private UniqueNumber() {
        currentCounter = (System.currentTimeMillis() + 1) << 20;
    }

    private static synchronized UniqueNumber getInstance() {
        if (instance == null) {
            instance = new UniqueNumber();
        }
        return instance;
    }

    private synchronized long nextNumber() {
        currentCounter++;
        while (currentCounter > (System.currentTimeMillis() << 20)) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
            }
        }
        return currentCounter;
    }

    static long getUniqueNumber() {
        return getInstance().nextNumber();
    }
}
The code allows for up to 2^20 numbers to be generated per millisecond (provided you have access to that fast hardware). If this rate is exceeded the code will sleep until next tick of System.currentTimeMillis()
Testing the code:
public static void main(String[] args) {
    for (int i = 0; i < 10; i++)
        System.out.println(UniqueNumber.getUniqueNumber());
}
Take a look at Commons Id; it has a LongGenerator that generates an incrementing number as a Long object.
This will create simply a random long number -
You can generate random numbers using java.util.Random and add them to a java.util.Set; this will ensure that no duplicate is allowed.
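A minimal sketch of the Random-plus-Set approach follows. It is an illustration only: the class name `UniqueRandomLongs` is made up, and the set grows without bound, so memory use rises with every number issued.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: returns random longs that were never returned before by this instance.
final class UniqueRandomLongs {
    private final Random random = new Random();
    private final Set<Long> seen = new HashSet<>();

    // Draws random longs until one is found that has not been returned before.
    synchronized long next() {
        long candidate;
        do {
            candidate = random.nextLong();
        } while (!seen.add(candidate)); // add() returns false for a duplicate
        return candidate;
    }
}
```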
Try with UUID as:
Long uniqueLong = UUID.randomUUID().getMostSignificantBits();
Here, you find a very good explanation as to why this could be unique in terms of randomness.
I am trying to build an OCR by calculating the correlation coefficient between characters extracted from an image and every character I have pre-stored in a database. My implementation is in Java, and the pre-stored characters are loaded into an ArrayList when the application starts, i.e.
ArrayList<byte []> storedCharacters, extractedCharacters;
storedCharacters = load_all_characters_from_database();
extractedCharacters = extract_characters_from_image();
// Calculate the coefficent between every extracted character
// and every character in database.
double maxCorr = -1;
for(byte [] extractedCharacter : extractedCharacters)
for(byte [] storedCharacter : storedCharactes)
corr = findCorrelation(extractedCharacter, storedCharacter)
if (corr > maxCorr)
maxCorr = corr;
public double findCorrelation(byte [] extractedCharacter, byte [] storedCharacter)
double mag1, mag2, corr = 0;
for(int i=0; i < extractedCharacter.length; i++)
mag1 += extractedCharacter[i] * extractedCharacter[i];
mag2 += storedCharacter[i] * storedCharacter[i];
corr += extractedCharacter[i] * storedCharacter[i];
} // for
corr /= Math.sqrt(mag1*mag2);
return corr;
The number of extractedCharacters is around 100-150 per image, but the database has 15600 stored binary characters. Checking the correlation coefficient between every extracted character and every stored character hurts performance: it needs around 15-20 seconds per image on an Intel i5 CPU.
Is there a way to improve the speed of this program, or can you suggest another way of building this that gives similar results? (The results produced by comparing every character with such a large dataset are quite good.)
Thank you in advance
public static void run() {
    ArrayList<byte []> storedCharacters, extractedCharacters;
    storedCharacters = load_all_characters_from_database();
    extractedCharacters = extract_characters_from_image();
    // Calculate the coefficient between every extracted character
    // and every character in database.
    computeNorms(storedCharacters, extractedCharacters);
    double maxCorr = -1;
    for (int e = 0; e < extractedCharacters.size(); e++) {
        for (int s = 0; s < storedCharacters.size(); s++) {
            double corr = findCorrelation(storedCharacters.get(s), extractedCharacters.get(e), s, e);
            if (corr > maxCorr)
                maxCorr = corr;
        }
    }
}

private static double[] storedNorms;
private static double[] extractedNorms;

// Correlation between two binary images
public static double findCorrelation(byte[] arr1, byte[] arr2, int strCharIndex, int extCharNo){
    final int dotProduct = dotProduct(arr1, arr2);
    final double corr = dotProduct * storedNorms[strCharIndex] * extractedNorms[extCharNo];
    return corr;
}

public static void computeNorms(ArrayList<byte[]> storedCharacters, ArrayList<byte[]> extractedCharacters) {
    storedNorms = computeInvNorms(storedCharacters);
    extractedNorms = computeInvNorms(extractedCharacters);
}

private static double[] computeInvNorms(List<byte []> a) {
    final double[] result = new double[a.size()];
    for (int i=0; i < result.length; ++i)
        result[i] = 1 / Math.sqrt(dotProduct(a.get(i), a.get(i)));
    return result;
}

private static int dotProduct(byte[] arr1, byte[] arr2) {
    int dotProduct = 0;
    for(int i = 0; i< arr1.length; i++)
        dotProduct += arr1[i] * arr2[i];
    return dotProduct;
}
Nowadays, it's hard to find a CPU with a single core (even in mobiles). As the tasks are nicely separated, you can do it with a few lines only. So I'd go for it, though the gain is limited.
In case you really mean cross-correlation, then a transform like DFT or DCT could help. They surely do for big images, but with your 12x16 ones, I'm not sure.
Maybe you mean just a dot product? And maybe you should tell us?
Note that you actually don't need to compute the correlation; most of the time you only need to find out if it's bigger than a threshold:
corr = findCorrelation(extractedCharacter, storedCharacter)
..... more code to check if this is the best match ......
This may or may not lead to some optimizations, depending on what the images look like.
Note also that a simple low level optimization can give you nearly a factor of 4 as in this question of mine. Maybe you really should tell us what you're doing?
I guess that due to the computation of three products in the loop, there's enough instruction level parallelism, so a manual loop unrolling like in my above question is not necessary.
However, I see that those three products get computed some 100 * 15600 times, while only one of them depends on both extractedCharacter and storedCharacter. So you can compute
100 + 15600 + 100 * 15600
dot products instead of
3 * 100 * 15600
This way you may get a factor of three pretty easily.
Or not. After this step there's a single sum computed in the relevant step and the problem linked above applies. And so does its solution (unrolling manually).
Factor 5.2
While byte[] is nicely compact, the computation involves extending them to ints, which costs some time as my benchmark shows. Converting the byte[]s to int[]s before all the correlations gets computed saves time. Even better is to make use of the fact that this conversion for storedCharacters can be done beforehand.
Manual loop unrolling twice helps but unrolling more doesn't.
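The last two points (widening to int[] up front and unrolling the loop twice) could be combined roughly as below. This is a sketch, not the benchmarked code: the class and method names are made up, and the actual speedup depends on the JVM and hardware.

```java
// Hypothetical helper: int[]-based dot product with a loop unrolled by two.
final class DotProducts {
    // One-time widening of a byte[] image to int[] (can be done when the database is loaded).
    static int[] toInts(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Dot product unrolled twice, with two independent accumulators.
    static int dot(int[] a, int[] b) {
        int sum0 = 0, sum1 = 0;
        int i = 0;
        int limit = a.length - 1;
        for (; i < limit; i += 2) {
            sum0 += a[i] * b[i];
            sum1 += a[i + 1] * b[i + 1];
        }
        if (i < a.length) { // leftover element for odd lengths
            sum0 += a[i] * b[i];
        }
        return sum0 + sum1;
    }
}
```

The two accumulators let the additions proceed independently instead of forming a single dependency chain.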
I have a set of integer ranges, which represent lower and upper bounds of classes. For example:
0..500 xsmall
500..1000 small
1000..1500 medium
1500..2500 large
In my case there can be over 500 classes. These classes do not overlap, but they can differ in size.
I can implement finding the matching range as a simple linear search through a list, for example
class Range {
    int lower;
    int upper;
    String category;
    boolean contains(int val) {
        return lower <= val && val < upper;
    }
}

public String getMatchingCategory(int val) {
    for (Range r : listOfRanges) {
        if (r.contains(val)) {
            return r.category;
        }
    }
    return null;
}
However, this seems slow; as I need on average N/2 look-ups. If the classes were equally sized, I could use division. Is there a standard technique to find the correct range faster?
What you are looking for is a SortedMap and its methods tailMap and firstKey. Check out the documentation for full details.
The advantage of this approach over plain arrays is in the ease of maintaining your ranges: you can insert/remove new boundaries at any point with almost no runtime cost; with arrays it means copying both parallel arrays in full.
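A sorted-map lookup for the ranges in the question might look like this. Note that the sketch uses `TreeMap.floorEntry` keyed by each range's lower bound, which is a variation on the tailMap/firstKey approach named above, not the same calls. The class name `RangeMap` is made up, and the ranges are assumed to be contiguous, with a null entry marking the end of the last range.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps a value to its category via the greatest lower bound <= value.
final class RangeMap {
    private static final TreeMap<Integer, String> LOWER_BOUNDS = new TreeMap<>();

    static {
        LOWER_BOUNDS.put(0, "xsmall");
        LOWER_BOUNDS.put(500, "small");
        LOWER_BOUNDS.put(1000, "medium");
        LOWER_BOUNDS.put(1500, "large");
        LOWER_BOUNDS.put(2500, null); // sentinel: values from 2500 upward belong to no class
    }

    static String category(int val) {
        // floorEntry finds the entry with the greatest key less than or equal to val.
        Map.Entry<Integer, String> entry = LOWER_BOUNDS.floorEntry(val);
        return entry == null ? null : entry.getValue();
    }
}
```

Each lookup costs O(log N) comparisons instead of the N/2 of the linear search.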
I've written code for both variants and benchmarked it:
public class BinarySearch
{
  static final int ARRAY_SIZE = 128, INCREMENT = 1000;
  static final int[] arrayK = new int[ARRAY_SIZE];
  static final String[] arrayV = new String[ARRAY_SIZE];
  static final SortedMap<Integer,String> map = new TreeMap<>();
  static {
    for (int i = 0, j = 0; i < arrayK.length; i++) {
      arrayK[i] = j; arrayV[i] = String.valueOf(j);
      map.put(j, String.valueOf(j));
      j += INCREMENT;
    }
  }
  final Random rnd = new Random();
  int rndInt;
  @Setup(Level.Invocation) public void nextInt() {
    rndInt = rnd.nextInt((ARRAY_SIZE-1)*INCREMENT);
  }
  public String array() {
    final int i = Arrays.binarySearch(arrayK, rndInt);
    return arrayV[i >= 0? i : -(i+1)];
  }
  public String sortedMap() {
    return map.tailMap(rndInt).values().iterator().next();
  }
}
Benchmark results:
Benchmark Mode Thr Cnt Sec Mean Mean error Units
array thrpt 1 5 5 10.948 0.033 ops/usec
sortedMap thrpt 1 5 5 5.752 0.070 ops/usec
Interpretation: array search is only twice as fast and this factor is quite stable across array sizes. In the presented code the array size is 1024 and the factor is 1.9. I've also tested with array size 128, where the factor is 2.05.
Here, Arrays.binarySearch is your friend. Simply put all the boundaries in and handle the possible cases. Assuming your ranges leave no holes between them, you only need to put the upper bounds in.
For your example
0..500 xsmall
500..1000 small
1000..1500 medium
1500..2500 large
you'd use
int[] boundaries = {500, 1000, 1500, 2500};
and look up the input. Handle the two cases (found/not found) and you're done. Forget about ranges, they're nice but they don't fit your problem.
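Handling the two cases could be sketched as follows. The boundaries array is the one given above; the `CATEGORIES` array, the `LOWEST` constant, and the class name are additions for illustration, and upper bounds are treated as exclusive, as in the question's `contains` method.

```java
import java.util.Arrays;

// Hypothetical helper: range lookup via binary search over exclusive upper bounds.
final class BoundaryLookup {
    // The first range starts at LOWEST; each boundary is the exclusive upper bound of a range.
    private static final int LOWEST = 0;
    private static final int[] BOUNDARIES = {500, 1000, 1500, 2500};
    private static final String[] CATEGORIES = {"xsmall", "small", "medium", "large"};

    static String category(int val) {
        if (val < LOWEST) {
            return null;
        }
        int pos = Arrays.binarySearch(BOUNDARIES, val);
        // Found: val equals an exclusive upper bound, so it belongs to the next range.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        int index = pos >= 0 ? pos + 1 : -(pos + 1);
        return index < CATEGORIES.length ? CATEGORIES[index] : null;
    }
}
```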
I also wrote a benchmark and no matter how I try I'd lose my bet as the ratio is about 3 rather than 5. The strange things like S001024 in my results stand for the size 1024.