Generate normally distributed timestamps within a range [0,x] - Java

I want to generate a file containing timestamps (integers between 0 and a bound value x, in increasing order) which represent arrivals of an event.
The event arrival rate should be "normally distributed", meaning that arrivals should be more frequent around the "middle" of the dataset than at the beginning and the end.
How can I generate such a list of values using Java?
Regards

I agree with greedybuddha that a Gaussian function is what you want here, but you also stated that you want your events to be ordered. Random.nextGaussian() won't give you that; it will give you random numbers that are normally distributed. Instead, use the Gaussian function to calculate the frequency of events at each point in time:
for (int t = 0; t < max; t++)
{
    // Gaussian bell curve; the HEIGHT factor scales the peak so that up to
    // HEIGHT events are written at t == CENTER (without it, f never exceeds
    // 1.0 and at most one event would be written per time step)
    double f = HEIGHT * Math.exp(-Math.pow(t - CENTER, 2.0) / (2.0 * Math.pow(WIDTH, 2.0)));
    for (int j = 0; j < f; j++)
    {
        writeEvent(t);
    }
}
CENTER is where you want the "peak" of the curve to be (probably max/2), WIDTH is a parameter that controls the spread of the distribution, and HEIGHT (added above) scales the curve so that more than one event can be written per time step.

Java has a Random class, and one of its methods, nextGaussian, will give you a normally distributed value with mean 0.0 and standard deviation 1.0 (Gaussian distribution and normal distribution are synonyms).
From there you just need to shift and scale that value so it falls within your range.
Random random = new Random();

public int nextNormalTime(int upperTimeBound) {
    // center the mean on the middle of the range, spread about +/- 3 sigma
    // across it, and clamp so the result stays within [0, upperTimeBound]
    double t = upperTimeBound / 2.0 + random.nextGaussian() * (upperTimeBound / 6.0);
    return (int) Math.max(0, Math.min(upperTimeBound, t));
}
If you want to create an ordered list of these, you can just add the times into a list and sort, or into something like a PriorityQueue.
List<Integer> list = new ArrayList<Integer>(nTimes);
for (int i = 0; i < nTimes; i++) {
    list.add(nextNormalTime(upperTimeBound));
}
Collections.sort(list);
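Putting the pieces together, here is a minimal sketch (the bound, the number of events, and the file name are illustrative choices) that clamps each value into [0, x], sorts, and writes the result to a file:

import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TimestampGenerator {
    public static void main(String[] args) throws Exception {
        int upperTimeBound = 1000; // the bound x
        int nTimes = 200;          // how many arrivals to generate
        Random random = new Random();

        List<Integer> list = new ArrayList<Integer>(nTimes);
        for (int i = 0; i < nTimes; i++) {
            // mean in the middle of the range, ~99.7% of values within it
            double t = upperTimeBound / 2.0
                     + random.nextGaussian() * (upperTimeBound / 6.0);
            list.add((int) Math.max(0, Math.min(upperTimeBound, t)));
        }
        Collections.sort(list); // increasing order, as the question asks

        try (PrintWriter out = new PrintWriter("timestamps.txt")) {
            for (int t : list) {
                out.println(t);
            }
        }
    }
}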

Related

How can I put a certain number of integers in random locations in a 2D array?

I'm trying to put a certain number of integers into an array at random spots without putting two in the same place.
My combine method concatenates two given integers and returns the result as an int.
places is an ArrayList that keeps the locations of the integers already put into the array.
The random method returns a random integer between the two given ints.
The combine method works and so does the random method, but I'm not sure why the code below isn't working.
public void fillOne(int b)
{
    for (int x = 0; x < b; x++)
    {
        int kachow = random(0, 5);
        int kachigga = random(0, 5);
        int skrrt = combine(kachow, kachigga);
        if (notInArray(skrrt))
        {
            locations[kachow][kachigga] = 1;
            places.add(skrrt);
        }
    }
}
You haven't really explained what isn't working. But an obvious flaw in your algorithm is that it isn't guaranteed to set b elements to 1. If your algorithm generates a duplicate position then it will set fewer than b elements.
Your logic for storing the combined positions is overly complex. One solution would be to reverse the logic: generate a single integer representing both dimensions then divide it into two when you are setting the location. That makes the checks a lot simpler.
For example, if your array is 5x5:
Set<Integer> positions = new HashSet<>();
while (positions.size() < n)
    positions.add(random.nextInt(25));
for (int p : positions)
    locations[p / 5][p % 5] = 1;
Because positions is a set it automatically excludes duplicates, which means the code will keep adding random positions until there are n distinct positions in the set.
Even simpler, if you are using Java 8:
random.ints(0, 25)
      .distinct().limit(n)
      .forEach(p -> locations[p / 5][p % 5] = 1);

How to efficiently generate a set of unique random numbers with a predefined distribution?

I have a map of items with some probability distribution:
Map<SingleObjectiveItem, Double> itemsDistribution;
Given a certain m I have to generate a Set of m elements sampled from the above distribution.
Up to now I have been using the naive way of doing it:
while (mySet.size() < m)
    mySet.add(getNextSample(itemsDistribution));
The getNextSample(...) method fetches an object from the distribution as per its probability. Now, as m increases the performance severely suffers. For m = 500 and itemsDistribution.size() = 1000 elements, there is too much thrashing and the function remains in the while loop for too long. Generate 1000 such sets and you have an application that crawls.
Is there a more efficient way to generate a unique set of random numbers with a "predefined" distribution? Most collection shuffling techniques and the like are uniformly random. What would be a good way to address this?
UPDATE: The loop will call getNextSample(...) "at least" 1 + 2 + 3 + ... + m = m(m+1)/2 times. That is, in the first run we'll definitely get a sample for the set; in the 2nd iteration it may be called at least twice, and so on. If getNextSample is sequential in nature, i.e., goes through the entire cumulative distribution to find the sample, then the run-time complexity of the loop is at least n*m(m+1)/2, where n is the number of elements in the distribution. If m = cn, 0 < c <= 1, then the loop is at least Ω(n^3). And that is only the lower bound!
If we replace sequential search by binary search, the complexity would be at least Ω(n^2 log n). Efficient, but perhaps not by a large margin.
Also, removing from the distribution is not possible since I call the above loop k times, to generate k such sets. These sets are part of a randomized 'schedule' of items. Hence a 'set' of items.
Start out by generating a number of random points in two dimensions.
Then apply your distribution: keep every point that falls under the curve of the probability function and reject the rest.
The x coordinates of the kept points are your random numbers with the requested distribution.
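In Java terms, a minimal sketch of this rejection-sampling idea could look like the following (the density function p and its upper bound pMax are assumptions of the sketch):

import java.util.Random;
import java.util.function.DoubleUnaryOperator;

class RejectionSampler {
    // Draws one sample from the density p on [x0, x1], assuming p(x) <= pMax.
    static double sample(DoubleUnaryOperator p, double x0, double x1,
                         double pMax, Random rnd) {
        while (true) {
            double x = x0 + rnd.nextDouble() * (x1 - x0); // random point, x axis
            double y = rnd.nextDouble() * pMax;           // random point, y axis
            if (y < p.applyAsDouble(x)) {
                return x; // the point falls under the curve: keep its x
            }
            // otherwise reject the point and try again
        }
    }
}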
The problem is unlikely to be the loop you show:
Let n be the size of the distribution, and I be the number of invocations of getNextSample. We have I = sum_i(C_i), where C_i is the number of invocations of getNextSample while the set has size i. To find E[C_i], observe that C_i is the inter-arrival time of a Poisson process with λ = 1 - i/n, and is therefore exponentially distributed with parameter λ. Therefore, E[C_i] = 1/λ = 1/(1 - i/n) <= 1/(1 - m/n), and hence E[I] < m/(1 - m/n).
That is, sampling a set of size m = n/2 will take, on average, less than 2m = n invocations of getNextSample. If that is "slow" and "crawls", it is likely because getNextSample is slow. This is actually unsurprising, given the unsuitable way the distribution is passed to the method (the method will, of necessity, have to iterate over the entire distribution to find a random element).
The following should be faster (if m < 0.8 n)
class Distribution<T> {
    private double[] cumulativeWeight;
    private T[] item;
    private double totalWeight;

    Distribution(Map<T, Double> probabilityMap) {
        int i = 0;
        cumulativeWeight = new double[probabilityMap.size()];
        item = (T[]) new Object[probabilityMap.size()];
        for (Map.Entry<T, Double> entry : probabilityMap.entrySet()) {
            item[i] = entry.getKey();
            totalWeight += entry.getValue();
            cumulativeWeight[i] = totalWeight;
            i++;
        }
    }

    T randomItem() {
        double weight = Math.random() * totalWeight;
        int index = Arrays.binarySearch(cumulativeWeight, weight);
        if (index < 0) {
            index = -index - 1; // binarySearch returns -(insertion point) - 1
        }
        return item[index];
    }

    Set<T> randomSubset(int size) {
        Set<T> set = new HashSet<>();
        while (set.size() < size) {
            set.add(randomItem());
        }
        return set;
    }
}
public class Test {
    public static void main(String[] args) {
        int max = 1_000_000;
        HashMap<Integer, Double> probabilities = new HashMap<>();
        for (int i = 0; i < max; i++) {
            probabilities.put(i, (double) i);
        }
        Distribution<Integer> d = new Distribution<>(probabilities);
        Set<Integer> set = d.randomSubset(max / 2);
        //System.out.println(set);
    }
}
The expected runtime is O(m / (1 - m / n) * log n). On my computer, a subset of size 500_000 of a set of 1_000_000 is computed in about 3 seconds.
As we can see, the expected runtime approaches infinity as m approaches n. If that is a problem (i.e. m > 0.9 n), the following more complex approach should work better:
Set<T> randomSubset(int size) {
    Set<T> set = new HashSet<>();
    while (set.size() < size) {
        T randomItem = randomItem();
        remove(randomItem); // removes the item from the distribution
        set.add(randomItem);
    }
    return set;
}
To efficiently implement remove requires a different representation for the distribution, for instance a binary tree where each node stores the total weight of the subtree whose root it is.
But that is rather complicated, so I wouldn't go that route if m is known to be significantly smaller than n.
If you are not too concerned with randomness properties, then I would do it like this:
create a buffer for pseudo-random numbers
double buff[MAX]; // [edit1] double pseudo random numbers
MAX should be big enough ... 1024*128, for example
the type can be anything (float, int, DWORD, ...)
fill the buffer with numbers
You have a range of numbers x = <x0, x1> and a probability function probability(x) defined by your probability distribution, so do this:
for (i = 0, x = x0; x <= x1; x += stepx)
    for (j = 0, n = probability(x) * MAX, q = 0.1 * stepx / n; j < n; j++, i++) // [edit1] unique pseudo-random numbers
        buff[i] = x + (double(i) * q); // [edit1] ...
stepx is your accuracy for items (for integral types, 1). Now the buff[] array has the distribution you need, but it is not pseudo-random. You should also check that i does not reach MAX, to avoid array overruns; note that at the end the real size of buff[] is i (it can be less than MAX due to rounding).
shuffle buff[]
Do just a few loops of swapping buff[i] and buff[j], where i is the loop variable and j is a pseudo-random index in <0, MAX).
write your pseudo-random function
It just returns numbers from the buffer: the first call returns buff[0], the second buff[1], and so on. With standard generators, when you hit the end of buff[] you would shuffle buff[] again and start over from buff[0]. But since you need unique numbers, you must not reach the end of the buffer, so set MAX big enough for your task or uniqueness will not be assured. A sketch of this step follows below.
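In Java terms, a minimal sketch of that last step might look like this (buff is assumed to be filled and shuffled as described above):

class BufferedRandom {
    private final double[] buff; // pre-filled and shuffled as described above
    private int index = 0;

    BufferedRandom(double[] buff) {
        this.buff = buff;
    }

    // Returns the next unique pseudo-random number; uniqueness only holds
    // while fewer than buff.length numbers have been drawn.
    double next() {
        return buff[index++];
    }
}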
[Notes]
MAX should be big enough to store the whole distribution you want. If it is not big enough, then items with low probability can be missing completely.
[edit1] - tweaked the answer a little to match the question's needs (pointed out by meriton, thanks)
PS. The complexity of initialization is O(N) and of getting a number is O(1).
You could implement your own random number generator (using a Monte Carlo method or any good uniform generator like the Mersenne Twister) and base it on the inversion method (here).
For example, the exponential law: generate a uniform random number u in [0,1]; your random variable of the exponential law would then be ln(1-u)/(-lambda), lambda being the exponential law parameter and ln the natural logarithm.
Hope it'll help ;).
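As a minimal, self-contained Java sketch of that exponential-law example (the value of lambda here is just illustrative):

import java.util.Random;

public class ExponentialSampler {
    public static void main(String[] args) {
        Random rnd = new Random();
        double lambda = 2.0;                    // rate parameter of the exponential law
        double u = rnd.nextDouble();            // uniform random number in [0, 1)
        double x = Math.log(1 - u) / (-lambda); // exponentially distributed variate
        System.out.println(x);
    }
}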
I think you have two problems:
Your itemDistribution doesn't know you need a set, so when the set you are building gets large you will pick a lot of elements that are already in the set. If you start with the set full and remove elements, you will run into the same problem for very small sets. Is there a reason why you don't remove the element from the itemDistribution after you pick it? Then you wouldn't pick the same element twice.
The choice of data structure for itemDistribution looks suspicious to me. You want the getNextSample operation to be fast, but doesn't the map from values to probabilities force you to iterate through large parts of the map for each getNextSample? I'm no good at statistics, but couldn't you represent the itemDistribution the other way around, as a map from the sum of all smaller probabilities (plus the element's own probability) to the element of the set?
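A hedged sketch of that "sum of all smaller probabilities to element" representation, using a TreeMap keyed by the running sum (SingleObjectiveItem is the type from the question; the class and method names here are illustrative):

import java.util.Map;
import java.util.TreeMap;

class CumulativeSampler {
    private final TreeMap<Double, SingleObjectiveItem> cumulative = new TreeMap<>();
    private double total = 0;

    CumulativeSampler(Map<SingleObjectiveItem, Double> itemsDistribution) {
        for (Map.Entry<SingleObjectiveItem, Double> e : itemsDistribution.entrySet()) {
            total += e.getValue();
            cumulative.put(total, e.getKey()); // key = running sum of weights
        }
    }

    // O(log n) per sample instead of a linear scan over the whole map.
    SingleObjectiveItem getNextSample() {
        return cumulative.higherEntry(Math.random() * total).getValue();
    }
}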
Your performance depends on how your getNextSample function works. If you have to iterate over all probabilities when you pick the next item, it might be slow.
A good way to pick several unique random items from a list is to first shuffle the list and then pop items off it. You can shuffle the list once with the given distribution; from then on, picking your m items is just popping the list.
Here's an implementation of a probabilistic shuffle:
List<Item> prob_shuffle(Map<Item, int> dist)
{
    int n = dist.length;
    List<Item> a = dist.keys();
    int psum = 0;
    int i, j;
    for (i in dist) psum += dist[i];
    for (i = 0; i < n; i++) {
        int ip = rand(psum); // 0 <= ip < psum
        int jp = 0;
        for (j = i; j < n; j++) {
            jp += dist[a[j]];
            if (ip < jp) break;
        }
        psum -= dist[a[j]];
        Item tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
    return a;
}
This is not Java but pseudocode after an implementation in C, so please take it with a grain of salt. The idea is to append items to the shuffled area by continuously picking items from the unshuffled area.
Here I used integer probabilities. (The probabilities don't have to add up to any particular value; it's just "bigger is better".) You can use floating-point numbers, but because of inaccuracies you might end up going beyond the array when picking an item; you should use item n - 1 then. If you add that safety net, you could even have items with zero probability that always get picked last.
There might be a method to speed up the picking loop, but I don't really see how. The swapping renders any precalculations useless.
Accumulate your probabilities in a table:

Item     Actual probability   Accumulated
Item1    0.10                 0.10
Item2    0.30                 0.40
Item3    0.15                 0.55
Item4    0.20                 0.75
Item5    0.25                 1.00
Make a random number between 0.0 and 1.0 and do a binary search for the first item with a sum that is greater than your generated number. This item would have been chosen with the desired probability.
Ebbe's method is called rejection sampling.
I sometimes use a simple method based on an inverse cumulative distribution function, which is a function that maps a number between 0 and 1 to a value of the random variable.
Then you just generate a uniformly distributed random number between 0 and 1, and apply the function to it.
That function is also called the "quantile function".
For example, suppose you want to generate a normally distributed random number.
Its cumulative distribution function is called Phi.
The inverse of that is called probit.
There are many ways to generate normal variates, and this is just one example.
You can easily construct an approximate cumulative distribution function for any univariate distribution you like, in the form of a table.
Then you can just invert it by table-lookup and interpolation.
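A minimal Java sketch of that table-lookup-and-interpolation idea (x[] and cdf[] are assumed to hold the tabulated values of the variable and their cumulative probabilities, sorted and ending at 1.0):

import java.util.Arrays;
import java.util.Random;

class TableInverseCdf {
    // Draws one sample given tabulated points x[i] with cumulative
    // probabilities cdf[i] (non-decreasing, cdf[cdf.length - 1] == 1.0).
    static double sample(double[] x, double[] cdf, Random rnd) {
        double u = rnd.nextDouble();             // uniform in [0, 1)
        int i = Arrays.binarySearch(cdf, u);
        if (i < 0) i = -i - 1;                   // first index with cdf[i] >= u
        if (i == 0) return x[0];
        double t = (u - cdf[i - 1]) / (cdf[i] - cdf[i - 1]);
        return x[i - 1] + t * (x[i] - x[i - 1]); // interpolate between entries
    }
}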

Generating Random integers within a range to meet a percentile in java

I am trying to generate random integers within a range in order to sample a percentile of that range. For example: for the range 1 to 100 I would like to select a random sample of 20%. This would result in 20 integers randomly selected out of the 100.
This is to solve an extremely complex issue and I will post solutions once I get this and a few bugs worked out. I have not used many math packages in java so I appreciate your assistance.
Thanks!
Put all the numbers in an ArrayList, then shuffle it. Take only the first 20 elements of the ArrayList:
ArrayList<Integer> randomNumbers = new ArrayList<Integer>();
for (int i = 0; i < 100; i++) {
    randomNumbers.add(i + 1); // the numbers 1..100, each exactly once
}
Collections.shuffle(randomNumbers);
// Then the first 20 elements are your sample
If you want 20 random integers between 1 and one hundred, use Math.random() to generate a value between 0 and 0.999... Then, manipulate this value to fit your range.
int[] random = new int[20];
for (int i = 0; i < random.length; i++)
{
    random[i] = (int)(Math.random() * 100 + 1);
}
When you multiply Math.random() by 100, you get a value between 0 and 99.999... To this number you add 1, yielding a value between 1.0 and 100.999... Then the value is truncated to an integer by the (int) cast, which gives a number between 1 and 100 inclusive. Then, store the values in an array. (Note that, unlike the shuffle approach above, this can produce duplicates.)
If you are willing to go with Java 8, you could use some of its lambda features. Presuming that you aren't keeping 20% of petabytes of data, you could do something like this (number is the count of integers to pick from the range). It isn't efficient in the slightest, but it works, and it's fun if you'd like to try some Java 8. If this is performance critical, though, I wouldn't recommend it:
public ArrayList<Integer> sampler(int min, int max, int number) {
    Random random = new Random();
    ArrayList<Integer> generated = new ArrayList<Integer>();
    IntStream ints = random.ints(min, max); // note: max is exclusive
    Iterator<Integer> it = ints.iterator();
    for (int i = 0; i < number; i++) {
        int k = it.next();
        while (generated.contains(k)) {
            k = it.next();
        }
        generated.add(k);
    }
    ints.close();
    return generated;
}
If you really need to scale to petabytes of data, you're going to need a solution that doesn't require keeping all your numbers in memory. Even a bit-set, which would compress your numbers to 1 byte per 8 integers, wouldn't fit in memory.
Since you didn't mention the numbers had to be shuffled (just random), you can start counting and randomly decide whether to keep each number or not. Then stream your result to a file or wherever you need it.
Start with this:
long range = 100;
float percentile = 0.20f;
Random rnd = new Random();
for (long i = 1; i <= range; i++) { // <= so that 'range' itself can be chosen
    if (rnd.nextFloat() < percentile) {
        System.out.println(i);
    }
}
You will get about 20 percent of the numbers from 1 to 100, with no duplicates.
As the range goes up, the accuracy will too, so you really wouldn't need any special logic for large data sets.
If an exact number is needed, you would need special logic for smaller data sets, but that's pretty easy to solve using other methods posted here (although I'd still recommend a bit set).

Java 2D array - input values

I'm implementing 2 algorithms for the TSP, using a class which includes the routes, their cost, etc. At the minute it uses random values, which is fine, although I now need to compare the algorithms, so to make this fair I need to make the inputs the same (which is obviously unlikely to happen when using random inputs!). The issue I'm having is that I don't know how to change it from random values to inserting pre-determined values into the 2D array, and on top of that I also don't know how to calculate the costs of these values.
Randomly generates node values:
Random rand = new Random();
for (int i = 0; i < nodes; i++) {
    for (int j = i; j < nodes; j++) {
        if (i == j)
            Matrix[i][j] = 0;
        else {
            Matrix[i][j] = rand.nextInt(max_distance);
            Matrix[j][i] = Matrix[i][j];
        }
    }
}
I'm assuming for the above I declare a matrix of, say, [4][4] and then do int matrix[][] = { insert values }?
I do need help with some other sections of this class, but I think I need to make sure this part is right before asking any more!
Thanks a lot in advance!
You can do initialization of a 2D array like this:
double matrix[][] = { { v1, v2, ..., vn }, { x1, x2, ..., xn }, ..., { y1, y2, ..., yn } };
where each inner {} is one row, selected by the first index, and each element within it is selected by the second index.
Example: to access element x1 you do this:
matrix[1][0];
This is the answer you asked for, but I still think it's better to use the same set of random values for both algorithms; Jon Taylor showed a good way of doing that. The code to set the seed looks like this:
int seed = INTEGER_VALUE;
Random rand = new Random(seed);
This way you will always get the same set of values.
You could set a seed instead for each random number generator, thereby guaranteeing that for each implementation you test, the same sequence of pseudo-random numbers is created.
This would save the effort of manually entering lots of values.
Edit to show seed method:
Random r = new Random(56);
Every time r is created with the seed 56 it will produce the exact same sequence of random numbers. Without a seed, I believe the seed defaults to the system time (giving the illusion of truly random numbers).

Creating an even amount of randomness in an array

Let's say you have an arbitrarily large two-dimensional array with an even number of items in it. Let's also assume, for clarity, that you can only choose between two things to put as a given item in the array. How would you go about putting a random choice at a given index so that, once the array is filled, there is an even split between the two choices?
If there are any answers with code, Java is preferred but other languages are fine as well.
You could basically think about it in the opposite way. Rather than deciding, for a given index, which value to put in it, you could select n/2 elements of the array and place the first value in them, then place the 2nd value in the other n/2.
A 2-D A[M,N] array can be mapped to a vector V[M*N] (you can use a row-major or a column-major order to do the mapping).
Start with a vector V[M*N]. Fill its first half with the first choice, and the second half of the array with the second choice object. Run a Fisher-Yates shuffle, and convert the shuffled array to a 2-D array. The array is now filled with elements that are evenly split among the two choices, and the choices at each particular index are random.
The below creates a List<T> the size of the area of the matrix and fills it half with the first choice (space[0]) and half with the second (space[1]). Afterward it applies a shuffle (namely Fisher-Yates, via Collections.shuffle) and fills the matrix with these values.
static <T> void fill(final T[][] matrix, final T... space) {
    final int w = matrix.length;
    final int h = matrix[0].length;
    final int area = w * h;
    final List<T> sample = new ArrayList<T>(area);
    final int half = area >> 1;
    sample.addAll(Collections.nCopies(half, space[0]));
    sample.addAll(Collections.nCopies(half, space[1]));
    Collections.shuffle(sample);
    final Iterator<T> cursor = sample.iterator();
    for (int x = w - 1; x >= 0; --x) {
        final T[] column = matrix[x];
        for (int y = h - 1; y >= 0; --y) {
            column[y] = cursor.next();
        }
    }
}
Pseudo-code:
int trues_remaining = size / 2;
int falses_remaining = size / 2;
while (trues_remaining + falses_remaining > 0)
{
    if (trues_remaining > 0 && falses_remaining > 0)
    {
        bool b = getRandomBool();
        array.push(b);
        if (b) trues_remaining--; else falses_remaining--;
    }
    else if (trues_remaining > 0)
    {
        array.push(true);
        trues_remaining--;
    }
    else
    {
        array.push(false);
        falses_remaining--;
    }
}
Doesn't really scale to more than two values, though. How about:
assoc_array = { 1 = 4, 2 = 4, 3 = 4, 4 = 4 };
while (! assoc_array.isEmpty())
{
    int index = rand(assoc_array.getNumberOfKeys());
    int n = assoc_array.getKeyAtIndex(index);
    array.push(n);
    assoc_array[n]--;
    if (assoc_array[n] <= 0) assoc_array.deleteKey(n);
}
EDIT: just noticed you asked for a two-dimensional array. Well it should be easy to adapt this approach to n-dimensional.
EDIT2: from your comment above, "school yard pick" is a great name for this.
It doesn't sound like your requirements for randomness are very strict, but I thought I'd contribute some more thoughts for anyone who may benefit from them.
You're basically asking for a pseudorandom binary sequence, and the most popular one I know of is the maximum length sequence. This uses an n-bit register with linear feedback (a linear feedback shift register, or LFSR) to define a periodic series of 1's and 0's that has a perfectly flat frequency spectrum, at least within certain bounds determined by the sequence's period (2^n - 1 bits).
What does that mean? Basically it means that the sequence is guaranteed to be maximally random across all shifts (and therefore frequencies) if its full length is used. When compared to an equal length sequence of numbers generated from a random number generator, it will contain MORE randomness per length than your typical randomly generated sequence.
It is for this reason that it is used to determine impulse functions in white noise analysis of systems, especially when experiment time is valuable and higher order cross effects are less important. Because the sequence is random relative to all shifts of itself, its auto-correlation is a perfect delta function (aside from qualifiers indicated above) so the stimulus does not contaminate the cross correlation between stimulus and response.
I don't really know what your application for this matrix is, but if it simply needs to "appear" random then this would do that very effectively. In terms of being balanced, 1's vs 0's, the sequence is guaranteed to have exactly one more 1 than 0. Therefore if you're trying to create a grid of 2^n, you would be guaranteed to get the correct result by tacking a 0 onto the end.
So an m-sequence is more random than anything you'll generate using a random number generator and it has a defined number of 0's and 1's. However, it doesn't allow for unqualified generation of 2d matrices of arbitrary size - only those where the total number of elements in the grid is a power of 2.
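As a small illustration, here is a hedged Java sketch of a 4-bit maximal-length LFSR (taps at bits 4 and 3, i.e. the polynomial x^4 + x^3 + 1, one known maximal-length choice); over its full period of 2^4 - 1 = 15 bits it emits exactly eight 1's and seven 0's:

public class MSequence {
    public static void main(String[] args) {
        int lfsr = 0b1001;         // any nonzero 4-bit seed
        int period = (1 << 4) - 1; // 2^4 - 1 = 15
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < period; i++) {
            // Fibonacci LFSR: feedback is the XOR of the tap bits
            int bit = (lfsr ^ (lfsr >> 1)) & 1;
            lfsr = (lfsr >> 1) | (bit << 3); // shift right, feed back into bit 3
            bits.append(lfsr & 1);
        }
        System.out.println(bits); // 15 bits: eight 1's, seven 0's
    }
}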
