Weighted sampling with replacement in Java

Is there a function in Java, or in a library such as Apache Commons Math which is equivalent to the MATLAB function randsample?
More specifically, I want to find a function randSample that returns a vector of independent and identically distributed (i.i.d.) random variables according to a probability distribution that I specify.
For example:
int[] a = randSample(new int[]{0, 1, 2}, 5, new double[]{0.2, 0.3, 0.5});
//        { 0 w.p. 0.2
// a[i] = { 1 w.p. 0.3
//        { 2 w.p. 0.5
The output is the same as that of the MATLAB call randsample([0 1 2], 5, true, [0.2 0.3 0.5]), where true means sampling with replacement.
If such a function does not exist, how do I write one?
Note: I know that a similar question has been asked on Stack Overflow but unfortunately it has not been answered.

I'm pretty sure one doesn't exist, but it's pretty easy to write a function that produces samples like that. First off, Java comes with a random number generator, java.util.Random, whose instance method nextDouble() returns uniformly distributed doubles in the range [0.0, 1.0).
import java.util.Random;
Random rng = new Random();
double someRandomDouble = rng.nextDouble();
// This will be a uniformly distributed
// random variable in [0.0, 1.0).
For sampling with replacement, convert the pdf you have as input into a cdf; each random double Java provides can then be turned into a sample by finding which interval of the cdf it falls into. So first you need to convert the pdf into a cdf.
int[] randsample(int[] values, int numsamples,
                 boolean withReplacement, double[] pdf) {
    if (withReplacement) {
        // The running sum of the pdf gives the cdf.
        double[] cdf = new double[pdf.length];
        cdf[0] = pdf[0];
        for (int i = 1; i < pdf.length; i++) {
            cdf[i] = cdf[i-1] + pdf[i];
        }
Then you make the properly-sized array of ints to store the result and start finding the random results:
        Random rng = new Random();
        int[] results = new int[numsamples];
        for (int i = 0; i < numsamples; i++) {
            double randomValue = rng.nextDouble(); // uniform in [0.0, 1.0)
            int currentPosition = 0;
            // Check the bound first so cdf[currentPosition] is never read out of range.
            while (currentPosition < cdf.length && randomValue > cdf[currentPosition]) {
                currentPosition++; // Check the next one.
            }
            if (currentPosition < cdf.length) { // It worked!
                results[i] = values[currentPosition];
            } else { // Rounding left the cdf total just below randomValue;
                results[i] = values[cdf.length-1]; // fail gracefully with the last value.
            }
        }
        // Now we're done and can return the results!
        return results;
    } else { // Without replacement.
        throw new UnsupportedOperationException("This is unimplemented!");
    }
}
You should add some error checking (e.g. make sure the values array and the pdf array are the same size), and you can implement the other features by overloading this method, but hopefully this is enough to get you started. Cheers!
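For reference, here is a self-contained sketch of the same inverse-CDF idea, assuming non-negative weights. It swaps the linear scan for a hand-written binary search (an upper-bound search for the first cdf entry greater than the random draw), scales the draw by the cdf total so the weights need not sum to exactly 1, and takes a Random argument so results can be seeded. The class name WeightedSampler and this signature are illustrative assumptions, not an existing library API.

```java
import java.util.Random;

// Hypothetical helper: i.i.d. sampling with replacement via the inverse CDF.
final class WeightedSampler {

    // Returns numSamples values drawn independently from values,
    // where values[k] is chosen with probability pdf[k] / sum(pdf).
    static int[] randSample(int[] values, int numSamples, double[] pdf, Random rng) {
        if (values.length == 0 || values.length != pdf.length) {
            throw new IllegalArgumentException("values and pdf must be non-empty and the same length");
        }
        // The running sum of the pdf gives the cdf.
        double[] cdf = new double[pdf.length];
        double total = 0.0;
        for (int k = 0; k < pdf.length; k++) {
            total += pdf[k];
            cdf[k] = total;
        }
        int[] results = new int[numSamples];
        for (int i = 0; i < numSamples; i++) {
            // Scale the uniform draw by the total so the weights need not sum to exactly 1.
            double u = rng.nextDouble() * total;
            results[i] = values[indexFor(cdf, u)];
        }
        return results;
    }

    // Binary search for the first index whose cdf entry is strictly greater than u.
    // Falls back to the last index if no entry is greater (a rounding edge case).
    static int indexFor(double[] cdf, double u) {
        int lo = 0;
        int hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > u) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }
}
```

For example, randSample(new int[]{0, 1, 2}, 5, new double[]{0.2, 0.3, 0.5}, new Random()) returns five independently drawn values; over many draws the frequencies should approach 0.2, 0.3 and 0.5. The binary search makes each draw O(log n) instead of O(n), which only matters for large value arrays.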

Related

Need help declaring and assigning multidimensional arrays with different numbers of elements in Python (v Java)

I am a long-time Java user diving into Python.
I am searching for a way to create multidimensional arrays whose rows may not have the same number of elements; some or all of the sizes are NOT known until run time.
In Java I'd create a three-dimensional array named runSet like this:
double[][][] runSet = new double[5][][];
int randomInt = genRandInt(); // Returns a random integer between 0 and 101.
for(int i = 0; i < 4; i++) runSet[i] = getRuns(randomInt); // Returns double[randomInt][ToBeDetermined]
// ------------------------------------------------------------------------------
public double[][] getRuns(int num)
{
double[][] array2D = new double[num][];
for(int i = 0; i< num; i++) array2D[i] = genRandArray(); // returns double[] of random values with length between 0 and 1001.
return array2D; // Returns double[num][Varying Length]
} // end GetRuns()
No problems, this works just fine.
But in Python I want to do the same, and I cannot figure out how to properly assign arrays with an unknown number of elements. The best I have been able to come up with is:
import numpy as np
runSet = np.empty([5, ])  # <--- Does NOT throw an error.
randomInt = genRandInt()  # Returns a random integer between 0 and whatever.
i = 0
while i < 4:
    runSet[i] = getRuns(randomInt)  # Returns [][] (double[randomInt][ToBeDetermined] in Java speak)
    i += 1
#-------------------------------------------------
def getRuns(num):
    array2D = np.empty([num, ])  # <-- Does NOT throw an error.
    i = 0
    while i < num:
        array2D[i] = genRandArray()  # Returns double[] of random values with a random LENGTH as well.
        i += 1
    return array2D  # Returns double[num][Varying Length]
# end GetRuns()
In the Python version of genRandArray(), I return something like this (example random values):
return [12.4, 6.4, 8.0, 7.9, 8.2, 6.3, 8.8, 3.14, 2.345, 66.828, 12.0]
In both places where I assign runSet[i] and array2D[i], I now know I'll get an error along the lines of "sequential list being added to single ...."
Question 1): Maybe an array form is better for the return?
Question 2): Am I even close?
I have the feeling I may be going about this completely the wrong way, as array thinking in Python seems totally backwards from the way I've been trained to think about it in Java.
Thanks for any help!

Why introduce a new array in a method? Why not reuse the parameter?

In the method normalize I pass in an array, say {1, 2, 3, 4}, and it returns the normalized array {0.1, 0.2, 0.3, 0.4}.
I know I should introduce a new array in the method (b in the following), but why actually?
public static double[] normalize(double[] a) {
double[] b = new double[a.length];
double sum = 0;
for (int i = 0; i < b.length; i++) {
sum += a[i];
}
for (int i = 0; i < b.length; i++) {
b[i] = a[i]/sum;
}
return b;
}
Input: 1 2 3 4
Output: 0.1 0.2 0.3 0.4
Why is the following wrong:
public static double[] normalize(double[] a) {
double sum = 0;
for (int i = 0; i < a.length; i++) {
sum += a[i];
}
for (int i = 0; i < a.length; i++) {
a[i] = a[i]/sum;
}
return a;
}
Input: 1 2 3 4
Output: 0.1 0.2 0.3 0.4
The difference is that one method updates the input array in place, whereas the other one keeps the array intact and returns a copy.
Neither is wrong(*); both ways have their place, you just need to make sure the caller knows what is happening.
If you update in-place, you can make that clearer by making the method void instead of returning a (redundant) reference to the same array.
(*) In modern practice, prefer immutable data structures. So unless performance is a real issue, go with the "safer" variant.
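A minimal sketch of the two styles, assuming the same sum-based normalization as the question; the class and method names are hypothetical. The copying version leaves its argument untouched, while the in-place version is void, so the side effect shows up in its signature.

```java
// Hypothetical class contrasting a copying normalize with an in-place one.
final class Normalizer {

    // Returns a new normalized array; the caller's array is not modified.
    static double[] normalized(double[] a) {
        double sum = 0;
        for (double x : a) sum += x;
        double[] b = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] / sum;
        }
        return b;
    }

    // Normalizes the caller's array in place; void signals the side effect.
    static void normalizeInPlace(double[] a) {
        double sum = 0;
        for (double x : a) sum += x;
        for (int i = 0; i < a.length; i++) {
            a[i] /= sum;
        }
    }
}
```

Calling normalized on {1, 2, 3, 4} leaves the input as it was and returns {0.1, 0.2, 0.3, 0.4}; calling normalizeInPlace on the same array overwrites it with those values.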
In the first case, the original array will stay unmodified when the method returns.
In the second case, the original array is modified, and the change is still visible to the caller after the method returns, regardless of which variable you assign the return value to afterward.
Both options can be viable, but it depends on the case and what is expected to happen. However, typically I would expect when you pass a parameter to a method, the parameter will not be modified when the method returns.
Documentation can make it clear that the original array will be modified or if it will not be.
Side Note:
This does not work the same way if you reassign a itself instead of modifying the values it contains:
public static void normalize(double[] a) {
a = new double[5];
}
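The caller's array remains unchanged in this case when the method returns: Java passes the array reference by value, so assigning a new array to a only changes the method's local copy of the reference.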
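A small hypothetical demo of the side note, contrasting reassigning the parameter with writing through it; the class and method names are made up for illustration.

```java
// Hypothetical demo: Java passes the array reference by value.
final class PassByValueDemo {

    // Reassigning the parameter only changes this method's local copy of the reference.
    static void reassign(double[] a) {
        a = new double[5];
        a[0] = 42.0; // writes into the new array, which the caller never sees
    }

    // Writing through the reference changes the caller's array.
    static void mutate(double[] a) {
        a[0] = 42.0;
    }
}
```

After reassign(data) the caller's data is exactly as before; after mutate(data) its first element is 42.0.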
In my opinion, you are creating a new variable with new values. The method receives one var and returns another var.
It's more reader-friendly.
I don't think you are asking for an explanation of the bytecode behavior, or of the difference between creating a new array and reusing the input, because that difference is really easy to see.
What I think you are looking for is a conceptual analysis of the difference. From my perspective, "functions" in any language should be treated as math functions. You give X and Y, and the function returns Z. Imagine you have a rectangle with height a and width b. If you want its area, you pass a and b and you get A. If you want the perimeter, you pass a and b and you get P.
This makes your code more readable, reusable, more cohesive, more testable.

Having issues with an array method

Write a method named createDoubles that builds an array of floating point values that represent the squares of the numbers from 10.0 to 13.0, in steps of 0.5, inclusive of both boundaries. The method has no parameters and returns an array of doubles. There are exactly 7 numbers in this array.
I keep trying but nothing is working. Thanks!
public static double[] createDoubles(){
double[] dArray= new double[7];
double[] squareArray= new double[dArray.length];
for(int i= 0; i< dArray.length-1; i+=0.5){
squareArray[i]= dArray[i]* dArray[i];
}
return squareArray;
}
If you are incrementing i by 0.5, how do you expect the array index to work in the loop? Please double-check the code.
Something like this will work. The idea is that array indexes increment in steps of 1, not 0.5 as your loop expects.
public static double[] createDoubles(){
double[] dArray= {10.0,10.5,11.0,11.5,12.0,12.5,13.0};
double[] squareArray= new double[dArray.length];
for(int i= 0; i< dArray.length; i++){
squareArray[i]= dArray[i]* dArray[i];
}
for(double f: squareArray)
System.out.println(f);
return squareArray;
}
Try the following way:
public static double[] createDoubles(){
double[] squareArray= new double[7];
double number = 10.0;
int i=0;
while(number <= 13.0) { // 0.5 is exact in binary, so number reaches 13.0 exactly
squareArray[i] = number*number;
number = number + 0.5;
i++;
}
return squareArray;
}
You can see the problem in the following demo:
int i = 0;
i += 0.5; // compiles as i = (int) (i + 0.5), and (int) 0.5 == 0
So i will always be 0.
That means your loop never ends when you invoke the method.
Here's one way to do it, but probably not what the teacher wants:
public static double[] createDoubles() {
double[] answer = {100.0, 110.25, 121.0, 132.25, 144.0, 156.25, 169.0};
// The squares of 10.0, 10.5, ..., 13.0, worked out by hand.
return answer;
}
Note that it is never a good idea to increment a loop index by a non-integer value. You might be surprised if you tried your method for increments of 0.1 instead of 0.5.
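A sketch of that advice, assuming the same 10.0 to 13.0 range in steps of 0.5: the loop counter is an int, and each value is computed as start + step * i rather than by repeatedly adding the step, so rounding errors cannot accumulate (with 0.1 steps, repeated addition would drift). The class name Squares is hypothetical.

```java
// Hypothetical class: the loop index stays an int and each value is derived from it.
final class Squares {

    static double[] createDoubles() {
        final double start = 10.0;
        final double end = 13.0;
        final double step = 0.5;
        // Number of points in [start, end], inclusive of both boundaries (7 here).
        final int count = (int) Math.round((end - start) / step) + 1;
        double[] squares = new double[count];
        for (int i = 0; i < count; i++) {
            double x = start + step * i; // computed from i, never accumulated
            squares[i] = x * x;
        }
        return squares;
    }
}
```

This returns the seven squares 100.0, 110.25, 121.0, 132.25, 144.0, 156.25 and 169.0.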
The reason this code compiles but doesn't work is that Java treats i+=0.5 as i = (int) (i + 0.5), which truncates back to 0, so i stays 0, your loop never exits, and the method never returns. To fix this part of your code, change the i+=0.5 to i+=1. However, since you never put any values into dArray, the array you return would contain nothing but 0s. Fix that by filling dArray with the values you want and squaring them to build the array you return. Or you could simply compute each value from the index, like so:
for(int i= 0; i<= dArray.length-1; i+=1){
squareArray[i]= Math.pow(10 + (0.5*i), 2);
}
squareArray[integer+double] is not a valid array index.
Array indexes are always integers.

Choose best combinations of operators to find target number

I have an array of operations and a target number.
The operations could be
+ 3
- 3
* 4
/ 2
I want to find out how close I can get to the target number by using those operations.
I start from 0 and I need to iterate through the operations in that order, and I can choose to either use the operation or not use it.
So if the target number is 13, I can use + 3 and * 4 to get 12 which is the closest I can get to the target number 13.
I guess I need to compute all possible combinations (so the number of calculations is 2^n, where n is the number of operations).
I have tried to do this in java with
import java.util.*;
public class Instruction {
public static void main(String[] args) {
// create scanner
Scanner sc = new Scanner(System.in);
// number of instructions
int N = sc.nextInt();
// target number
int K = sc.nextInt();
//
String[] instructions = new String[N];
// N instructions follow
for (int i=0; i<N; i++) {
//
instructions[i] = sc.nextLine();
}
//
System.out.println(search(instructions, 0, N, 0, K, 0, K));
}
public static int search(String[] instructions, int index, int length, int progressSoFar, int targetNumber, int bestTarget, int bestDistance) {
//
for (int i=index; i<length; i++) {
// get operator
char operator = instructions[i].charAt(0);
// get number
int number = Integer.parseInt(instructions[i].split("\\s+")[1]);
//
if (operator == '+') {
progressSoFar += number;
} else if (operator == '*') {
progressSoFar *= number;
} else if (operator == '-') {
progressSoFar -= number;
} else if (operator == '/') {
progressSoFar /= number;
}
//
int distance = Math.abs(targetNumber - progressSoFar);
// if the absolute distance between progress so far
// and the target number is less than what we have
// previously accomplished, we update best distance
if (distance < bestDistance) {
bestTarget = progressSoFar;
bestDistance = distance;
}
//
if (true) {
return bestTarget;
} else {
return search(instructions, index + 1, length, progressSoFar, targetNumber, bestTarget, bestDistance);
}
}
}
}
It doesn't work yet, but I guess I'm a little closer to solving my problem. I just don't know how to end my recursion.
But maybe I don't use recursion, but should instead just list all combinations. I just don't know how to do this.
If I, for instance, have 3 operations and I want to compute all combinations, I get the 2^3 combinations
111
110
101
011
000
001
010
100
where 1 indicates that the operation is used and 0 indicates that it is not used.
It should be rather simple to do this and then choose which combination gave the best result (the number closest to the target number), but I don't know how to do this in java.
In pseudocode, you could try brute-force back-tracking, as in:
// ops: list of ops that have not yet been tried out
// target: goal result
// currentOps: list of ops used so far
// best: reference to the best result achieved so far (can be altered; use
// an int[1], for example)
// opsForBest: list of ops used to achieve best result so far
test(ops, target, currentOps, best, opsForBest)
    if ops is now empty,
        current = evaluate(currentOps)
        if current is closer to target than best,
            best = current
            opsForBest = a copy of currentOps
    otherwise,
        // try including next op
        with the next operator in ops,
        test(opsAfterNext, target,
             currentOps concatenated with next, best, opsForBest)
        // try *not* including next op
        test(opsAfterNext, target, currentOps, best, opsForBest)
This is guaranteed to find the best answer. However, it will repeat many operations over and over. You can save some time by avoiding repeated calculations, which can be achieved using a cache of "how does this subexpression evaluate". When you include the cache, you enter the realm of "dynamic programming" (= reusing earlier results in later computation).
Edit: adding a more OO-ish variant
Variant returning the best result, and avoiding the use of that best[] array-of-one. Requires the use of an auxiliary class Answer with fields ops and result.
// ops: list of ops that have not yet been tried out
// target: goal result
// currentOps: list of ops used so far
Answer test(ops, target, currentOps)
    if ops is now empty,
        return new Answer(currentOps, evaluate(currentOps))
    otherwise,
        // try including next op
        with the next operator in ops,
        Answer withOp = test(opsAfterNext, target,
                             currentOps concatenated with next)
        // try *not* including next op
        Answer withoutOp = test(opsAfterNext, target, currentOps)
        if withOp.result closer to target than withoutOp.result,
            return withOp
        else
            return withoutOp
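A hedged Java translation of the OO pseudocode above: an index into an Op array stands in for "ops that have not yet been tried", and a shared list is undone after the include branch (standard backtracking). It assumes the question's integer semantics, so / truncates; the Op and OpSearch names are hypothetical, and Answer follows the auxiliary class the answer describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java translation of the "return the best Answer" pseudocode.
final class OpSearch {

    // One instruction such as "+ 3": an operator character and its operand.
    static final class Op {
        final char operator;
        final int operand;

        Op(char operator, int operand) {
            this.operator = operator;
            this.operand = operand;
        }

        int apply(int x) {
            switch (operator) {
                case '+': return x + operand;
                case '-': return x - operand;
                case '*': return x * operand;
                case '/': return x / operand; // integer division, as in the question's code
                default: throw new IllegalArgumentException("Unknown operator: " + operator);
            }
        }
    }

    // The ops chosen on one branch and the value they produce.
    static final class Answer {
        final List<Op> ops;
        final int result;

        Answer(List<Op> ops, int result) {
            this.ops = ops;
            this.result = result;
        }
    }

    // Applies the chosen ops in order, starting from 0.
    static int evaluate(List<Op> ops) {
        int value = 0;
        for (Op op : ops) value = op.apply(value);
        return value;
    }

    // ops[index..] have not been tried yet; currentOps holds the ops chosen so far.
    static Answer test(Op[] ops, int index, int target, List<Op> currentOps) {
        if (index == ops.length) {
            return new Answer(new ArrayList<>(currentOps), evaluate(currentOps));
        }
        currentOps.add(ops[index]);                                  // try including the next op
        Answer withOp = test(ops, index + 1, target, currentOps);
        currentOps.remove(currentOps.size() - 1);                    // undo before the other branch
        Answer withoutOp = test(ops, index + 1, target, currentOps); // try *not* including it
        return Math.abs(target - withOp.result) < Math.abs(target - withoutOp.result)
                ? withOp : withoutOp;
    }
}
```

With the question's operations + 3, - 3, * 4, / 2 and a target of 13, this returns an Answer whose ops are + 3 and * 4 and whose result is 12. It still explores all 2^n subsets, so it is only practical for small n.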
Dynamic programming
If the target value is t, and there are n operations in the list, and the largest absolute value you can create by combining some subsequence of them is k, and the absolute value of the product of all values that appear as an operand of a division operation is d, then there's a simple O(dkn)-time and -space dynamic programming algorithm that determines whether it's possible to compute the value i using some subset of the first j operations and stores this answer (a single bit) in dp[i][j]:
dp[i][j] = dp[i][j-1] || dp[invOp(i, j)][j-1]
where invOp(i, j) computes the inverse of the jth operation on the value i. Note that if the jth operation is a multiplication by, say, x, and i is not divisible by x, then the operation is considered to have no inverse, and the term dp[invOp(i, j)][j-1] is deemed to evaluate to false. All other operations have unique inverses.
To avoid loss-of-precision problems with floating point code, first multiply the original target value t, as well as all operands to addition and subtraction operations, by d. This ensures that any division operation / x we encounter will only ever be applied to a value that is known to be divisible by x. We will essentially be working throughout with integer multiples of 1/d.
Because some operations (namely subtractions and divisions) require solving subproblems for higher target values, we cannot in general calculate dp[i][j] in a bottom-up way. Instead we can use memoisation of the top-down recursion, starting at the (scaled) target value t*d and working outwards in steps of 1 in each direction.
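As a simpler, hedged illustration of the same "use operation j or skip it" recurrence, the sketch below runs forward instead of top-down: a value is reachable with the first j operations if it was reachable with the first j-1, or if it is operation j applied to such a value. It keeps the reachable values in a HashSet rather than the dp[i][j] bit table, and it assumes the question's truncating integer division instead of the scaling by d described above. ReachableDp, reachable and closest are hypothetical names, and the set can still grow exponentially in the worst case.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

// Hypothetical forward variant: reachable(j) = reachable(j-1) union op_j(reachable(j-1)).
final class ReachableDp {

    static Set<Integer> reachable(IntUnaryOperator[] ops) {
        Set<Integer> current = new HashSet<>();
        current.add(0); // start from 0 with no operations used
        for (IntUnaryOperator op : ops) {
            Set<Integer> next = new HashSet<>(current); // skip this operation
            for (int v : current) {
                next.add(op.applyAsInt(v));            // use this operation
            }
            current = next;
        }
        return current;
    }

    // Closest reachable value to target; ties go to whichever value the set yields first.
    static int closest(IntUnaryOperator[] ops, int target) {
        int best = 0; // 0 is always reachable
        for (int v : reachable(ops)) {
            if (Math.abs(target - v) < Math.abs(target - best)) {
                best = v;
            }
        }
        return best;
    }
}
```

For + 3, - 3, * 4, / 2 the reachable set is {0, 3, -3, 12, -12, 1, -1, 6, -6}, so the closest value to 13 is 12.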
C++ implementation
I've implemented this in C++ at https://ideone.com/hU1Rpq. The "interesting" part is canReach(i, j); the functions preceding this are just plumbing to handle the memoisation table. Specify the inputs on stdin with the target value first, then a space-separated list of operations in which operators immediately precede their operand values, e.g.
10 +8 +11 /2
or
10 +4000 +5500 /1000
The second example, which should give the same answer (9.5) as the first, seems to be around the ideone (and my) memory limits, although this could be extended somewhat by using long long int instead of int and a 2-bit table for _m[][][] instead of wasting a full byte on each entry.
Exponential worst-case time and space complexity
Note that in general, dk or even just k by itself could be exponential in the size of the input: e.g. if there is an addition, followed by n-1 multiplication operations, each of which involves a number larger than 1. It's not too difficult to compute k exactly via a different DP that simply looks for the largest and smallest numbers reachable using the first i operations for all 1 <= i <= n, but all we really need is an upper bound, and it's easy enough to get a (somewhat loose) one: simply discard the signs of all multiplication operands, convert all - operations to + operations, and then perform all multiplication and addition operations (i.e., ignoring divisions).
There are other optimisations that could be applied, for example dividing through by any common factor.
Here's a Java 8 example, using memoization. I wonder if annealing can be applied...
public class Tester {
public static interface Operation {
public int doOperation(int cur);
}
static Operation ops[] = { // lambdas for the operations
(x -> x + 3),
(x -> x - 3),
(x -> x * 4),
(x -> x / 2),
};
private static int getTarget(){
return 2;
}
public static void main (String args[]){
int map[];
int val = 0;
int COMBINATIONS = 1 << ops.length; // 2^n subsets; requires ops.length < 31 [int overflow]
map = new int[COMBINATIONS];
map[0] = val;
final int target = getTarget();// To get rid of dead code warning
int closest = val, delta = target < 0? -target: target;
int bestSeq = 0;
if (0 == target) {
System.out.println("Winning sequence: Do nothing");
}
int lastBitMask = 0, opIndex = 0;
int i = 0;
for (i = 1; i < COMBINATIONS; i++){// brute force algo; includes the all-ops mask 2^n - 1
val = map[i & lastBitMask]; // get prev memoized value
val = ops[opIndex].doOperation(val); // compute
map[i] = val; //add new memo
//the rest just logic to find the closest
// except the last part
int d = val - target;
d = d < 0? -d: d;
if (d < delta) {
bestSeq = i;
closest = val;
delta = d;
}
if (val == target){ // no point to continue
break;
}
//advance memo mask 0b001 to 0b011 to 0b111, etc.
// as well as the computing operation.
if ((i & (i + 1)) == 0){ // check for 2^n -1
lastBitMask = (lastBitMask << 1) + 1;
opIndex++;
}
}
System.out.println("Winning sequence: " + bestSeq);
System.out.println("Closest to \'" + target + "\' is: " + closest);
}
}
Worth noting, the "winning sequence" is the bit representation (displayed as decimal) of what was used and what wasn't, as the OP has done in the question.
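To make the "winning sequence" concrete: in the code above, bit j of the mask (counting from the least significant bit) is set when ops[j] is used, because the masks in [2^j, 2^(j+1)) are exactly the ones that apply ops[j] last. The sketch below decodes a mask and brute-forces all 2^n masks without memoization, using java.util.function.IntUnaryOperator in place of the answer's Operation interface; MaskSearch, evaluate and bestMask are hypothetical names.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical helper: evaluates subsets of ops encoded as bit masks.
final class MaskSearch {

    // Applies ops[j] for every set bit j of mask, in index order, starting from 0.
    static int evaluate(IntUnaryOperator[] ops, int mask) {
        int value = 0;
        for (int j = 0; j < ops.length; j++) {
            if ((mask & (1 << j)) != 0) {
                value = ops[j].applyAsInt(value);
            }
        }
        return value;
    }

    // Tries all 2^n masks (n < 31) and returns the one whose result is closest to target.
    static int bestMask(IntUnaryOperator[] ops, int target) {
        int best = 0;
        int bestDistance = Math.abs(target); // mask 0 (use nothing) evaluates to 0
        for (int mask = 1; mask < (1 << ops.length); mask++) {
            int distance = Math.abs(target - evaluate(ops, mask));
            if (distance < bestDistance) {
                best = mask;
                bestDistance = distance;
            }
        }
        return best;
    }
}
```

For the question's operations and a target of 13, the winning mask is 0b0101 = 5, i.e. + 3 followed by * 4.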
For those of you coming from Java 7, this is what I was referencing for lambdas: Lambda Expressions in GUI Applications. So if you're constrained to Java 7, you can still make this work quite easily.

Finding a mode with decreasing precision

I feel like there should be an available library that more simply does two things: A) finds the mode of an array of doubles, and B) gracefully degrades the precision until a particular frequency is reached.
So imagine an array like this:
double[] a = {1.12, 1.15, 1.13, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4};
If I was looking for a frequency of 3 then it would go from 2 decimal positions to 1 decimal, and finally return 1.1 as my mode. If I had a frequency requirement of 4 it would return 4 as my mode.
I do have code that works the way I want and returns what I expect, but I feel like there should be a more efficient way to accomplish this, or an existing library that would help me do the same. Attached is my code; I'd be interested in thoughts or comments on different approaches I should have taken. The iterations variable limits how far the precision can degrade.
public static double findMode(double[] r, int frequencyReq)
{
double mode = 0d;
int frequency = 0;
int iterations = 4;
HashMap<Double, BigDecimal> counter = new HashMap<Double, BigDecimal>();
while(frequency < frequencyReq && iterations > 0){
String roundFormatString = "#.";
for(int j=0; j<iterations; j++){
roundFormatString += "#";
}
DecimalFormat roundFormat = new DecimalFormat(roundFormatString);
for(int i=0; i<r.length; i++){
double element = Double.valueOf(roundFormat.format(r[i]));
if(!counter.containsKey(element))
counter.put(element, new BigDecimal(0));
counter.put(element,counter.get(element).add(new BigDecimal(1)));
}
for(Double key : counter.keySet()){
if(counter.get(key).compareTo(new BigDecimal(frequency))>0){
mode = key;
frequency = counter.get(key).intValue();
log.debug("key: " + key + " Count: " + counter.get(key));
}
}
iterations--;
}
return mode;
}
Edit
Another way to rephrase the question, per Paulo's comment: the goal is to locate a number whose neighborhood contains at least frequency array elements, with the radius of that neighborhood as small as possible.
Here is a solution to the reformulated question:
The goal is to locate a number whose neighborhood contains at least frequency array elements, with the radius of that neighborhood as small as possible.
(I took the freedom of switching the order of 1.15 and 1.13 in the input array.)
The basic idea is: we have the input already sorted (i.e. neighboring elements are consecutive), and we know how many elements we want in our neighborhood. So we loop once over the array, measuring the distance between each element and the element frequency - 1 positions to its right; those two elements and everything between them form a neighbourhood of exactly frequency elements. Then we simply take the minimum such distance. (My method has a clumsy way of returning the results; you may want to do it better.)
This is not completely equivalent to your original question (does not work by fixed steps of digits), but maybe this is more what you really want :-)
You'll have to find a better way of formatting the results, though.
package de.fencing_game.paul.examples;
import java.util.Arrays;
/**
* searching of dense points in a distribution.
*
* Inspired by http://stackoverflow.com/questions/5329628/finding-a-mode-with-decreasing-precision.
*/
public class InpreciseMode {
/** our input data, should be sorted ascending. */
private double[] data;
public InpreciseMode(double ... data) {
this.data = data;
}
/**
* searches the smallest neighbourhood (by diameter) which
* contains at least minSize elements.
*
* @return an array of two arrays:
* { { the middle point of the neighborhood,
* the diameter of the neighborhood },
* all the elements of the neighborhood }
*
* TODO: better return an object of a class encapsulating these.
*/
public double[][] findSmallNeighbourhood(int minSize) {
int currentLeft = -1;
int currentRight = -1;
double currentMinDiameter = Double.POSITIVE_INFINITY;
for(int i = 0; i + minSize-1 < data.length; i++) {
double diameter = data[i+minSize-1] - data[i];
if(diameter < currentMinDiameter) {
currentMinDiameter = diameter;
currentLeft = i;
currentRight = i + minSize-1;
}
}
return
new double[][] {
{
(data[currentRight] + data[currentLeft])/2.0,
currentMinDiameter
},
Arrays.copyOfRange(data, currentLeft, currentRight+1)
};
}
public void printSmallNeighbourhoods() {
for(int frequency = 2; frequency <= data.length; frequency++) {
double[][] found = findSmallNeighbourhood(frequency);
System.out.printf("There are %d elements in %f radius "+
"around %f:%n %s.%n",
frequency, found[0][1]/2, found[0][0],
Arrays.toString(found[1]));
}
}
public static void main(String[] params) {
InpreciseMode m =
new InpreciseMode(1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1,
4.2, 4.3, 4.4);
m.printSmallNeighbourhoods();
}
}
The output is
There are 2 elements in 0,005000 radius around 1,125000:
[1.12, 1.13].
There are 3 elements in 0,015000 radius around 1,135000:
[1.12, 1.13, 1.15].
There are 4 elements in 0,150000 radius around 4,250000:
[4.1, 4.2, 4.3, 4.4].
There are 5 elements in 0,450000 radius around 3,850000:
[3.4, 3.44, 4.1, 4.2, 4.3].
There are 6 elements in 0,500000 radius around 3,900000:
[3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 7 elements in 1,200000 radius around 3,200000:
[2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
There are 8 elements in 1,540000 radius around 2,660000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2].
There are 9 elements in 1,590000 radius around 2,710000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3].
There are 10 elements in 1,640000 radius around 2,760000:
[1.12, 1.13, 1.15, 2.0, 3.4, 3.44, 4.1, 4.2, 4.3, 4.4].
I think there's nothing wrong with your code, and I doubt that you will find a library that does something so specific. But if you still want an idea for a more OOP approach that reuses Java collections, here is another approach:
Create a class to represent numbers with different number of decimals. It would have something like VariableDecimal(double d,int ndecimals) as constructor.
In that class, override the Object methods equals and hashCode. Your implementation of equals will test whether two instances of VariableDecimal are the same, taking into account the value d and the number of decimals. hashCode can simply return d * Math.pow(10, ndecimals) cast to int.
In your logic use HashMaps so that they reuse your object:
HashMap<VariableDecimal, AtomicInteger> counters = new HashMap<VariableDecimal, AtomicInteger>();
for (double d : a) {
    VariableDecimal vd = new VariableDecimal(d, ndecimals);
    if (counters.get(vd) == null)
        counters.put(vd, new AtomicInteger(0));
    counters.get(vd).incrementAndGet();
}
/* at the end of this loop counters should hold a map with frequencies of
each double for the selected precision so that you can simply traverse and
get the max */
This piece of code doesn't show the iteration to decrement the number of decimals, which is trivial.
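A sketch of the VariableDecimal class described above, assuming rounding with Math.round. It stores the rounded, scaled value as a long so that equals and hashCode stay consistent, uses Math.pow(10, ndecimals) for the power of ten, and replaces the AtomicInteger bookkeeping with Map.merge (Java 8+). The countAt helper is hypothetical, and rounding at exact ties may differ from the DecimalFormat rounding used in the question.

```java
import java.util.HashMap;
import java.util.Map;

// A sketch of the VariableDecimal idea: a value rounded to ndecimals decimal places.
final class VariableDecimal {
    private final long scaled;   // Math.round(d * 10^ndecimals)
    private final int ndecimals;

    VariableDecimal(double d, int ndecimals) {
        this.ndecimals = ndecimals;
        // Math.round rounds ties toward positive infinity; DecimalFormat's default
        // HALF_EVEN rounding may disagree for values that sit on a tie.
        this.scaled = Math.round(d * Math.pow(10, ndecimals));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof VariableDecimal)) return false;
        VariableDecimal other = (VariableDecimal) o;
        return scaled == other.scaled && ndecimals == other.ndecimals;
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals, so equal instances hash equally.
        return 31 * Long.hashCode(scaled) + ndecimals;
    }

    // Hypothetical helper: frequency of each rounded value in a at the given precision.
    static Map<VariableDecimal, Integer> countAt(double[] a, int ndecimals) {
        Map<VariableDecimal, Integer> counters = new HashMap<>();
        for (double d : a) {
            counters.merge(new VariableDecimal(d, ndecimals), 1, Integer::sum);
        }
        return counters;
    }
}
```

For the question's array, countAt(a, 0) maps 4 to a count of 4 and 1 to a count of 3; the outer loop that decrements ndecimals stays as described in the answer.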
