Sampling function produces same result every time

I am generating weighted random numbers (sampling with replacement) with the following code:
    Object[] population = { 0, 1 };
    double[] weights = { p1, p2 };
    Sampling randsamp = new Sampling(population, weights);
    X = (Integer) randsamp.next();
I have tried different values of p1 and p2, which are the probabilities; 0 and 1 are the population (the numbers to be generated according to p1 and p2).
However, running the code multiple times produces the same result. For example, if I run 10 iterations and store the results in an array X[], I get the same array every time the code is executed. Can someone tell me why this is happening? Should I not get a different array (different numbers) each time?
Thanks

Searching Google for jpsgcs.alun.random.Sampling turns up only a few broken links about this Sampling class. Moreover, if you browse here, you can see that the downloadable jar no longer even contains a package named random, so it was probably removed for some reason. Maybe the Sampling class was removed because it did not work properly? I can only suggest getting in touch with the authors of the library.
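Since the library source is not available here, the cause cannot be confirmed. One common reason for identical output on every run is a random generator created with a fixed seed; whether Sampling does that is only an assumption. The sketch below uses a hypothetical WeightedSampler class built on java.util.Random (it is not the jpsgcs API) to show weighted sampling with replacement and the effect of the seed.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-in for illustration; this is NOT the jpsgcs Sampling class.
class WeightedSampler {
    private final int[] population;
    private final double[] cumulative; // running sums of the weights
    private final Random rng;

    WeightedSampler(int[] population, double[] weights, Random rng) {
        if (population.length == 0 || population.length != weights.length) {
            throw new IllegalArgumentException("population and weights must be non-empty and equally long");
        }
        this.population = population.clone();
        this.cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        this.rng = rng;
    }

    // Draws one value (with replacement) with probability proportional to its weight.
    int next() {
        double u = rng.nextDouble() * cumulative[cumulative.length - 1];
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return population[i];
            }
        }
        return population[population.length - 1]; // guard against rounding error
    }

    // Draws n values from the population {0, 1} with the given weights.
    static int[] draw(double[] weights, Random rng, int n) {
        WeightedSampler sampler = new WeightedSampler(new int[] {0, 1}, weights, rng);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = sampler.next();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] weights = {0.3, 0.7};
        // Same seed twice: the two printed arrays are identical, on every run.
        System.out.println(Arrays.toString(draw(weights, new Random(42L), 10)));
        System.out.println(Arrays.toString(draw(weights, new Random(42L), 10)));
        // No explicit seed: the array normally changes from run to run.
        System.out.println(Arrays.toString(draw(weights, new Random(), 10)));
    }
}
```

If the original class accepts a seed or a generator in some constructor, passing a time-based seed there would be the first thing to try.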

Related

SlidingWindows for slow data (big intervals) on Apache Beam

I am working with Chicago Traffic Tracker dataset, where new data is published every 15 minutes. When new data is available, it represents records off by 10-15 minutes from the "real time" (example, look for _last_updt).
For example, at 00:20, I get data timestamped 00:10; at 00:35, I get data from 00:20; at 00:50, I get data from 00:40. So the interval at which I get new data is "fixed" (every 15 minutes), although the interval between timestamps varies slightly.
I am trying to consume this data on Dataflow (Apache Beam) and for that I am playing with Sliding Windows. My idea is to collect and work on 4 consecutive datapoints (4 x 15min = 60min), and ideally update my calculation of sum/averages as soon as a new datapoint is available. For that, I've started with the code:
    PCollection<TrafficData> trafficData = input
        .apply("MapIntoSlidingWindows", Window.<TrafficData>into(
                SlidingWindows.of(Duration.standardMinutes(60)) // (4x15)
                    .every(Duration.standardMinutes(15))) // interval to get new data
            .triggering(AfterWatermark
                .pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()))
            .withAllowedLateness(Duration.ZERO)
            .accumulatingFiredPanes());
Unfortunately, it looks like when I receive a new datapoint from my input, I do not get a new (updated) result from the GroupByKey that comes after it.
Is this something wrong with my SlidingWindows? Or am I missing something else?

One issue may be that the watermark is going past the end of the window, and dropping all later elements. You may try giving a few minutes after the watermark passes:
    PCollection<TrafficData> trafficData = input
        .apply("MapIntoSlidingWindows", Window.<TrafficData>into(
                SlidingWindows.of(Duration.standardMinutes(60)) // (4x15)
                    .every(Duration.standardMinutes(15))) // interval to get new data
            .triggering(AfterWatermark
                .pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane())
                .withLateFirings(AfterProcessingTime.pastFirstElementInPane()))
            // keep the window open 15 minutes past the watermark so late data is not dropped
            .withAllowedLateness(Duration.standardMinutes(15))
            .accumulatingFiredPanes());
Let me know if this helps at all.

So @Pablo (from my understanding) gave the correct answer. But I had some suggestions that would not fit in a comment.
I wanted to ask whether you need sliding windows? From what I can tell, fixed windows would do the job for you and be computationally simpler as well. Since you are using accumulating fired panes, you don't need to use a sliding window since your next DoFn function will already be doing an average from the accumulated panes.
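To make the difference between the two window types concrete, here is a plain-Java sketch (no Beam dependency, and not Beam's API) of which windows a single timestamp falls into. The helper names are hypothetical and timestamps are plain minute counts; the arithmetic assumes windows are aligned to minute 0. With 60-minute windows sliding every 15 minutes each element lands in four windows, while a fixed window holds it exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative arithmetic only; names are hypothetical and unrelated to Beam classes.
class WindowAssignment {

    // Start minutes of all sliding windows [start, start + size) that contain t,
    // where a new window starts every `period` minutes.
    static List<Long> slidingWindowStarts(long t, long size, long period) {
        List<Long> starts = new ArrayList<>();
        long lastStart = t - Math.floorMod(t, period); // latest window start <= t
        for (long start = lastStart; start > t - size; start -= period) {
            starts.add(start);
        }
        return starts;
    }

    // Start minute of the single fixed window [start, start + size) that contains t.
    static long fixedWindowStart(long t, long size) {
        return t - Math.floorMod(t, size);
    }

    public static void main(String[] args) {
        long t = 70; // a record timestamped 01:10, in minutes since midnight
        System.out.println(slidingWindowStarts(t, 60, 15)); // prints [60, 45, 30, 15]
        System.out.println(fixedWindowStart(t, 60));        // prints 60
    }
}
```

Every downstream aggregation runs once per window an element belongs to, which is why the sliding variant does roughly four times the work in this setup.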
As for the code, I made changes to the early and late firing logic. I also suggest increasing the windowing size. Since you know the data comes every 15 minutes, you should be closing the window after 15 minutes rather than on 15 minutes. But you also don't want to pick a window which will eventually collide with multiples of 15 (like 20) because at 60 minutes you'll have the same problem. So pick a number that is co-prime to 15, for example 19. Also allow for late entries.
    PCollection<TrafficData> trafficData = input
        .apply("MapIntoFixedWindows", Window.<TrafficData>into(
                FixedWindows.of(Duration.standardMinutes(19)))
            .triggering(AfterWatermark.pastEndOfWindow()
                // fire the moment you see an element
                .withEarlyFirings(AfterPane.elementCountAtLeast(1))
                // optional, since you already have a past-end-of-window and an early firing; kept just in case
                .withLateFirings(AfterProcessingTime.pastFirstElementInPane()))
            .withAllowedLateness(Duration.standardMinutes(60))
            .accumulatingFiredPanes());
Let me know if that solves your issue!
EDIT
So, I could not understand how you computed the above example, so I am using a generic example. Below is a generic averaging function:
    import org.apache.beam.sdk.transforms.Combine.CombineFn;

    public class AverageFn extends CombineFn<Integer, AverageFn.Accum, Double> {
        // Mutable accumulator holding the running sum and element count.
        public static class Accum {
            int sum = 0;
            int count = 0;
        }

        @Override
        public Accum createAccumulator() { return new Accum(); }

        @Override
        public Accum addInput(Accum accum, Integer input) {
            accum.sum += input;
            accum.count++;
            return accum;
        }

        @Override
        public Accum mergeAccumulators(Iterable<Accum> accums) {
            Accum merged = createAccumulator();
            for (Accum accum : accums) {
                merged.sum += accum.sum;
                merged.count += accum.count;
            }
            return merged;
        }

        @Override
        public Double extractOutput(Accum accum) {
            return ((double) accum.sum) / accum.count;
        }
    }
In order to run it you would add the line:
PCollection<Double> average = trafficData.apply(Combine.globally(new AverageFn()));
Since you are currently using accumulating fired panes, this would be the simplest way to code the solution.
However, if you want to use discarding fired panes, you would need to use a PCollectionView to store the previous average and pass it as a side input to the next one in order to keep track of the values. This is a little more complex to code but should improve performance, since a constant amount of work is done for every window, unlike with accumulating panes.
Does this make enough sense for you to write your own function for discarding fired panes?

Neural networks and Python

I'm trying to study about neural networks, following a great guide:
http://neuralnetworksanddeeplearning.com/chap1.html
Currently I've reached this code snippet which I'm trying to understand and write in Java:
    class Network(object):

        def __init__(self, sizes):
            self.num_layers = len(sizes)
            self.sizes = sizes
            self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
            self.weights = [np.random.randn(y, x)
                            for x, y in zip(sizes[:-1], sizes[1:])]
I managed to figure out what everything means except for the last line:
[np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
As far as I can understand: create a matrix with y rows and x columns, for each pair x,y which can be found in the matrix zip which is created by the merging of the two "sizes" arrays. I understand that sizes[1:] means taking all elements from sizes starting from index 1, but sizes[:-1] makes no sense to me.
I read online that s[::-1] means getting the reverse of the array, but in the above case we only have one colon, while in the formula for the reverse array there seems to be two colons.
Sadly, I have no idea how Python works and I got pretty far along with the online book to give it up now (I also truly like it), so can someone say if I'm right until now, correct me if needed, or straight out explaining that final line?

sizes[:-1] is a list slice which returns a copy of the sizes list but without the last item.
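Since the goal was to write this in Java, here is a rough sketch of what the list comprehension does. zip(sizes[:-1], sizes[1:]) pairs each layer size with the next one, so for sizes = [2, 3, 1] the pairs are (2, 3) and (3, 1), and each pair (x, y) yields a y-by-x matrix. The class and method names below are hypothetical, and Random.nextGaussian() is used as an assumed stand-in for np.random.randn.

```java
import java.util.Random;

// Hypothetical Java counterpart of the weight initialisation in the Python snippet.
class NetworkShapes {

    // rows x cols matrix of standard-normal samples, similar in spirit to np.random.randn(rows, cols).
    static double[][] randn(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                m[r][c] = rng.nextGaussian();
            }
        }
        return m;
    }

    // Mirrors: [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
    static double[][][] initWeights(int[] sizes, Random rng) {
        if (sizes.length < 2) {
            throw new IllegalArgumentException("need at least two layers");
        }
        double[][][] weights = new double[sizes.length - 1][][];
        for (int i = 0; i < sizes.length - 1; i++) {
            int x = sizes[i];     // i-th element of sizes[:-1] (all but the last)
            int y = sizes[i + 1]; // i-th element of sizes[1:]  (all but the first)
            weights[i] = randn(y, x, rng);
        }
        return weights;
    }

    public static void main(String[] args) {
        double[][][] w = initWeights(new int[] {2, 3, 1}, new Random());
        for (double[][] m : w) {
            System.out.println(m.length + " x " + m[0].length); // prints "3 x 2", then "1 x 3"
        }
    }
}
```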

Shuffle an array so every value will have a different index [duplicate]

This question already has answers here:
How to test randomness (case in point - Shuffling)
(11 answers)
Closed 8 years ago.
I have written a method to shuffle a String array
The task is to implement the White Elephant concept for a given String array of names. It should generate assignments for the original elements so that, as the title says, every value ends up at a different index.
I wrote a method that picks a random number and uses a map to store the values so that each value gets a different index. But it prints out only 5 values, and I am confused now.
    public static String[] generateAssignments(final String[] participants) {
        Random r = new Random();
        int size = participants.length;
        HashMap val = new HashMap();
        int change = 0;
        String[] assignments = new String[6];
        System.out.println("Using arrays");
        for(int i=0; i<size;i++){
            for(int j =0; j<size; j++){
                change = r.nextInt(size);
                if(val.containsValue(change) || change==i){
                    continue;
                }
                else val.put(i, change);
                assignments[i] = participants[change];
                System.out.println(assignments[i]);
                break;
            }
        }
        return assignments;
    }
I appreciate your inputs.
Thanks,
Lucky

If your shuffle method is random (or pseudorandom) it will be near impossible to unit test since the output is non deterministic. If you allow for seeding a random number generator then you could ensure the output is consistent given the same seeds, though this doesn't show randomness.
You could also run the shuffle method a large number of times and check that each card shows up at each index an approximately equal number of times. Over a large enough number of simulations this should help illustrate randomness.
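A minimal sketch of that frequency check might look like the following. The Fisher-Yates shuffle here is only a stand-in for whatever shuffle is under test, and the trial count and the 5% tolerance are arbitrary assumptions chosen for illustration, not statistically derived thresholds.

```java
import java.util.Random;

// Hypothetical frequency check: how often does each value land at each index?
class ShuffleFrequencyCheck {

    // In-place Fisher-Yates shuffle, used here as a stand-in for the method under test.
    static void shuffle(int[] a, Random rng) {
        for (int i = a.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    // counts[v][i] = number of trials in which value v ended up at index i.
    static int[][] positionCounts(int n, int trials, Random rng) {
        int[][] counts = new int[n][n];
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) {
                a[i] = i;
            }
            shuffle(a, rng);
            for (int i = 0; i < n; i++) {
                counts[a[i]][i]++;
            }
        }
        return counts;
    }

    // Largest relative deviation of any cell from the uniform expectation trials / n.
    static double maxRelativeDeviation(int[][] counts, int trials) {
        double expected = (double) trials / counts.length;
        double worst = 0.0;
        for (int[] row : counts) {
            for (int c : row) {
                worst = Math.max(worst, Math.abs(c - expected) / expected);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        int trials = 40_000;
        int[][] counts = positionCounts(4, trials, new Random());
        System.out.println("max relative deviation: " + maxRelativeDeviation(counts, trials));
    }
}
```

A test could assert that the reported deviation stays below the chosen tolerance; a biased shuffle tends to show a much larger value in at least one cell.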

FYI - There are some logical errors in both your shuffle() code and the test. I won't address those here; hopefully having a good test will allow you to figure out the problems!
Writing tests around Random data is hard.
The best option is to pass an instance of Random into your shuffle() method or its containing class. Then, in tests, you can pass in an instance of Random that has been seeded with a known value. Given that the Random code will behave the same every time and you control the input array, your test will be deterministic; you can confidently assert on each object in the shuffled collection.
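As a rough illustration of injecting the generator, the sketch below uses a hypothetical shuffled() helper that delegates to Collections.shuffle(List, Random). Two calls with equally seeded generators return the same order, which is what makes the test repeatable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical example: the source of randomness is a parameter, so tests can seed it.
class SeededShuffle {

    // Returns a shuffled copy of the input, using the supplied generator.
    static List<String> shuffled(List<String> input, Random rng) {
        List<String> copy = new ArrayList<>(input);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Cid", "Dee", "Eve");
        List<String> first = shuffled(names, new Random(123L));
        List<String> second = shuffled(names, new Random(123L));
        System.out.println(first.equals(second)); // prints true: same seed, same order
        // Production code would pass new Random() and get a different order each run.
        System.out.println(shuffled(names, new Random()));
    }
}
```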
The only downside of this approach is that you won't have a test preventing you from re-writing your shuffle() method to simply re-order the elements every time into this specified order. But that might be over-thinking it; usually we can trust our future selves.
An alternative approach is to assume that in a Random world, given enough time, every data possibility will be realized.
I used this approach when testing a 6-sided die's roll() method. I needed to ensure that getting all values from 1-6 was possible. I didn't want to complicate the method signature or the Die constructor to take in an instance of Random. I also didn't feel confident in a test that used a known seed and simply always asserted 3 (for example).
Instead, I made the assumption that given enough rolls, all values from 1-6 would eventually be rolled. I wrote a test that infinitely called roll until all values from 1-6 had been returned. Then I added a timeout to the test so that it would fail after 1 second if the above condition hadn't been met.
    @Test(timeout = 1000)
    public void roll_shouldEventuallyReturnAllNumbersBetweenOneAndSixInclusively() {
        Die die = new Die();
        Set<Integer> rolledValues = new HashSet<Integer>();
        int totalOfUniqueRolls = 0;
        while (rolledValues.size() < Die.NUM_SIDES) {
            if (rolledValues.add(die.roll())) {
                totalOfUniqueRolls += die.getFaceValue();
            }
        }
        assertEquals(summation(1, Die.NUM_SIDES), totalOfUniqueRolls);
    }
Worst case scenario it fails after 1 second (which hasn't happened yet) but it usually passes in about 20 milliseconds.

The test must be reproducible: it is not useful if it depends on something random.
I suggest using mocking so that the CUT (code under test) doesn't use the real Random class instantiated in production, but a different class written by you with predictable behavior, which lets you make meaningful assertions on two or three items.

It appears your shuffle() method will always return the same result. So, given an input test array of however many elements in your test, just specify the exact output array you'd expect.
It looks like you are trying to write a very general test. Instead, your tests should be very specific: given a specific input A then you expect a specific output B.

The x-values in a line chart of JavaFX are not ordered properly / messed up?

I am working on some Graphs using JavaFX, and I use a few line charts to get my result. There is however a problem. I think that the following code will explain what I'm trying to do:
    while(startTime.getTimeInMillis() <= endTime.getTimeInMillis()) {
        int index = getIndex(startTime);
        if(index != -1) {
            N k = dataList.get(index);
            Data<String, N> point = new Data<String, N>(distanceList.get(index) + "", k);
            series.getData().add(point);
        }
        int newMinutes = startTime.get(Calendar.MINUTE)+5;
        startTime.set(Calendar.MINUTE, newMinutes);
    }
This code fills a series (for JavaFX, see here); other code adds the series to a chart, and that part is not the problem. The code above retrieves the index from the time. The retrieved indexes are all in ascending order; I already checked that. The distanceList is in ascending order and the dataList is not.
So all the data points in the series are added with their x-values in ascending order, but whenever the series is set, I get a graph where all the x labels are messed up (and therefore also the line itself). The x labels are not evenly spaced and sometimes overlap each other. I searched the internet and found similar problems reported by others, some of whom said it was a bug in JavaFX. I am running the newest versions of Java and JavaFX, 1.8 and 8 respectively.
Everything else is working fine, and all values are correct; I checked them multiple times. It's just the labels on the x-axis that are not shown properly. I have no clue what the problem is. Does anyone know?

How do I remove objects from an Array in java based on certain condition?

:)
I have
int[] code = new int[10];
It has the following values:
code[0] = 1234;
code[1] = 2222;
code[2] = 2121;
code[3] = 4321;
code[4] = 3333;
code[5] = 2356;
The code in this case refers to the serial number of the files.
The user is supposed to enter the code of the file to remove that specific file.
Let's say the user enters 3333 as the code to remove.
code[4] = 3333 would be removed and code[5] = 2356 will move up to take its place. See below...
code[0] = 1234;
code[1] = 2222;
code[2] = 2121;
code[3] = 4321;
code[4] = 2356;
How would I tackle this problem?
I read up that using an Array List would make my life much easier.
However, I was told to just use an array.
Any help please? :)

How would I tackle this problem?
Allocate a new array and copy all of the values that you want to keep from the existing array to the new one.
You could also update the array in place, filling the "hole" at the end of the array with some special value that can't be a legal code.
Since this is a "sounds like homework" question, I'll leave you to figure out how to code it. (It is pretty simple. Just a loop, a test, and some careful manipulation of a second index.)

That's simply not possible. Arrays have a fixed size, so you'll have to make a second array with its size reduced by 1 and copy all values except the one you want to remove. Or keep track of the "working size" of the array separately, at which point you've begun to reimplement ArrayList.
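The second-array approach could be sketched as follows. The class and method names are hypothetical, and the sketch assumes only the first occurrence of the code should be removed.

```java
import java.util.Arrays;

// Hypothetical helper showing removal by copying into a smaller array.
class ArrayRemoval {

    // Returns a new array without the first occurrence of target.
    // If target is absent, an unchanged copy is returned.
    static int[] removeFirst(int[] codes, int target) {
        int index = -1;
        for (int i = 0; i < codes.length; i++) {
            if (codes[i] == target) {
                index = i;
                break;
            }
        }
        if (index < 0) {
            return codes.clone();
        }
        int[] result = new int[codes.length - 1];
        System.arraycopy(codes, 0, result, 0, index);                                // part before the hole
        System.arraycopy(codes, index + 1, result, index, codes.length - index - 1); // part after the hole
        return result;
    }

    public static void main(String[] args) {
        int[] code = {1234, 2222, 2121, 4321, 3333, 2356};
        System.out.println(Arrays.toString(removeFirst(code, 3333)));
        // prints [1234, 2222, 2121, 4321, 2356]
    }
}
```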

You can simply set the value to something improbable (like -1) to save yourself the trouble of moving elements around and adjusting arrays using System.arraycopy or the like. This of course assumes that the aim is just to get the functionality working. If "moving" elements is an absolute requirement, you'd have to create a new array as mentioned in another answer here.
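If the array has to stay the same object but later elements should still move up, one possible middle ground is to shift in place and mark the freed slot at the end with the sentinel. The sketch below uses a hypothetical CodeStore class and assumes -1 is never a legal code; it also tracks a logical size separately, which is essentially a small piece of what ArrayList does.

```java
import java.util.Arrays;

// Hypothetical helper: a fixed-capacity int array plus a separately tracked logical size.
class CodeStore {
    static final int EMPTY = -1; // sentinel; assumes -1 is never a legal code

    private final int[] codes;
    private int size = 0;

    CodeStore(int capacity) {
        codes = new int[capacity];
        Arrays.fill(codes, EMPTY);
    }

    // Appends a code; returns false if the array is already full.
    boolean add(int code) {
        if (size == codes.length) {
            return false;
        }
        codes[size++] = code;
        return true;
    }

    // Removes the first occurrence of code, shifting later entries one slot to the left.
    boolean remove(int code) {
        for (int i = 0; i < size; i++) {
            if (codes[i] == code) {
                System.arraycopy(codes, i + 1, codes, i, size - i - 1);
                size--;
                codes[size] = EMPTY; // mark the freed slot at the end
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    // Copy of the slots currently in use.
    int[] toArray() {
        return Arrays.copyOf(codes, size);
    }

    public static void main(String[] args) {
        CodeStore store = new CodeStore(10);
        for (int c : new int[] {1234, 2222, 2121, 4321, 3333, 2356}) {
            store.add(c);
        }
        store.remove(3333);
        System.out.println(Arrays.toString(store.toArray()));
        // prints [1234, 2222, 2121, 4321, 2356]
    }
}
```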
