Probabilistically Selecting A Set Number of Items Over Time - java

I have an enterprise application running on a server that accepts files. Tens of thousands of files are submitted every day by users. The customer wants exactly 50 of these files to be automatically selected for audit each day.
The requirements are:
the files must be selected as they come in (we can't wait for all the files to come in and then choose 50 at the end of the day)
the files selected must meet some other criteria, which they haven't told me yet, but I am assured there will still be thousands of files that meet these criteria
the system must not be "game-able". That is - they don't want users who submit files to realise that if they wait until the afternoon or something, their files never get audited. This means we can't just choose the first 50 that come in, the selected files must be randomly spread out throughout the day.
we have to have EXACTLY 50 files. Or so they say. But I'm pretty sure if it just so happened that no user submitted a file that matched the criteria after midday one day, and we only got 25 files, they'd be ok with that. So I can assume that the types of files I'm interested in are submitted with a reasonably regular frequency throughout the day.
I figure then, that I need some function that calculates a probability that a file will be selected, that uses the number of currently chosen files and the time of day as inputs.
I've created a test harness. Please forgive the dodgy code. In this, the "pushTask" thread simulates files coming in by adding them to a stack. "Files" in this test are just Strings with a random number on the end.
The "pullTask" thread simulates files being pulled off the stack. It asks requirementsFunction() if the "file" meets the extra requirements needed (and in this test that's just - does it end in a zero), and it asks probabilityFunction() if it should select the file. If a file is selected, it is printed to System.out.
Really I need some help as to what to put in probabilityFunction(), because at the moment what's in there is garbage (I've left it in so you can see what I've tried). Or if someone knows of a mathematical probability function that uses items/time that would be great too.
package com.playground;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;
public class ProbabilisticSelection {
private static int TOTAL_FILES = 1000;
private static int AUDIT_QUANTITY = 10;
private static int TIME_IN_SECONDS_FOR_ALL_FILES = 10;
private Random random = new Random();
private Deque<String> stack = new ArrayDeque<String>();
private boolean finished;
public static void main(String[] args) throws InterruptedException {
new ProbabilisticSelection().test();
}
private void test() throws InterruptedException {
Instant begin = Instant.now();
Runnable pushTask = () -> {
while (!finished) {
int next = random.nextInt(TOTAL_FILES);
String item = "File: " + next;
stack.push(item);
if (Duration.between(begin, Instant.now()).getSeconds() >= TIME_IN_SECONDS_FOR_ALL_FILES) {
finished = true;
}
try {
Thread.sleep(10);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
};
Runnable pullTask = () -> {
int itemNumber = 1;
while (itemNumber <= AUDIT_QUANTITY && !finished) {
String poll = stack.poll();
if (requirementsFunction(poll) &&
probabilityFunction(itemNumber, Duration.between(begin, Instant.now()))) {
System.out.println(itemNumber++ + ": "+ poll);
}
try {
Thread.sleep(1);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
finished = true;
Duration delta = Duration.between(begin, Instant.now());
System.out.println();
System.out.println("Retrieved files: " + (itemNumber - 1) + ", should be, " + AUDIT_QUANTITY);
System.out.println("Time taken: " + delta.getSeconds() + ", should be, " + TIME_IN_SECONDS_FOR_ALL_FILES);
};
new Thread(pullTask).start();
new Thread(pushTask).start();
}
private boolean requirementsFunction(String item) {
return item != null && item.endsWith("0");
}
private boolean probabilityFunction(int itemNumber, Duration delta) {
double limit = ((double)(AUDIT_QUANTITY-itemNumber)/(double)AUDIT_QUANTITY + 1); // probability goes down as number of items goes up
double tension = (double)TIME_IN_SECONDS_FOR_ALL_FILES/((double)delta.getSeconds() + 1); // probablity goes up as time nears the end
if (tension == 1) {
return true;
}
double prob = limit * tension * 100;
int rand = random.nextInt(1000);
return prob > rand;
}
}

The algorithm is called reservoir sampling; it guarantees fair sampling of k items from some large and unknown N. Here is Java code
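As a rough illustration of the technique (not the code the answer linked to), the sketch below keeps a reservoir of k items and replaces entries with decreasing probability as more items arrive. The class name `ReservoirSampler` and its methods are hypothetical. Note that an item's place in the reservoir is only final once the stream (here, the day) has ended, so this assumes a provisionally selected file may still be swapped out later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: keeps a uniform random sample of k items
// from a stream whose total length is not known in advance.
public class ReservoirSampler<T> {
    private final int k;
    private final Random random;
    private final List<T> reservoir;
    private int seen = 0; // number of items offered so far

    public ReservoirSampler(int k, Random random) {
        this.k = k;
        this.random = random;
        this.reservoir = new ArrayList<>(k);
    }

    // Offer the next item from the stream.
    public void offer(T item) {
        seen++;
        if (reservoir.size() < k) {
            // The first k items always go in.
            reservoir.add(item);
        } else {
            // Pick a slot uniformly in [0, seen); the item is kept with probability k/seen.
            int j = random.nextInt(seen);
            if (j < k) {
                reservoir.set(j, item);
            }
        }
    }

    // Snapshot of the current sample.
    public List<T> sample() {
        return new ArrayList<>(reservoir);
    }
}
```

In the test harness above, `offer` would be called for every file that passes requirementsFunction(), and `sample()` read once the day is over.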

Related

Java. Object being repeatedly created using a timer. Is there a way to reduce one of the variables everytime the timer executes

I'm creating a battery object on a timer and feeding its output into a database.
It's basically just a simulator for a front end that monitors a battery device and emits errors when the variables drop below a certain value.
So far I'm creating the object every 10 seconds as follows.
public class Battery {
public String name;
public int input;
public int output;
public static int charge;
Random random = new Random();
public Battery(String dName) {
this.name = dName;
this.input = random.nextInt((415 - 400) + 1) + 400;
this.output = input + random.nextInt((3 - 0) + 1);
}
@Override
public String toString() {
return "Battery [name=" + name + ", input=" + input + ", output=" + output + ", charge=" + charge + "]";
}
public static void main(String[] args) {
Timer timer = new Timer();
timer.schedule(new TimerTask() {
@Override
public void run() {
Battery battery = new Battery("Battery 1");
System.out.println(battery.toString());
}
} , 0, 1000);// end of timer
}
}
I have a variable charge which will represent the battery charge. I've tried to make this decrease using a while loop.
battery.setCharge(100);
int charge = battery.getCharge();
System.out.println(charge);
while (charge>0) {
System.out.println(charge);//for testing
battery.setCharge(charge);
charge--;
}
But because of the nature of the timer, it's just the same object being created every time. So in this case it just counted down from 100 and then output the charge as 1.
Is there a way to decrease this every time the timer is called? I've tried using a static integer but that didn't help at all.
I also tried to put the while loop outside of the public void run() method and got a myriad of errors for doing that!
I'm still in the learning phase of Java; this possibly needs to go down the threads route, but I'm not quite there yet.
Also, I have purposely not added a means for the timer to end. It's just something I'm turning on and off in Eclipse to watch numbers appear on my front end.
Apologies if adding a second question isn't allowed, but is there a way to make
this.input = random.nextInt((415 - 400) + 1) + 400;
return a 0 every so often?
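One possible direction, sketched under assumptions: create the battery once, outside the timer task, and let each timer tick update that single instance. The class name `BatterySim`, the `tick()` method, and the 1-in-10 chance of a zero reading are all hypothetical choices for illustration, not part of the code above.

```java
import java.util.Random;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical variant of the Battery class: one long-lived object, updated on each tick.
public class BatterySim {
    private final String name;
    private final Random random = new Random();
    private int charge = 100;

    public BatterySim(String name) {
        this.name = name;
    }

    // Produces one simulated reading; the charge drops by 1 per call and stops at 0.
    public String tick() {
        if (charge > 0) {
            charge--;
        }
        // Assumption: roughly 1 reading in 10 reports an input of 0; otherwise 400..415.
        int input = random.nextInt(10) == 0 ? 0 : random.nextInt((415 - 400) + 1) + 400;
        int output = input + random.nextInt((3 - 0) + 1);
        return "Battery [name=" + name + ", input=" + input + ", output=" + output
                + ", charge=" + charge + "]";
    }

    public int getCharge() {
        return charge;
    }

    public static void main(String[] args) {
        // Created once, so the charge survives between timer executions.
        BatterySim battery = new BatterySim("Battery 1");
        Timer timer = new Timer();
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                System.out.println(battery.tick());
            }
        }, 0, 1000);
    }
}
```

The point of the design is that the state lives in the object rather than in a loop: the timer already provides the repetition, so no while loop is needed.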

Random for-loop in Java?

I have 25 batch jobs that are executed constantly, that is, when number 25 is finished, 1 is immediately started.
These batch jobs are started using a URL that contains a value from 1 to 25. Basically, I use a for loop from 1 to 25 where, in each round, I call a URL with the current value of i: http://batchjobserver/1, http://batchjobserver/2 and so on.
The problem is that some of these batch jobs are a bit unstable and sometimes crash, which causes the for loop to restart at 1. As a consequence, batch job 1 is run every time the loop is initiated, while 25 runs much less frequently.
I like my current solution because it is so simple (in pseudo code)
for (i=1; i < 26; i++) {
getURL ("http://batchjob/" + Integer.toString(i));
}
However, I would like i to be a random number between 1 and 25 so that, in case something crashes, all the batch jobs are, in the long run, run approximately the same number of times.
Is there some nice hack/algorithm that allows me to achieve this?
Other requirements:
The number 25 changes frequently
This is not an absolute requirement, but it would be nice if a batch job wasn't run again until all other jobs have been attempted once. This doesn't mean that a job has to "wait" 25 loops before it can run again; instead, if job 8 is executed in the 25th loop (the last loop of the first "set" of loops), the 26th loop (the first loop in the second set of loops) can be 8 as well.
Randomness has another advantage: it is desirable if the execution of these jobs looks a bit manual.
To handle errors, you should use a try-catch statement. It should look something like this:
for (int i = 1; i < 26; i++) {
    try {
        getURL("http://batchjob/" + i);
    } catch (Exception e) {
        // Print the error and carry on with the next job
        System.out.print(e);
    }
}
This is a very basic example of what can be done. This will, however, only skip the failed attempts, print the error, and continue to the next iteration of the loop.
There are two parts of your requirement:
Randomness: For this, you can use Random#nextInt.
Skip the problematic call and continue with the remaining ones: For this, you can use a try-catch block.
Code:
Random random = new Random();
for (int i = 1; i < 26; i++) {
    try {
        // Pick a random job number in 1..25 on each iteration
        getURL("http://batchjob/" + Integer.toString(random.nextInt(25) + 1));
    } catch (Exception e) {
        System.out.println("Error: " + e.getMessage());
    }
}
Note: random.nextInt(25) returns an int value from 0 to 24 and thus, when 1 is added to it, the range becomes 1 to 25.
You could use a set and draw random numbers in the range of your batches, tracking which batches you have already run by adding them to the set. Something like this:
int numberOfBatches = 25;
Set<Integer> set = new HashSet<>();
List<Integer> failedBatches = new ArrayList<>();
Random random = new Random();
// Loop until every batch number has been drawn once
while (set.size() < numberOfBatches) {
    int ran = random.nextInt(numberOfBatches) + 1; // 1..numberOfBatches
    if (set.contains(ran)) continue;
    set.add(ran);
    try {
        getURL("http://batchjob/" + Integer.toString(ran));
    } catch (Exception e) {
        failedBatches.add(ran);
    }
}
As an extra, you can record which batches failed.
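Another way to get both properties asked for (random order, and no job repeated until every job has been attempted once) is to shuffle the list of job numbers at the start of each round. This is only a sketch: `ShuffledRounds`, `runRound`, and the `Consumer<Integer>` standing in for getURL are hypothetical names, and the example prints the URL instead of calling it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

public class ShuffledRounds {
    // Runs jobs 1..numberOfJobs once each, in random order.
    // Returns the job numbers that threw an exception.
    public static List<Integer> runRound(int numberOfJobs, Random random, Consumer<Integer> job) {
        List<Integer> order = new ArrayList<>();
        for (int i = 1; i <= numberOfJobs; i++) {
            order.add(i);
        }
        Collections.shuffle(order, random); // uniform random permutation
        List<Integer> failed = new ArrayList<>();
        for (int n : order) {
            try {
                job.accept(n);
            } catch (RuntimeException e) {
                failed.add(n); // a crash skips this job only; the round continues
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Random random = new Random();
        // Stand-in for getURL: just print the URL that would be called.
        List<Integer> failed = runRound(25, random,
                n -> System.out.println("http://batchjob/" + n));
        System.out.println("Failed: " + failed);
    }
}
```

Because each round is shuffled independently, the last job of one round can also be the first job of the next, which the question explicitly allows. The job count is a parameter, so a changing number of jobs needs no code change.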
The following is an example of a single-threaded, infinitely looping (also called round-robin) scheduler with simple retry capabilities. I named the routine that calls your batch job "scrape" (scraping means indexing a website's contents):
public static void main(String... args) throws Exception {
Runnable[] jobs = new Runnable[]{
() -> scrape("https://www.stackoverfow.com"),
() -> scrape("https://www.github.com"),
() -> scrape("https://www.facebook.com"),
() -> scrape("https://www.twitter.com"),
() -> scrape("https://www.wikipedia.org"),
};
for (int i = 0; true; i++) {
int remainingAttempts = 3;
while (remainingAttempts > 0) {
try {
jobs[i % jobs.length].run();
break;
} catch (Throwable err) {
err.printStackTrace();
remainingAttempts--;
}
}
}
}
private static void scrape(String website) {
System.out.printf("Doing my job against %s%n", website);
try {
Thread.sleep(100); // Simulate network work
} catch (InterruptedException e) {
throw new RuntimeException("Requested interruption");
}
if (Math.random() > 0.5) { // Simulate network failure
throw new RuntimeException("Ooops! I'm a random error");
}
}
You may want to add multi-threading (for example, an ExecutorService guarded by a Semaphore) and more selective retry logic (for example, retrying only certain types of errors, with exponential backoff).
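The exponential backoff idea could look roughly like the sketch below. The class `Backoff`, the method `runWithBackoff`, and its parameters are hypothetical; a real version would probably also restrict which exception types are retried.

```java
public class Backoff {
    // Runs the job up to maxAttempts times, doubling the wait after each failure.
    // Returns true if some attempt succeeded, false if all attempts failed
    // or the thread was interrupted while waiting.
    public static boolean runWithBackoff(Runnable job, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                job.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // no point in sleeping after the final attempt
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
                delay *= 2; // e.g. 100 ms, 200 ms, 400 ms, ...
            }
        }
        return false;
    }
}
```

In the scheduler above, the inner while loop could then be replaced by a single call such as `runWithBackoff(jobs[i % jobs.length], 3, 100)`.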

Eclipse console does not show the end result

public class PrimeFinder implements Runnable {
Thread go;
StringBuffer primes = new StringBuffer();
int time = 0;
public PrimeFinder() {
start();
while (primes != null) {
System.out.println(time);
try {
Thread.sleep(1000);
} catch(InterruptedException exc) {
// do nothing
}
time++;
}
}
public void start() {
if (go == null) {
go = new Thread(this);
go.start();
}
}
public void run() {
int quantity = 1_000_000;
int numPrimes = 0;
// candidate: the number that might be prime
int candidate = 2;
primes.append("\nFirst ").append(quantity).append(" primes:\n\n");
while (numPrimes < quantity) {
if (isPrime(candidate)) {
primes.append(candidate).append(" ");
numPrimes++;
}
candidate++;
}
System.out.println(primes);
primes = null;
System.out.println("\nTime elapsed: " + time + " seconds");
}
public static boolean isPrime(int checkNumber) {
double root = Math.sqrt(checkNumber);
for (int i = 2; i <= root; i++) {
if (checkNumber % i == 0) {
return false;
}
}
return true;
}
public static void main(String[] arguments) {
new PrimeFinder();
}
}
As the title states I do not get the end result of the program in the console... The timer does count, until the calculation is done, but when the program is supposed to print out the result, the console clears and goes all blank. I tried to enter my code in some kind of online compiler, and sure enough, I got the end result printed out. Has anyone had similar problem and if so, how did you manage to fix it? Thanks in advance!
I tried this in IntelliJ and the output is retained in the console - though it's not all retained when using Eclipse as you point out. Looks like Eclipse maybe overwrites the console upon switching threads.
You could just write the output to a file and view it there, although this is of course different from retaining it in the console. To do that you would go to Run > Run Configurations..., then select your application under Java Application, then click the Common tab, then under Standard Input and Output check the Output file checkbox and enter a log file path and file name for where you want logs to go, then Apply. Then run that "Run Configuration" and view the output in the file.
If running on a server (e.g., tomcat) from within Eclipse, you would go to the Server view, double-click your server, click Open launch configuration, then Common, then Output file. From there you can specify an output destination on your file system or in your Eclipse workspace.
If you started the Thread instead of just doing the instantiation, maybe it would print something out.
Thread thread = new Thread(new PrimeFinder());
thread.start();

Program never ends but Thread Pool claims to shut down

I got an issue with a multi-threaded program.
I need to simulate a lot of people trying to book the same flight at the same time, using no locks.
So I made an ExecutorService and used that to pool threads so I can have many simultaneous attempt going at once.
The problem is though, the program sort of reaches the end and right before it prints out all the results, it just sits there and runs forever. I've tried to go into all the other classes that utilize a connection to a database and simply close them off manually. No luck.
package dbassignment4;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
/**
*
* @author Vipar
*/
public class Master{
private static final int POOL_SIZE = 50;
public static boolean planeIsBooked = false;
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
int success = 0;
int seatNotReserved = 0;
int seatNotReservedByCustomerID = 0;
int reservationTimeout = 0;
int seatIsOccupied = 0;
int misc = 0;
HelperClass.clearAllBookings("CR9");
Thread t = new Thread(new DBMonitor("CR9"));
long start = System.nanoTime();
//HelperClass.clearAllBookings("CR9");
ExecutorService pool = new ThreadPoolExecutor(
POOL_SIZE, POOL_SIZE,
0L,
TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<Runnable>(POOL_SIZE));
int threadsStarted = 0;
List<Future<Integer>> results = new ArrayList<>();
long id = 1;
t.start();
while(planeIsBooked == false) {
try {
Future<Integer> submit = pool.submit(new UserThread(id));
results.add(submit);
} catch (RejectedExecutionException ex) {
misc++;
continue;
}
threadsStarted++;
id++;
}
pool.shutdownNow();
int count = 0;
for(Future<Integer> i : results) {
try {
switch(i.get()) {
case 0:
// Success
success++;
break;
case -1:
// Seat is not Reserved
seatNotReserved++;
break;
case -2:
// Seat is not Reserved by Customer ID
seatNotReservedByCustomerID++;
break;
case -3:
// Reservation Timeout
reservationTimeout++;
break;
case -4:
// Seat is occupied
seatIsOccupied++;
break;
default:
misc++;
// Everything else fails
break;
}
} catch (ExecutionException | InterruptedException ex) {
misc++;
}
count++;
System.out.println("Processed Future Objects: " + count);
// HERE IS WHERE IT LOOPS
}
Here is the rest of the code it doesn't execute right after:
long end = System.nanoTime();
long time = end - start;
System.out.println("Threads Started: " + threadsStarted);
System.out.println("Successful Bookings: " + success);
System.out.println("Seat Not Reserved when trying to book: " + seatNotReserved);
System.out.println("Reservations that Timed out: " + reservationTimeout);
System.out.println("Reserved by another ID when trying to book: " + seatNotReservedByCustomerID);
System.out.println("Bookings that were occupied: " + seatIsOccupied);
System.out.println("Misc Errors: " + misc);
System.out.println("Execution Time (Seconds): " + (double) (time / 1000000000));
}
}
Can you spot the issue? I put in a comment where the code stops running.
It looks like planeIsBooked is never set to true inside the while loop, so make sure your loop is not infinite.
First, while(planeIsBooked == false) always evaluates to true, because
planeIsBooked is always false; nowhere in the code shown is it set to true.
So how would your while loop condition ever become false and let the loop exit?
Set planeIsBooked = true somewhere inside the while loop to get out of it.
A couple of things:
First, after you call pool.shutdownNow(), you proceed to fetch the results straight away. The call to shutdownNow() does not block and is not a definite indication that the pool has stopped. For that, you should call pool.awaitTermination(timeout, unit).
Second, it is not clear what you mean by your comment:
// HERE IS WHERE IT LOOPS
This is inside a loop, and looking at the loop, if an exception is thrown inside the switch, it will go into the catch block, count it as misc, and keep looping. Did you check for exceptions?
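One further detail that may be relevant here: shutdownNow() returns the tasks that were still waiting in the queue, and the Future objects for those tasks are never completed, so calling get() on them blocks indefinitely. Below is a minimal sketch of cancelling those tasks and then waiting for the pool with awaitTermination. The class `ShutdownDemo` and the method `stop` are hypothetical names, not part of the asker's Master class.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ShutdownDemo {
    // Stops the pool, cancels tasks that never started, and waits for running tasks to finish.
    // Returns true if the pool terminated within the timeout.
    public static boolean stop(ExecutorService pool, long timeoutSeconds) {
        // Interrupts running tasks and hands back the ones still queued.
        List<Runnable> neverStarted = pool.shutdownNow();
        for (Runnable r : neverStarted) {
            if (r instanceof Future) {
                // After this, get() throws CancellationException instead of blocking forever.
                ((Future<?>) r).cancel(false);
            }
        }
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this in place, the result loop would also need to catch CancellationException (an unchecked exception) around i.get(), for example by counting it under misc.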
Go through this answer for why you should declare static variables as volatile in a multi-threaded environment.
Even if your static variable is volatile, the following lines are dangerous:
while(planeIsBooked == false) {
Future<Integer> submit = pool.submit(new UserThread(id));
}
Suppose booking a flight takes 2 seconds on average and you have about 300 seats (an assumption). Your planeIsBooked field would become true after 600 seconds if running in a single thread. With a pool of size 50 it would take about 12 seconds.
Under the above assumption, your loop would run for 12 seconds.
Now, think about how many times the submit statement is executed: i times.
Even though you have just 300 seats, you might issue on the order of a million requests within those 12 seconds.
So think about the number of jobs in the queue before shutdownNow() is called. This is not the right way of terminating the loop.
If you know the maximum number of seats on your flight, why don't you use it in a for loop (maybe externalize the parameter in the for loop, instead of variable to hold

Parallelize calculations

I need to calculate the mean and extract the root of some numbers from a huge file:
1, 2, 3, 4, 5,\n
6, 7, 8, 9, 10,\n
11, 12, 13, 14,15,\n
...
This is the code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
public class App1{
int res, c;
double mean, root;
ArrayList list = new ArrayList();
public App1() {
// einlesen
Scanner sc = null;
try {
sc = new Scanner(new File("file.txt")).useDelimiter("[,\\s]+");
} catch (FileNotFoundException ex) {
System.err.println(ex);
}
while (sc.hasNextInt()) {
list.add(sc.nextInt());
res += (int) list.get(c);
c++;
}
sc.close();
// Mean
mean = res / list.size();
// Root
root = Math.sqrt(mean);
System.out.println("Mean: " + mean);
System.out.println("Root: " + root);
}
public static void main(String[] args) {
App1 app = new App1();
}
}
Is there any way to parallelize it?
Before calculating the mean I need all the numbers, so one thread can't calculate while another is still fetching the numbers from the file.
The same thing with extracting the root: A thread can't extract it from the mean if the mean isn't calculated yet.
I thought about Future, would that be a solution?
There's something critical you will have to accept up front: you will not be able to process the data any faster than you can read it from the file. So first, time how long it takes to read through the whole file, and accept that you won't improve on that.
That said, have you considered a ForkJoinPool?
You can calculate the mean in parallel, because the mean is simply the sum divided by the count. There is no reason why you cannot sum up the values in parallel, and count them as well, then just do the division later.
Consider a class:
public class PartialSum {
    private final int partialcount;
    private final int partialsum;
    public PartialSum(int count, int sum) {
        partialsum = sum;
        partialcount = count;
    }
    public int getCount() {
        return partialcount;
    }
    public int getSum() {
        return partialsum;
    }
}
Now, this could be the return type of a Future, as in Future<PartialSum>.
So, what you need to do is split the file in parts, and then send the parts to individual threads.
Each thread calculates a PartialSum. Then, as the threads complete, you can:
int sum = 0;
int count = 0;
for(Future<PartialSum> partial : futures) {
PartialSum ps = partial.get();
sum += ps.getSum();
count += ps.getCount();
}
double mean = (double)sum / count;
double root = ....
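To make the outline above concrete, here is a self-contained sketch that sums chunks of an in-memory list in parallel and combines the partial results. It assumes the numbers have already been read from the file. The class name `ParallelMean`, the chunking scheme, and the use of a `long[] {sum, count}` pair in place of the PartialSum class are choices made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMean {
    // Splits the list into roughly equal chunks, sums each chunk on its own thread,
    // then combines the partial sums and counts into one mean.
    public static double mean(List<Integer> numbers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (numbers.size() + threads - 1) / threads; // ceiling division
            List<Future<long[]>> futures = new ArrayList<>();
            for (int start = 0; start < numbers.size(); start += chunk) {
                List<Integer> part =
                        numbers.subList(start, Math.min(start + chunk, numbers.size()));
                Callable<long[]> task = () -> {
                    long sum = 0;
                    for (int n : part) {
                        sum += n;
                    }
                    return new long[] {sum, part.size()}; // {partial sum, partial count}
                };
                futures.add(pool.submit(task));
            }
            long sum = 0;
            long count = 0;
            for (Future<long[]> f : futures) {
                long[] partial = f.get(); // blocks until that chunk is done
                sum += partial[0];
                count += partial[1];
            }
            return (double) sum / count;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 15; i++) {
            numbers.add(i);
        }
        double mean = mean(numbers, 4);
        System.out.println("Mean: " + mean);
        System.out.println("Root: " + Math.sqrt(mean));
    }
}
```

The partial sums are held in long to reduce the risk of overflow on a huge file, which the int fields of PartialSum would not protect against.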
I think it's possible.
int offset = (filesize / number of threads)
Create n threads.
Each thread starts reading from offset * thread number. E.g. thread 0 starts reading from byte 0, thread 1 starts reading from offset * 1, thread 2 starts reading from offset * 2.
If thread number != 0, read ahead until you hit a newline character and start from there.
Keep a running sum and count per thread. Save them to "thread_sum" and "thread_count" or something.
When all threads are finished, total average = (sum of all "thread_sum" variables) / (sum of all "thread_count" variables). Averaging the per-thread averages is only correct if every thread read the same number of values.
Square root the total average.
It will need a bit of messing around to make sure the threads don't read too far into another thread's block of the file, but it should be doable.
No, there is no way to parallelize this. Although you could do something that looks like you are using threading, the result would be overly complex but still run at about the same speed as before.
The reason for this is that file access is and has to be single-threaded, and besides reading from the file all you do is two add operations. So in the best case those add operations could be parallelized; however, since they take almost no execution time, the gain would be 5% - 10% at best. And that gain is negated (or worse) by thread creation and maintenance.
One thing you can do to speed things up would be to remove the part where you put things into a list (assuming that you don't need those values later).
while (sc.hasNextInt()) {
    res += sc.nextInt();
    ++c;
}
mean = (double) res / c; // cast avoids integer division truncating the mean
