I need to calculate the mean of some numbers from a huge file and then take the square root of it. The file looks like this:
1, 2, 3, 4, 5,
6, 7, 8, 9, 10,
11, 12, 13, 14, 15,
...
This is the code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class App1 {
    int res, c;
    double mean, root;
    ArrayList list = new ArrayList();

    public App1() {
        // read in the file
        Scanner sc = null;
        try {
            sc = new Scanner(new File("file.txt")).useDelimiter("[,\\s]+");
        } catch (FileNotFoundException ex) {
            System.err.println(ex);
        }
        while (sc.hasNextInt()) {
            list.add(sc.nextInt());
            res += (int) list.get(c);
            c++;
        }
        sc.close();
        // Mean
        mean = res / list.size();
        // Root
        root = Math.sqrt(mean);
        System.out.println("Mean: " + mean);
        System.out.println("Root: " + root);
    }

    public static void main(String[] args) {
        App1 app = new App1();
    }
}
Is there any way to parallelize it?
Before calculating the mean I need all the numbers, so one thread can't calculate while another is still fetching the numbers from the file.
The same thing with extracting the root: A thread can't extract it from the mean if the mean isn't calculated yet.
I thought about Future, would that be a solution?
There's something critical you will have to accept up front: you will not be able to process the data any faster than you can read it from the file. So first, time how long it takes to read through the whole file, and accept that you won't improve on that.
That said, have you considered a ForkJoinPool?
You can calculate the mean in parallel, because the mean is simply the sum divided by the count. There is no reason why you cannot sum up the values in parallel, and count them as well, then just do the division later.
Consider a class:
public class PartialSum {
    private final int partialcount;
    private final int partialsum;

    public PartialSum(int count, int sum) {
        partialsum = sum;
        partialcount = count;
    }

    public int getCount() {
        return partialcount;
    }

    public int getSum() {
        return partialsum;
    }
}
Now, this could be the return type of a Future, as in Future<PartialSum>.
So, what you need to do is split the file in parts, and then send the parts to individual threads.
Each thread calculates a PartialSum. Then, as the threads complete, you can:
int sum = 0;
int count = 0;
for (Future<PartialSum> partial : futures) {
    PartialSum ps = partial.get();
    sum += ps.getSum();
    count += ps.getCount();
}
double mean = (double) sum / count;
double root = ....
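To make the idea concrete, here is a minimal sketch of how the pieces could be wired together. It is only an illustration: it assumes the whole file fits in memory, reuses the PartialSum class above, and only parallelizes the parsing and summing - the file is still read by a single thread, so the caveat about I/O being the floor still applies.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMean {
    public static void main(String[] args) throws Exception {
        // Single reader thread: this is the part you cannot speed up.
        List<String> lines = Files.readAllLines(Paths.get("file.txt"));
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<PartialSum>> futures = new ArrayList<>();
        int chunk = Math.max(1, lines.size() / workers);
        for (int from = 0; from < lines.size(); from += chunk) {
            List<String> slice = lines.subList(from, Math.min(from + chunk, lines.size()));
            futures.add(pool.submit(() -> {
                int sum = 0, count = 0;
                for (String line : slice) {
                    for (String token : line.split("[,\\s]+")) {
                        if (!token.isEmpty()) {
                            sum += Integer.parseInt(token);
                            count++;
                        }
                    }
                }
                return new PartialSum(count, sum);
            }));
        }
        int totalSum = 0;
        int totalCount = 0;
        for (Future<PartialSum> f : futures) {
            PartialSum ps = f.get();            // blocks until that chunk is done
            totalSum += ps.getSum();
            totalCount += ps.getCount();
        }
        pool.shutdown();
        double mean = (double) totalSum / totalCount;
        System.out.println("Mean: " + mean);
        System.out.println("Root: " + Math.sqrt(mean));
    }
}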
I think it's possible.
int offset = (filesize / Number of Threads)
Create n threads
Each thread starts reading from offset * thread number. E.g. thread 0 starts reading from byte 0, thread 1 from offset * 1, thread 2 from offset * 2.
If thread num != 0, read ahead until you hit a newline character - start from there.
Accumulate a sum and a count per thread. Save them to something like "thread_sum" and "thread_count".
When all threads are finished, total average = (sum of all thread sums) / (sum of all thread counts). Averaging the per-thread averages directly is only correct if every thread saw the same number of values.
Take the square root of the total average.
It will need a bit of messing around to make sure the threads don't read too far into another thread's block of the file, but it should be doable.
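A rough sketch of one such worker, assuming a plain ASCII text file like the one in the question; the class and field names are illustrative, and the unbuffered RandomAccessFile.readLine is slow but keeps the seek-and-skip idea visible:
import java.io.IOException;
import java.io.RandomAccessFile;

class ChunkWorker implements Runnable {
    private final String path;
    private final long start, end;  // byte range this worker owns
    long threadSum;                 // the per-thread "thread_sum"
    long threadCount;               // the per-thread "thread_count"

    ChunkWorker(String path, long start, long end) {
        this.path = path;
        this.start = start;
        this.end = end;
    }

    @Override
    public void run() {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(start);
            if (start != 0) {
                raf.readLine();  // skip the partial line; the previous worker finishes it
            }
            String line;
            // A line that starts inside our range belongs to us, even if it ends past 'end'.
            while (raf.getFilePointer() <= end && (line = raf.readLine()) != null) {
                for (String token : line.split("[,\\s]+")) {
                    if (!token.isEmpty()) {
                        threadSum += Long.parseLong(token);
                        threadCount++;
                    }
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}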
No, there is no way to parallelize this. Although you could do something that looks like threading, the result would be overly complex but still run at about the same speed as before.
The reason is that the file access is, and has to be, single-threaded, and besides reading from the file all you do is two add operations. So in the best case those add operations could be parallelized, but since they take almost no execution time, the gain would be maybe 5% - 10% at best, and that gain is negated (or worse) by thread creation and maintenance.
One thing you can do to speed things up is to remove the part where you put the values into a list (assuming that you don't need them later):
while (sc.hasNextInt()) {
    res += sc.nextInt();
    ++c;
}
mean = (double) res / c;
Related
I need to write a program that writes a block of 100 bytes with every write.
Every 1000 blocks written, a status message should appear as well.
I think I did that successfully, but my issue is with the calculation of the number of read operations per second.
I need to calculate how many of them my program does.
I am allowed to use System.nanoTime()
Here is what I wrote so far:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class EAWrite {
    public static void main(String[] args) throws IOException {
        int recordsWritten = 0;
        File myFile = new File("D:\\filename.txt");
        FileOutputStream myInputFile = new FileOutputStream(myFile);
        for (int i = 0; i < 10000; i++) {
            myInputFile.write(100);
            recordsWritten++;
            if ((i % 100) == 0) {
                System.out.println((recordsWritten - 1) + "records written.");
            }
        }
        myInputFile.close();
    }
}
Depending on how you define
number of read operations per second
you need to keep track of different values. You further specified that you only need the write operations per second over the entire execution time. Therefore, you only need to calculate the timespan in seconds your program ran, and divide the total write operations by that value:
double duration = (endSystemNanos - startSystemNanos) / 1_000_000_000.0; // nanoseconds -> seconds
double writeOpsPerSecond = (double) totalWriteOps / duration;
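Putting it together, a minimal sketch could look like the following. It assumes the existing FileOutputStream from your code (myInputFile), writes an actual 100-byte block per operation with write(new byte[100]) to match the stated requirement, and reports every 1000 blocks; the variable names are just placeholders:
long startSystemNanos = System.nanoTime();

long totalWriteOps = 0;
byte[] block = new byte[100];                      // one 100-byte block
for (int i = 0; i < 10000; i++) {
    myInputFile.write(block);                      // one write operation
    totalWriteOps++;
    if (totalWriteOps % 1000 == 0) {
        System.out.println(totalWriteOps + " blocks written.");
    }
}
myInputFile.close();

long endSystemNanos = System.nanoTime();
double duration = (endSystemNanos - startSystemNanos) / 1_000_000_000.0; // seconds
System.out.printf("%.1f write operations per second%n", totalWriteOps / duration);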
I am iterating through a HashMap with roughly 20 million entries. In each iteration I am again iterating through a HashMap with roughly 20 million entries.
HashMap<String, BitSet> data_1 = new HashMap<String, BitSet>();
HashMap<String, BitSet> data_2 = new HashMap<String, BitSet>();
I am dividing data_1 into chunks based on the number of threads (threads = cores; I have a four-core processor).
My code is taking more than 20 hours to execute, and that does not even include storing the results into a file.
1) If I want to store the results of each thread into a file without overlapping, how can I do that?
2) How can I make the following much faster?
3) How can I create the chunks dynamically, based on the number of cores?
int cores = Runtime.getRuntime().availableProcessors();
int threads = cores;
//Number of threads
int Chunks = data_1.size() / threads;
// I don't trust the chunks created by the line below; that's why I created chunk1, chunk2, chunk3 and chunk4 separately and validated them.
Map<Integer, BitSet>[] Chunk= (Map<Integer, BitSet>[]) new HashMap<?,?>[threads];
4) How can I create threads using a for loop? Is what I am doing below correct?
ClassName thread1 = new ClassName(data2, chunk1);
ClassName thread2 = new ClassName(data2, chunk2);
ClassName thread3 = new ClassName(data2, chunk3);
ClassName thread4 = new ClassName(data2, chunk4);
thread1.start();
thread2.start();
thread3.start();
thread4.start();
thread1.join();
thread2.join();
thread3.join();
thread4.join();
Representation of My Code
public class ClassName {
Integer nSimilarEntities = 30;
public void run() {
for (String kNonRepeater : data_1.keySet()) {
// Extract the feature vector
BitSet vFeaturesNonRepeater = data_1.get(kNonRepeater);
// Calculate the sum of 1s (L2 norm is the sqrt of this)
double nNormNonRepeater = Math.sqrt(vFeaturesNonRepeater.cardinality());
// Loop through the repeater set
double nMinSimilarity = 100;
int nMinSimIndex = 0;
// Maintain the list of top similar repeaters and the similarity values
long dpind = 0;
ArrayList<String> vSimilarKeys = new ArrayList<String>();
ArrayList<Double> vSimilarValues = new ArrayList<Double>();
for (String kRepeater : data_2.keySet()) {
// Status output at regular intervals
dpind++;
if (Math.floorMod(dpind, pct) == 0) {
System.out.println(dpind + " dot products (" + Math.round(dpind / pct) + "%) out of "
+ nNumSimilaritiesToCompute + " completed!");
}
// Calculate the norm of repeater, and the dot product
BitSet vFeaturesRepeater = data_2.get(kRepeater);
double nNormRepeater = Math.sqrt(vFeaturesRepeater.cardinality());
BitSet vTemp = (BitSet) vFeaturesNonRepeater.clone();
vTemp.and(vFeaturesRepeater);
double nCosineDistance = vTemp.cardinality() / (nNormNonRepeater * nNormRepeater);
// queue.add(new MyClass(kRepeater,kNonRepeater,nCosineDistance));
// if(queue.size() > YOUR_LIMIT)
// queue.remove();
// Don't bother if the similarity is 0, obviously
if ((vSimilarKeys.size() < nSimilarEntities) && (nCosineDistance > 0)) {
vSimilarKeys.add(kRepeater);
vSimilarValues.add(nCosineDistance);
nMinSimilarity = vSimilarValues.get(0);
nMinSimIndex = 0;
for (int j = 0; j < vSimilarValues.size(); j++) {
if (vSimilarValues.get(j) < nMinSimilarity) {
nMinSimilarity = vSimilarValues.get(j);
nMinSimIndex = j;
}
}
} else { // If there are more, keep only the best
// If this is better than the smallest distance, then remove the smallest
if (nCosineDistance > nMinSimilarity) {
// Remove the lowest similarity value
vSimilarKeys.remove(nMinSimIndex);
vSimilarValues.remove(nMinSimIndex);
// Add this one
vSimilarKeys.add(kRepeater);
vSimilarValues.add(nCosineDistance);
// Refresh the index of lowest similarity value
nMinSimilarity = vSimilarValues.get(0);
nMinSimIndex = 0;
for (int j = 0; j < vSimilarValues.size(); j++) {
if (vSimilarValues.get(j) < nMinSimilarity) {
nMinSimilarity = vSimilarValues.get(j);
nMinSimIndex = j;
}
}
}
} // End loop for maintaining list of similar entries
}// End iteration through repeaters
for (int i = 0; i < vSimilarValues.size(); i++) {
System.out.println(Thread.currentThread().getName() + kNonRepeater + "|" + vSimilarKeys.get(i) + "|" + vSimilarValues.get(i));
}
}
}
}
Finally, if not multithreading, are there any other approaches in Java to reduce the running time?
The computer works similarly to what you would do by hand (it just processes more digits/bits at a time, but the problem is the same).
If you do addition, the time is proportional to the size of the number.
If you do multiplication or division, it's proportional to the square of the size of the number.
For the computer, the size is based on multiples of 32 or 64 significant bits, depending on the implementation.
I'd say this task is suitable for parallel streams. Don't hesitate to take a look at this concept if you have time. Parallel streams seamlessly use multithreading at full speed.
The top-level processing will look like this:
data_1.entrySet()
.parallelStream()
.flatMap(nonRepeaterEntry -> processOne(nonRepeaterEntry.getKey(), nonRepeaterEntry.getValue(), data_2))
.forEach(System.out::println);
You should provide a processOne function with a prototype like this:
Stream<String> processOne(String nonRepeaterKey, BitSet nonRepeaterBitSet, Map<String, BitSet> data_2);
It will return the prepared strings that you currently print to a file.
To build the stream inside, you can prepare a List first and then turn it into a stream in the return statement:
return list.stream();
Even though the inner loop could also be written with streams, parallel streaming inside it is discouraged - you already have enough parallelism at the outer level.
For your questions:
1) If I want to store the results of each thread into a file without overlapping, how can I do that?
Any logging framework (logback, log4j) can deal with it. Parallel streams can deal with it. You can also store the prepared lines in some queue/array and print them from a separate thread. That takes a bit of care, though; the ready-made solutions are easier and effectively do the same thing.
2) How can I make the following much faster?
Optimize and parallelize. In a normal situation you get somewhere between number_of_threads/1.5 and number_of_threads times faster processing (assuming hyperthreading is in play), but it depends on how much of the work is not parallel and on the underlying implementations.
3) How can I create the chunks dynamically, based on the number of cores?
You don't have to. Make a list of tasks (one task per data_1 entry) and feed an executor service with them - a single entry is already a big enough task. You can use a FixedThreadPool with the number of threads as a parameter, and it will distribute the tasks evenly.
Note: you should create a task class, get a Future for each task from threadpool.submit, and at the end run a loop calling .get() on each Future. That throttles the main thread down to the executor's processing speed and implicitly gives fork/join-like behaviour - see the sketch right after these answers.
4) Direct thread creation is an outdated technique. It's recommended to use an executor service of some sort, parallel streams, etc. If you do want plain threads in a for loop, create a list of chunks, create a thread per chunk in one loop and add it to a list of threads, and then join each thread of the list in a second loop.
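As a rough sketch of that executor pattern: the helper similaritiesForOne is hypothetical - it stands for your existing inner loop over data_2 and returns the output lines for one data_1 entry - and the output file name is made up.
// needs: java.util.*, java.util.concurrent.*, java.io.PrintWriter
static void runAll(Map<String, BitSet> data_1, Map<String, BitSet> data_2) throws Exception {
    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<List<String>>> futures = new ArrayList<>();

    for (Map.Entry<String, BitSet> e : data_1.entrySet()) {
        // one task per entry keeps all cores busy; in practice you might batch entries per task
        futures.add(pool.submit(() -> similaritiesForOne(e.getKey(), e.getValue(), data_2)));
    }
    try (PrintWriter out = new PrintWriter("results.txt")) {
        for (Future<List<String>> f : futures) {
            f.get().forEach(out::println);   // single writer thread, so lines never overlap
        }
    }
    pool.shutdown();
}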
Ad hoc optimizations:
1) Make a Repeater class that stores the key, the BitSet and its cardinality. Preprocess your maps, turning them into Repeater instances and calculating the cardinality once (i.e. not on every inner-loop run). That saves you roughly 20mil*(20mil-1) calls of .cardinality(); you still need to call it on the AND'ed BitSet.
2) Replace similarKeys/similarValues with a limited-size PriorityQueue of combined entries. It works faster for keeping the best 30 elements (a sketch follows after this list).
Take a look at this question for info about PriorityQueue:
Java PriorityQueue with fixed size
3) You can skip processing a nonRepeater entirely if its cardinality is already 0 - BitSet.and() will never increase the resulting cardinality, and you filter out all 0-distance values anyway.
4) Likewise, you can skip (remove from the temporary list you create in the point-1 optimization) every Repeater with zero cardinality. As in point 3, it will never produce anything fruitful.
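A sketch of the point-2 idea, reusing the variable names from the question (nSimilarEntities, kRepeater, nCosineDistance); the head of the min-heap is always the weakest match kept so far:
// needs: java.util.*
// replaces vSimilarKeys / vSimilarValues
Comparator<Map.Entry<String, Double>> bySimilarity =
        Comparator.comparingDouble(Map.Entry::getValue);
PriorityQueue<Map.Entry<String, Double>> best = new PriorityQueue<>(bySimilarity);

// inside the loop over data_2, once nCosineDistance is known:
if (nCosineDistance > 0) {
    if (best.size() < nSimilarEntities) {
        best.add(new AbstractMap.SimpleEntry<>(kRepeater, nCosineDistance));
    } else if (best.peek().getValue() < nCosineDistance) {
        best.poll();                                            // drop the weakest kept match
        best.add(new AbstractMap.SimpleEntry<>(kRepeater, nCosineDistance));
    }
}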
I have a java project to do and the requirements are:
create a loop that repeats the following experiment 10 times. After all 10 iterations are complete, print the average time for one iteration of the experiment:
Select a random number r from 0 to n.
Using the System class, note the start time of the experiment
Repeatedly multiply two 9-digit values in a loop for r iterations. You need not preserve
the result of this multiplication.
note the end time of the experiment
This is what I have so far:
package lee_lab02;
import java.util.Random;
import java.util.Scanner;
public class Benchmark {
public Benchmark() {
}
public static void main(String[] args) {
System.out.println("Please enter a value for n:");
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
long start = System.currentTimeMillis();
for(int i=1; i<=10; i++)
{
Random rand = new Random();
double r = rand.nextInt(n);
for(int z=1;z<r;z++)
{
long num1 = 145893123;
long num2 = 901234278;
long num3 = num1 * num2;
}
}
long end = System.currentTimeMillis();
long totTime = end - start;
long avg = totTime/10;
System.out.println(avg);
}
}
The output of this prompts me with "Please enter a value for n:" but the time is not recorded. What am I doing wrong here?
There's nothing wrong with your program. It is just that you are performing trivial operations in your for loop that don't take much time at all. All the iterations are completed in less than 1 millisecond. Just add Thread.sleep(500); in your for loop and you will see the average getting printed as you expect it.
Also, use double for avg. (double avg = totTime/10.0;).
To perform this kind of benchmark test, you'd better use nanoTime() (remember that 1,000,000 ns = 1 ms (currentTimeMillis)), unless your program takes more than 10 seconds; in that case getting the time in ms is sufficient (but it's still possible to use nanoTime()):
long start = System.nanoTime();
int nbRound = 10;
for (int i = 1; i <= nbRound; i++) {
//...
}
long end = System.nanoTime();
long totTime = end - start;
long avg = totTime / nbRound;
System.out.println(avg);
I've also added a variable nbRound because it's very easy to make a mistake and write different values in the loop and in the computation of the average, so better to use a variable when you can.
And because your program does very little work, you need to set n to a high value. My tests:
n=100 ==> 105,673 ns
n=100000 ==> 1,720,963 ns (1.7 ms)
The scan.nextInt() method is a blocking method, so your current thread blocks and the code never moves ahead to execute the remaining lines.
You need to change your design a bit to handle this blocking behaviour of nextInt(). You can call nextInt() in one thread and put the remaining logic in another, and coordinate them so that your average-time calculation still runs.
I managed to calculate the factorial with two threads, without a pool. I have two factorial classes, named Factorial1 and Factorial2, which extend the Thread class. Let's say I want to calculate the value of !160000. In Factorial1's run() method I do the multiplication in a for loop from i=2 to i=80000, and in Factorial2's from i=80001 to 160000. After that, I return both values and multiply them in the main method. When I compare the execution times, it is much better (about 5000 milliseconds) than the non-threaded calculation's time (15000 milliseconds), even with just two threads.
Now I want to write cleaner and better code, because I have seen the efficiency of threads for factorial calculation, but when I use a thread pool to calculate the factorial, the parallel calculation always takes more time than the non-threaded calculation (nearly 16000). My code pieces look like:
for(int i=2; i<= Calculate; i++)
{
myPool.execute(new Multiplication(result, i));
}
run() method which is in Multiplication class:
public void run()
{
s1.Mltply(s2); // s1 and s2 are instances of my Number class
// their fields holds BigInteger values
}
Mltply() method which is in Number class:
public void Multiply(int number)
{
area.lock(); // result is going wrong without lock
Number temp = new Number(number);
value = value.multiply(temp.value); // value is a BigInteger
area.unlock();
}
In my opinion this lock kills all the advantage of using threads, because it seems like all the threads do is that multiplication and nothing else. But without it, I can't even calculate the correct result. Let's say I want to calculate !10: thread1 calculates 10*9*8*7*6 and thread2 calculates 5*4*3*2*1. Is that the way I should be doing it? Is it even possible with a thread pool? Of course the execution time must be less than the normal calculation...
I appreciate all your help and suggestions.
EDIT: - My own solution to the problem -
public class MyMultiplication implements Runnable
{
public static BigInteger subResult1;
public static BigInteger subResult2;
int thread1StopsAt;
int thread2StopsAt;
long threadId;
static boolean idIsSet=false;
public MyMultiplication(BigInteger n1, int n2) // First Thread
{
MyMultiplication.subResult1 = n1;
this.thread1StopsAt = n2/2;
thread2StopsAt = n2;
}
public MyMultiplication(int n2,BigInteger n1) // Second Thread
{
MyMultiplication.subResult2 = n1;
this.thread2StopsAt = n2;
thread1StopsAt = n2/2;
}
@Override
public void run()
{
if(idIsSet==false)
{
threadId = Thread.currentThread().getId();
idIsSet=true;
}
if(Thread.currentThread().getId() == threadId)
{
for(int i=2; i<=thread1StopsAt; i++)
{
subResult1 = subResult1.multiply(BigInteger.valueOf(i));
}
}
else
{
for(int i=thread1StopsAt+1; i<= thread2StopsAt; i++)
{
subResult2 = subResult2.multiply(BigInteger.valueOf(i));
}
}
}
}
public class JavaApplication3
{
public static void main(String[] args) throws InterruptedException
{
int calculate=160000;
long start = System.nanoTime();
BigInteger num = BigInteger.valueOf(1);
for (int i = 2; i <= calculate; i++)
{
num = num.multiply(BigInteger.valueOf(i));
}
long end = System.nanoTime();
double time = (end-start)/1000000.0;
System.out.println("Without threads: \t" +
String.format("%.2f",time) + " miliseconds");
System.out.println("without threads Result: " + num);
BigInteger num1 = BigInteger.valueOf(1);
BigInteger num2 = BigInteger.valueOf(1);
ExecutorService myPool = Executors.newFixedThreadPool(2);
start = System.nanoTime();
myPool.execute(new MyMultiplication(num1,calculate));
Thread.sleep(100);
myPool.execute(new MyMultiplication(calculate,num2));
myPool.shutdown();
while(!myPool.isTerminated()) {} // waiting threads to end
end = System.nanoTime();
time = (end-start)/1000000.0;
System.out.println("With threads: \t" +String.format("%.2f",time)
+ " miliseconds");
BigInteger result =
MyMultiplication.subResult1.
multiply(MyMultiplication.subResult2);
System.out.println("With threads Result: " + result);
System.out.println(MyMultiplication.subResult1);
System.out.println(MyMultiplication.subResult2);
}
}
input : !160000
Execution time without threads : 15000 milliseconds
Execution time with 2 threads : 4500 milliseconds
Thanks for ideas and suggestions.
You may calculate !160000 concurrently without using a lock by splitting 160000 into disjoint chunks, as you explained: 2..80000 and 80001..160000.
But you may achieve this by using the Java Stream API:
BigInteger result = IntStream.rangeClosed(1, 160000).parallel()
        .mapToObj(val -> BigInteger.valueOf(val))
        .reduce(BigInteger.ONE, BigInteger::multiply);
It does exactly what you are trying to do. It splits the whole range into chunks, establishes a thread pool and computes the partial results. Afterwards it joins the partial results into a single result.
So why do you bother doing it by yourself? Just practicing clean coding?
On my real 4 core machine computation in a for loop took 8 times longer than using a parallel stream.
Threads have to run independently in order to run fast. Many dependencies like locks, synchronized parts of your code or some system calls lead to sleeping threads which are waiting to access some resource.
In your case you should minimize the time a thread spends inside the lock. Maybe I am wrong, but it seems like you create a thread for each number. So for 1000! you spawn 1000 threads. All of them are trying to get the lock on area and are not able to calculate anything, because one thread holds the lock and all other threads have to wait until it is unlocked again. So the threads only run serially, which is as fast as your non-threaded example plus the extra time for locking and unlocking, thread management and so on. Oh, and because of the CPU's context switching it gets even worse.
Your first attempt, splitting the factorial across two threads, is the better one. Each thread can calculate its own result, and only when they are done do the threads have to communicate with each other. So they are independent most of the time.
Now you have to generalize this solution. To reduce context switching of the CPU you only want as many threads as your CPU has cores (maybe a little fewer because of your OS). Every thread gets a range of numbers and calculates their product. After that it locks the overall result and multiplies its own result into it.
This should improve the performance of your problem.
Update: You ask for additional advice:
You said you have two classes Factorial1 and Factorial2. Probably they have their ranges hard-coded. You only need one class which takes the range as constructor arguments. This class implements Runnable, so it has a run method which multiplies all values in that range.
In your main method you can do something like this:
int n = 160_000;
int threads = 2;
ExecutorService executor = Executors.newFixedThreadPool(threads);
for (int i = 0; i < threads; i++) {
int start = i * (n/threads) + 1;
int end = (i + 1) * (n/threads) + 1;
executor.execute(new Factorial(start, end));
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.DAYS);
Now you have calculated the result of each thread but not the overall result. This can be solved by a BigInteger which is visible to the Factorial class (like a static BigInteger result; in the same main class) and a lock, too. In the run method of Factorial you can fold into the overall result by locking the lock and multiplying:
Main.lock.lock();
Main.result = Main.result.multiply(value);
Main.lock.unlock();
Some additional advice for the future: this isn't really clean, because Factorial needs to know about your main class, so it has a dependency on it. But ExecutorService returns a Future<T> object which can be used to receive the result of the thread. Using these Future objects you don't need locks at all. That needs some extra work, so just try to get the lock version running for now ;-)
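For later, a minimal sketch of that Future-based variant (the chunking mirrors the loop above; the class and variable names are illustrative, not part of your code):
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FactorialWithFutures {
    public static void main(String[] args) throws Exception {
        int n = 160_000;
        int threads = 2;
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Future<BigInteger>> parts = new ArrayList<>();

        for (int i = 0; i < threads; i++) {
            int start = i * (n / threads) + 1;
            int end = (i == threads - 1) ? n : (i + 1) * (n / threads);
            parts.add(executor.submit(() -> {
                BigInteger product = BigInteger.ONE;
                for (int v = start; v <= end; v++) {
                    product = product.multiply(BigInteger.valueOf(v));
                }
                return product;              // no shared state, so no lock is needed
            }));
        }

        BigInteger result = BigInteger.ONE;
        for (Future<BigInteger> part : parts) {
            result = result.multiply(part.get());   // get() waits for that chunk to finish
        }
        executor.shutdown();
        System.out.println("160000! has " + result.bitLength() + " bits");
    }
}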
In addition to my Java Stream API solution, here is another solution which uses a self-managed thread pool, as you demanded:
// assumes static imports of Executors.newCachedThreadPool, IntStream.rangeClosed and Collectors.toList
public static final int CHUNK_SIZE = 10000;

public static BigInteger fac(int max) {
    ExecutorService executor = newCachedThreadPool();
    try {
        // Submit all chunks first, then wait for them; mapping submit and get in one
        // lazy pipeline would wait for each chunk before submitting the next one.
        List<Future<BigInteger>> futures = rangeClosed(0, (max - 1) / CHUNK_SIZE)
                .mapToObj(val -> executor.submit(() -> prod(leftBound(val), rightBound(val, max))))
                .collect(toList());
        return futures.stream()
                .map(future -> valueOf(future))
                .reduce(BigInteger.ONE, BigInteger::multiply);
    } finally {
        executor.shutdown();
    }
}

private static int leftBound(int chunkNo) {
    return chunkNo * CHUNK_SIZE + 1;
}

private static int rightBound(int chunkNo, int max) {
    return Math.min((chunkNo + 1) * CHUNK_SIZE, max);
}

private static BigInteger valueOf(Future<BigInteger> future) {
    try {
        return future.get();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private static BigInteger prod(int min, int max) {
    BigInteger res = BigInteger.valueOf(min);
    for (int val = min + 1; val <= max; val++) {
        res = res.multiply(BigInteger.valueOf(val));
    }
    return res;
}
So I have a problem I've been wracking my brain over for about a week now. The situation is:
Consider a checkout line at the grocery store. During any given
second, the probability that a new customer joins the line is 0.02 (no
more than one customer joins the line during any given second). The
checkout clerk takes a random amount of time between 20 seconds to 75
seconds to serve each customer. Write a program to simulate this
scenario for about ten million seconds and print out the average
number of seconds that a customer spends waiting in line before the
clerk begins to serve the customer. Note that since you do not know
the maximum number of customers that may be in line at any given time,
you should use an ArrayList and not an array.
The expected average wait time is supposed to be between 500 and 600 seconds. However, I have not gotten an answer anywhere close to this range. Given that the probability of a new customer joining the line is only 2%, I would expect the line to almost never have more than 1 person in it, so the average wait time would be about 45-50 seconds. I asked a friend (who is a math major) what his view on this problem was, and he agreed that 45 seconds is a reasonable average given the 2% probability. My code so far is:
package grocerystore;
import java.util.ArrayList;
import java.util.Random;
public class GroceryStore {
private static ArrayList<Integer> line = new ArrayList();
private static Random r = new Random();
public static void addCustomer() {
int timeToServe = r.nextInt(56) + 20;
line.add(timeToServe);
}
public static void removeCustomer() {
line.remove(0);
}
public static int sum(ArrayList<Integer> a) {
int sum = 0;
for (int i = 0; i < a.size(); i++) {
sum += a.get(i);
}
return sum;
}
public static void main(String[] args) {
int waitTime = 0;
int duration = 10000;
for (int i = 0; i < duration; i++) {
double newCust = r.nextDouble();
if (newCust < .02) {
addCustomer();
}
try {
for (int j = 0; j < line.get(0); j++) {
waitTime = waitTime + sum(line);
}
} catch (IndexOutOfBoundsException e) {}
if (line.isEmpty()) {}
else {
removeCustomer();
}
}
System.out.println(waitTime/duration);
}
}
Any advice about this would be appreciated.
Here's some pseudocode to help you plan it out
for each second that goes by:
    generate probability
    if probability <= 0.02
        add customer
    if wait time is 0
        if line is not empty
            remove customer
            generate a new wait time
    else
        decrement wait time
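A direct Java translation of that pseudocode might look like the sketch below. It additionally remembers each customer's arrival second so the wait can actually be measured; the class name and constants are illustrative, and whether the second in which service begins counts as waiting is a modelling choice:
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.Random;

public class CheckoutSim {
    public static void main(String[] args) {
        Random r = new Random();
        Queue<Long> arrivals = new ArrayDeque<>();  // arrival second of each waiting customer
        int serviceRemaining = 0;                   // seconds left for the customer being served
        long totalWait = 0, served = 0;

        for (long second = 0; second < 10_000_000L; second++) {
            if (r.nextDouble() < 0.02) {
                arrivals.add(second);               // a new customer joins the line
            }
            if (serviceRemaining == 0) {
                if (!arrivals.isEmpty()) {
                    long arrivedAt = arrivals.poll();
                    totalWait += second - arrivedAt;        // waited until service begins
                    served++;
                    serviceRemaining = 20 + r.nextInt(56);  // 20..75 seconds of service
                }
            } else {
                serviceRemaining--;
            }
        }
        System.out.println("Average wait: " + (double) totalWait / served + " seconds");
    }
}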
There's actually a very easy implementation of single server queueing systems where you don't need an ArrayList or Queue to stash customers who are in line. It's based on a simple recurrence relation described below.
You need to know the inter-arrival times' distribution, i.e., the distribution of times between one arrival and the next. Yours was described in time-stepped fashion as a probability of 0.02 of having a new arrival in a given tick of the clock. That equates to a Geometric distribution with p = 0.02. You already know the service time distribution - Uniform(20,75).
With those two pieces of info, and a bit of thought, you can deduce that for any given customer the arrival time is the previous customer's arrival-time plus a (generated) interarrival time; this customer can begin being served at either their arrival-time or the departure-time of the prior customer, whichever comes later; and they finish up with the server and depart at their begin-service time plus a (generated) service-time. You'll need to initialize the arrival-time and departure time of an imaginary zeroth customer to kick-start the whole thing, but then it's a simple loop to calculate the recurrence.
Since this looks like homework I'm giving you an implementation in Ruby. If you don't know Ruby, think of this as pseudo-code. It should be very straightforward for you to translate to Java. I've left out details such as how to generate the distributions, but I have actually run the complete implementation of this, replacing the commented lines with statistical tallies, and it gives average wait times around 500.
interarrival_time = Geometric.new(p_value)
service_time = Uniform.new(service_min, service_max)
arrival_time = depart_time = 0.0 # initialize zeroth customer
loop do
arrival_time += interarrival_time.generate
break if arrival_time > 10_000_000
start_time = [arrival_time, depart_time].max
depart_time = start_time + service_time.generate
delay_in_queue = start_time - arrival_time
# do anything you want with the delay_in_queue value:
# print it, tally it for averaging, whatever...
end
Note that this approach skips over the large swathes of time where nothing is happening, so it's a quite efficient little program compared to time-stepping through every tick of the simulated clock and storing things in dynamically sized containers.
One final note - you may want to ignore the first few hundred or thousand observations due to initialization bias. Simulation models usually need a "warm-up" period to remove the effect of the programmatically necessary initialization of variables to arbitrary values.
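If it helps, here is a hedged Java rendering of the same recurrence. The geometric interarrival generator uses the standard inverse-transform trick; those generator details are my own assumption rather than part of the original answer:
import java.util.Random;

public class QueueRecurrence {
    public static void main(String[] args) {
        Random r = new Random();
        double p = 0.02;                 // probability of an arrival in any given second
        double horizon = 10_000_000;     // simulated seconds
        double arrivalTime = 0, departTime = 0;   // zeroth customer
        double totalDelay = 0;
        long customers = 0;

        while (true) {
            // geometric interarrival: seconds until the next arrival with success probability p
            double interarrival = Math.max(1, Math.ceil(Math.log(1 - r.nextDouble()) / Math.log(1 - p)));
            arrivalTime += interarrival;
            if (arrivalTime > horizon) {
                break;
            }
            double startTime = Math.max(arrivalTime, departTime);   // wait if the clerk is busy
            double serviceTime = 20 + r.nextDouble() * 55;          // Uniform(20, 75)
            departTime = startTime + serviceTime;
            totalDelay += startTime - arrivalTime;
            customers++;
        }
        System.out.println("Average delay in queue: " + totalDelay / customers + " seconds");
    }
}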
Instead of using an ArrayList, a Queue might be better suited for managing the customers. Also, remove the try/catch clause and add a throws IndexOutOfBoundsException to the main function definition.