Many independent threads dependent on single thread - java

I have a file which has n columns and many rows
Col 1 col2 col3 .......col n
I want to read it once and write multiple (say m) outputs grouping the rows by few key columns. Say 3 outputs have to be produced:
For output 1:
groupingKeys[0]={1,2} //group the records on col 1 and 2
For output 2:
groupingKeys[1]={1,4,5} //group the records on col 1 4 5
For output 3:
groupingKeys[2]={2,3} //group on col 2 and 3
In the main thread I read the input file line by line. For each line read, I want to process it in m different threads. So basically I want that the calls
map[0].process(data,groupingKeys[0]);
map[1].process(data,groupingKeys[1]);
map[2].process(data,groupingKeys[2]);
should run in 3 different threads, and each of the 3 threads should proceed only after the main thread has read the line.
I can create m different threads, with the run method of the i-th thread containing
map[i].process(data,groupingKeys[i]);
But these 3 threads should proceed only after the main thread has read the line, so that they see the correct value of data[]. How can I achieve this?
Main thread   thread-0   thread-1   thread-2
running       waiting    waiting    waiting
waiting       running    running    running
running       waiting    waiting    waiting
At each step a line is read and processed.
By processed I mean that something similar to an SQL GROUP BY is done for each of the grouping keys.
Below is the sample code referred to above.
public void writeMultipleGroupedOutputs(String inputfile, int[][] groupingKeys) throws IOException
{
    int m = groupingKeys.length;
    Mymap[] map = new Mymap[m]; //m maps to group records in m ways, one per grouping key
    for (int i = 0; i < m; i++)
        map[i] = new Mymap();
    BufferedReader br = new BufferedReader(new FileReader(inputfile));
    String line;
    while ((line = br.readLine()) != null) {
        String[] data = line.split(regex); //one line is read in main thread
        for (int i = 0; i < m; i++)
            map[i].process(data, groupingKeys[i]); //process in m different ways. How to make this happen in m independent threads?
    }
}
class Mymap extends HashMap<Key,Value> {
    void process(String[] data, int[] keyIndexes)
    {
        //extract key from key indexes
        //extract value from value indexes
        put(key, value);
    }
    @Override
    public Value put(Key k, Value v) {
        if (containsKey(k)) {
            Value oldval = get(k);
            super.put(k, oldval.aggregate(v)); //put sum of old and new
            return oldval;
        } else {
            super.put(k, v);
            return null;
        }
    }
}
Sorry if I haven't made my point clear. In simple words, I want map[i].process(data,groupingKeys[i]); to happen in a separate (i-th) thread. For example, given the input
a b 5
a b 10
a c 15
if I want to group by {1} and {1,2}:
read line   map1          map2
a b 5       [a -> b,5]    [a,b -> 5]
a b 10      [a -> b,15]   [a,b -> 15]
a c 15      [a -> b,30]   [a,b -> 15, a,c -> 15]
Edit:
The question is not about how I process the data or the grouping logic. It is this: after each line is read, I want to do something with that line in different threads.

If I understand correctly, you want to hold off processing until the whole file has been read. If so, depending on the details, you may want to look at CyclicBarrier or CountDownLatch:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CyclicBarrier.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CountDownLatch.html
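If the goal is instead the per-line handoff described in the question (the reader runs, then all m workers run, then the reader again), a CyclicBarrier with m + 1 parties can be awaited twice per line. The following is only a minimal sketch under assumptions: the class name BarrierGrouping, the space-separated input, the use of the last column as the value to sum, and the simple process method are all hypothetical stand-ins for the question's Mymap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class BarrierGrouping {
    // Hypothetical stand-in for map[i].process(data, groupingKeys[i]):
    // sums the last column per key built from the 1-based key columns.
    static void process(Map<String, Integer> map, String[] data, int[] keyCols) {
        StringBuilder key = new StringBuilder();
        for (int c : keyCols) key.append(data[c - 1]).append(',');
        map.merge(key.toString(), Integer.parseInt(data[data.length - 1]), Integer::sum);
    }

    static List<Map<String, Integer>> group(List<String> lines, int[][] groupingKeys) {
        int m = groupingKeys.length;
        CyclicBarrier barrier = new CyclicBarrier(m + 1); // m workers plus the reader
        String[][] current = new String[1][];              // the shared "data" slot
        List<Map<String, Integer>> maps = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            Map<String, Integer> map = new HashMap<>();
            maps.add(map);
            int[] keys = groupingKeys[i];
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        barrier.await();          // wait until a line is published
                        String[] data = current[0];
                        if (data == null) return; // null marks the end of the input
                        process(map, data, keys);
                        barrier.await();          // tell the reader this line is done
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another party failed; stop this worker
                }
            });
            workers.add(t);
            t.start();
        }
        try {
            for (String line : lines) {
                current[0] = line.split(" ");
                barrier.await(); // release the workers for this line
                barrier.await(); // wait until all of them have processed it
            }
            current[0] = null;
            barrier.await();     // let the workers see the end marker
            for (Thread t : workers) t.join();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
        return maps;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b 5", "a b 10", "a c 15");
        System.out.println(group(lines, new int[][] {{1}, {1, 2}}));
    }
}
```

Each worker owns its own map, so the maps themselves need no locking; the barrier only orders the reads and writes of the shared line. A worker that throws an unchecked exception would leave the other parties waiting, so real code would need error handling that this sketch leaves out.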

Related

Thread executes too many times and causes race condition even though I'm using locks

I'm working on a multithreaded application for an exercise that simulates a warehouse (similar to the producer-consumer problem). However, I'm running into trouble: increasing the number of consumer threads makes the program behave in unexpected ways.
The code:
I'm creating a producer thread called buyer whose goal is to place exactly 10 orders with the warehouse. To do this, the buyers have a shared object called warehouse on which a buyer can place an order; the order is then stored in a buffer in the shared object. After this the buyer sleeps for some time until it either tries again or all packs have been bought. The code looks like this:
public void run() {
    //Run until the thread has bought 10 packages, this ensures the thread
    //will eventually stop execution automatically.
    while(this.packsBought < 10) {
        try {
            //Sleep for a random amount of time between 1 and 49
            //milliseconds.
            Thread.sleep(this.rand.nextInt(49) + 1);
        //Catch any InterruptedExceptions.
        } catch (InterruptedException ex) {
            //There is no problem if this exception is thrown, the thread
            //will just make an order earlier than planned. That being said,
            //there should be no manner in which this exception is thrown.
        }
        //Create a new order.
        Order order = new Order(this.rand.nextInt(3)+ 1,
                                this,
                                this.isPrime);
        //Set the time at which the order was placed as now.
        order.setOrderTime(System.currentTimeMillis());
        //Place the newly created order in the warehouse.
        this.warehouse.placeOrder(order);
    }
    //Notify the thread has finished execution.
    System.out.println("Thread: " + super.getName() + " has finished.");
}
As you can see, the function placeOrder(Order order); is used to place an order at the warehouse. This function is responsible for placing the order in the queue based on some logic related to prime status. The function looks like this:
public void placeOrder(Order order) {
    try{
        //Halt until there are enough packs to handle an order.
        this.notFullBuffer.acquire();
        //Lock to signify the start of the critical section.
        this.mutexBuffer.lock();
        //Insert the order in the buffer depending on prime status.
        if (order.isPrime()) {
            //Prime order, insert behind all prime orders in buffer.
            //Enumerate all non prime orders in the list.
            for (int i = inPrime; i < sizeOrderList - 1; i++) {
                //Move the non prime order back 1 position in the list.
                buffer[i + 1] = buffer[i];
            }
            // Insert the prime order.
            buffer[inPrime++] = order;
        } else {
            //No prime order, insert behind all orders in buffer.
            buffer[inPrime + inNormal++] = order;
        }
        //Notify the DispatchWorkers that a new order has been placed.
        this.notEmptyBuffer.release();
    //Catch any InterruptedException that might occur.
    } catch(InterruptedException e){
        //Even though this isn't expected behavior, there is no reason to
        //notify the user of this event or to perform any other action as
        //the thread will just return to the queue before placing another
        //order if it is still required to do so.
    } finally {
        //Unlock and finalize the critical section.
        mutexBuffer.unlock();
    }
}
The orders are consumed by workers, which act as the consumer threads. The thread itself contains very simple code that loops until all orders have been processed. In this loop a different function, handleOrder();, is called on the same warehouse object; it handles a single order from the buffer. It does so with the following code:
public void handleOrder(){
    //Create a variable to store the order being handled.
    Order toHandle = null;
    try{
        //Wait until there is an order to handle.
        this.notEmptyBuffer.acquire();
        //Lock to signify the start of the critical section.
        this.mutexBuffer.lock();
        //Obtain the first order to handle as the first element of the buffer.
        toHandle = buffer[0];
        //Move all buffer elements back by 1 position.
        for(int i = 1; i < sizeOrderList; i++){
            buffer[i - 1] = buffer[i];
        }
        //Set the last element in the buffer to null.
        buffer[sizeOrderList - 1] = null;
        //We have obtained an order from the buffer and now we can handle it.
        if(toHandle != null) {
            int nPacks = toHandle.getnPacks();
            //Wait until the appropriate resources are available.
            this.hasBoxes.acquire(nPacks);
            this.hasTape.acquire(nPacks * 50);
            //Now we can handle the order (simulated by sleeping, although
            //in real life Amazon workers also have about 5ms of time per
            //package).
            Thread.sleep(5 * nPacks);
            //Calculate the total time this order took.
            long time = System.currentTimeMillis() -
                        toHandle.getOrderTime();
            //Update the total waiting time for the buyer.
            toHandle.getBuyer().setWaitingTime(time +
                        toHandle.getBuyer().getWaitingTime());
            //Check if the order to handle is prime or not.
            if(toHandle.isPrime()) {
                //Decrement the position at which prime orders are
                //inserted into the buffer.
                inPrime--;
            } else {
                //Decrement the position at which normal orders are
                //inserted into the buffer.
                inNormal--;
            }
            //Print a message informing the user a new order was completed.
            System.out.println("An order has been completed for: "
                        + toHandle.getBuyer().getName());
            //Notify the buyer he has successfully ordered a new package.
            toHandle.getBuyer().setPacksBought(
                        toHandle.getBuyer().getPacksBought() + 1);
        }else {
            //Notify the user there was a critical error obtaining the
            //order to handle. (There shouldn't exist a case where this
            //should happen but you never know.)
            System.err.println("Something went wrong obtaining an order.");
        }
        //Notify the buyers that a new spot has been opened in the buffer.
        this.notFullBuffer.release();
    //Catch any interrupt exceptions.
    } catch(InterruptedException e){
        //This is expected behavior as it allows us to force the thread to
        //reevaluate its main running loop when notifying it to finish
        //execution.
    } finally {
        //Check if the current thread is holding the buffer lock. This is
        //done as in the case of an interrupt we don't want to execute this
        //code if the interrupted thread doesn't hold the lock, as that
        //would result in an exception we don't want.
        if (mutexBuffer.isHeldByCurrentThread())
            //Unlock the buffer lock.
            mutexBuffer.unlock();
    }
}
The problem:
To verify the functionality of the program I use the output from the statement:
    System.out.println("An order has been completed for: "
                + toHandle.getBuyer().getName());
from the handleOrder(); function. I place the whole output in a text file, remove all the lines that weren't produced by this println(); statement, and count the number of lines to know how many orders have been handled. I expect this value to equal the number of buyer threads times 10, but this is often not the case. Running tests, I've noticed that sometimes it works without problems, but sometimes one or more buyer threads place more orders than they should. With 5 buyer threads there should be 50 outputs, but I get anywhere from 50 to 60 lines (orders placed).
Turning the number of threads up to 30 makes the problem worse: I can now expect up to 50% more orders, with some threads placing up to 30 orders.
From some research, I understand this is called a data race: it is caused by 2 threads accessing the same data at the same time while at least one of them writes to it. The write changes the data so that the other thread isn't working with the data it expects to be working with.
My attempt:
I firmly believe ReentrantLocks are designed to handle situations like this, as they should stop any thread from entering a section of code while another thread is still inside it. Both the placeOrder(Order order); and handleOrder(); functions make use of this mechanism. I'm therefore assuming I didn't implement this correctly. Here is a version of the project which can be compiled and executed from a single file called Test.java. Would anyone be able to take a look at that, or at the code explained above, and tell me what I'm doing wrong?
EDIT
I noticed there was a way a buyer could place more than 10 orders so I changed the code to:
/*
 * The run method which is run once the thread is started.
 */
public void run() {
    //Run until the thread has bought 10 packages, this ensures the thread
    //will eventually stop execution automatically.
    for(packsBought = 0; packsBought < 10; packsBought++)
    {
        try {
            //Sleep for a random amount of time between 1 and 49
            //milliseconds.
            Thread.sleep(this.rand.nextInt(49) + 1);
        //Catch any InterruptedExceptions.
        } catch (InterruptedException ex) {
            //There is no problem if this exception is thrown, the thread
            //will just make an order earlier than planned. That being said,
            //there should be no manner in which this exception is thrown.
        }
        //Create a new order.
        Order order = new Order(this.rand.nextInt(3)+ 1,
                                this,
                                this.isPrime);
        //Set the time at which the order was placed as now.
        order.setOrderTime(System.currentTimeMillis());
        //Place the newly created order in the warehouse.
        this.warehouse.placeOrder(order);
    }
    //Notify the thread has finished execution.
    System.out.println("Thread: " + super.getName() + " has finished.");
}
in the buyer's run(); function, yet I'm still getting some threads which place over 10 orders. I also removed the update of the number of packs bought in the handleOrder(); function, as that is now unnecessary. Here is an updated version of Test.java (where all classes are together for easy execution). There seems to be a different problem here.
There are some concurrency issues with the code, but the main bug is not related to them: it's in the block starting at line 512 in placeOrder:
    //Enumerate all non prime orders in the list.
    for (int i = inPrime; i < sizeOrderList - 1; i++) {
        //Move the non prime order back 1 position in the list.
        buffer[i + 1] = buffer[i];
    }
When there is only one normal order in the buffer, inPrime is 0, inNormal is 1, buffer[0] is the normal order and the rest of the buffer is null.
The code that moves the non-prime orders starts at index 0 and then does:
    buffer[1] = buffer[0] //normal order in 0 gets copied to 1
    buffer[2] = buffer[1] //now it's in 1, so it gets copied to 2
    buffer[3] = buffer[2] //now it's in 2 too, so it gets copied to 3
    ....
So it moves the normal order to buffer[1], but then it keeps copying it, filling the whole buffer with that order.
To solve this, copy the array in reverse order:
    //Enumerate all non prime orders in the list.
    for (int i = (sizeOrderList-1); i > inPrime; i--) {
        //Move the non prime order back 1 position in the list.
        buffer[i] = buffer[i-1];
    }
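The difference between the two loop directions can be checked in isolation. This is a small sketch; the class ShiftDemo and its String buffer are hypothetical and only mimic the shape of the buffer in the question.

```java
import java.util.Arrays;

public class ShiftDemo {
    // Forward copy, as in the question: each value written is read again
    // by the next iteration, so the first element is smeared over the rest.
    static String[] shiftForward(String[] buffer, int from) {
        String[] b = buffer.clone();
        for (int i = from; i < b.length - 1; i++) {
            b[i + 1] = b[i];
        }
        return b;
    }

    // Backward copy, as in the fix: every element is read before it is overwritten.
    static String[] shiftBackward(String[] buffer, int from) {
        String[] b = buffer.clone();
        for (int i = b.length - 1; i > from; i--) {
            b[i] = b[i - 1];
        }
        return b;
    }

    public static void main(String[] args) {
        String[] buffer = {"n1", "n2", null, null};
        System.out.println(Arrays.toString(shiftForward(buffer, 0)));  // [n1, n1, n1, n1]
        System.out.println(Arrays.toString(shiftBackward(buffer, 0))); // [n1, n1, n2, null]
    }
}
```

After the backward shift, the slot at index from still holds its old value and is free to be overwritten with the new prime order. System.arraycopy(b, from, b, from + 1, b.length - 1 - from) performs the same overlapping shift correctly.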
As for the concurrency issues:
If a field is checked on one thread and updated by another, you should declare it volatile. That's the case for the run field in DispatcherWorker and ResourceSupplier. See: https://stackoverflow.com/a/8063587/11751648
You start interrupting the dispatcher threads (line 183) while they are still processing packages. So if they are blocked at line 573, 574 or 579, they will throw an InterruptedException and not finish the processing (hence, in the latest code, not all packages are always delivered). You could avoid this by checking that the buffer is empty before you start interrupting the dispatcher threads, by calling warehouse.notFullBuffer.acquire(warehouse.sizeOrderList); on line 175.
When catching InterruptedException you should always call Thread.currentThread().interrupt(); to preserve the interrupted status of the thread. See: https://stackoverflow.com/a/3976377/11751648
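The first and last points can be sketched together. The class StoppableWorker, its shutdown() method and the runAndStop helper are hypothetical names for illustration; they are not taken from the asker's Test.java.

```java
public class StoppableWorker extends Thread {
    // volatile: the write in shutdown() is guaranteed to become visible to run()
    private volatile boolean running = true;
    // records whether the interrupted status was still set when run() ended
    private volatile boolean interruptSeen = false;

    @Override
    public void run() {
        while (running) {
            try {
                Thread.sleep(5); // stands in for blocking work
            } catch (InterruptedException e) {
                // sleep() cleared the flag when it threw; set it again
                Thread.currentThread().interrupt();
                break;
            }
        }
        interruptSeen = Thread.currentThread().isInterrupted();
    }

    public void shutdown() {
        running = false;
    }

    public boolean sawInterrupt() {
        return interruptSeen;
    }

    // Starts a worker, stops it by flag or by interrupt, waits for it to end,
    // and reports whether the worker still saw its interrupted status.
    static boolean runAndStop(boolean useInterrupt) {
        StoppableWorker w = new StoppableWorker();
        w.start();
        if (useInterrupt) {
            w.interrupt();
        } else {
            w.shutdown();
        }
        try {
            w.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return w.sawInterrupt();
    }

    public static void main(String[] args) {
        System.out.println(runAndStop(false)); // false
        System.out.println(runAndStop(true));  // true
    }
}
```

Without the call to Thread.currentThread().interrupt() in the catch block, the second call would report false, because code running after the catch could no longer tell that an interrupt had been requested.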
I believe you may be chasing ghosts. I'm not entirely sure why you're seeing more outputs than you're expecting, but the number of orders placed appears to be in order. Allow me to clarify:
I've added a Map<String,Integer> to the Warehouse class to map how many orders each thread places:
private Map<String,Integer> ordersPlaced = new TreeMap<>();
// Code omitted for brevity
public void placeOrder(Order order)
{
    try
    {
        //Halt until there are enough packs to handle an order.
        this.notFullBuffer.acquire();
        //Lock to signify the start of the critical section.
        this.mutexBuffer.lock();
        ordersPlaced.merge(Thread.currentThread().getName(), 1, Integer::sum);
        // Rest of method
}
I then added a for-loop to the main method to execute the code 100 times, and added the following code to the end of each iteration:
warehouse.ordersPlaced.forEach((thread, orders) -> System.out.printf(" %s - %d%n", thread, orders));
I placed a breakpoint inside the lambda expression, with condition orders != 10. This condition never triggered in the 100+ runs I executed. As far as I can tell, your code is working as intended. I've increased both nWorkers and nBuyers to 100 just to be sure.
I believe you're using ReentrantLock correctly, and I agree that it is probably the best choice for your use case.
Referring to your code on pastebin:
THE GENERIC PROBLEM:
In the function public void handleOrder(), the sleep (line 582), Thread.sleep(5 * nPacks);, is inside the lock()/unlock() block.
With the sleep in this position, it makes no sense to have many DispatchWorkers, because n-1 of them will wait at line 559, this.mutexBuffer.lock(), while one is sleeping at line 582.
THE BUG:
The bug is in line 173. You should remove it.
In your main() you join all buyers, and this is correct. Then you try to stop the workers. At this point the workers are still running, completing orders that will only be finished some time later. You should only call worker.runThread(false); and then join the thread (possibly in two separate loops). This solution really waits for the workers to complete their orders. Interrupting a thread that is sleeping at line 582 will raise an InterruptedException, and the lines that follow are skipped, in particular line 596 or 600, which update the inPrime and inNormal counters. This leads to unpredictable behaviour.
Moving line 582 after line 633 and removing line 173 will solve the problem.
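A reduced sketch of both suggestions follows: the simulated work is done outside the lock, and the workers are joined rather than interrupted. The class DispatchSketch and its integer queue are hypothetical simplifications of the asker's warehouse, with no semaphores or prime ordering.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class DispatchSketch {
    private final ReentrantLock mutexBuffer = new ReentrantLock();
    private final Queue<Integer> buffer = new ArrayDeque<>(); // each entry is an order's pack count
    private final AtomicInteger handled = new AtomicInteger();

    DispatchSketch(int orders) {
        for (int i = 0; i < orders; i++) {
            buffer.add(1 + i % 3);
        }
    }

    // Handles one order; returns false once the buffer is empty.
    boolean handleOrder() throws InterruptedException {
        Integer nPacks;
        mutexBuffer.lock();
        try {
            nPacks = buffer.poll(); // only the buffer access is in the critical section
        } finally {
            mutexBuffer.unlock();
        }
        if (nPacks == null) {
            return false;
        }
        Thread.sleep(5L * nPacks);  // simulated packing happens outside the lock
        handled.incrementAndGet();
        return true;
    }

    // Runs nWorkers threads until the buffer is drained; returns orders handled.
    static int run(int orders, int nWorkers) {
        DispatchSketch warehouse = new DispatchSketch(orders);
        Thread[] workers = new Thread[nWorkers];
        for (int i = 0; i < nWorkers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (warehouse.handleOrder()) {
                        // keep going until the buffer is empty
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // wait for completion instead of interrupting
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return warehouse.handled.get();
    }

    public static void main(String[] args) {
        System.out.println(run(20, 4)); // 20
    }
}
```

Because the lock is released before the sleep, several workers can be packing at the same time, which is the point of having more than one of them.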
HOW TO TEST:
My suggestion is to introduce a counter of all the boxes generated by the supplier and a counter of all the boxes ordered, and finally to check that the boxes generated equal the boxes ordered plus those left in the warehouse.

Is CyclicBarrier.getNumberWaiting() accurate?

I analyzed the code in JDK 1.8, but other JDK versions may have the same issue.
Let's assume parties = 3 in the following code:
CyclicBarrier cb = new CyclicBarrier(3);
parties = 3 and count >= 0, so the return value of getNumberWaiting() should be <= 3,
but in certain cases more than 3 threads will be waiting.
Let's look at the key code in CyclicBarrier:
a) Thread A at position 2 will return 0; now there are 2 threads waiting at position 3.
b) After thread A executes lock.unlock(), thread B at position 1 gets the lock (the lock is unfair), so now index = 2 and count = 2, and it will wait at position 3. Now there are 3 threads waiting at position 3.
c) Let's assume the lock is always acquired by a thread from position 1; then the number of waiting threads keeps growing,
so the result is getNumberWaiting() > 3:
getNumberWaiting() = (cyclic numbers) * parties - count
I think you need to look at the "generation" concept a little more. In your scenario, Thread A will have called nextGeneration(), which resets all counts (getNumberWaiting() = 0) and signals all current waiters. Those waiters (on the now-previous generation) will shortly start completing.
So yes, there may be more than 3 threads waiting on the trip Condition, but the 2 old waiters have been signalled to leave and any new ones are awaiting a new signal. getNumberWaiting() is not computed using Lock.getHoldCount(), so this is OK.
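The reset can be observed from the tripping thread. This is a small sketch under the assumption, taken from the JDK 8 source discussed above, that getNumberWaiting() returns parties - count; the class WaitingCountDemo is a hypothetical name.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class WaitingCountDemo {
    // Returns {number waiting before the trip, number waiting after the trip}.
    static int[] observe() {
        CyclicBarrier cb = new CyclicBarrier(3);
        Runnable party = () -> {
            try {
                cb.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (BrokenBarrierException e) {
                // not expected in this sketch
            }
        };
        Thread a = new Thread(party);
        Thread b = new Thread(party);
        a.start();
        b.start();
        try {
            while (cb.getNumberWaiting() < 2) {
                Thread.sleep(1); // poll until both threads have arrived
            }
            int before = cb.getNumberWaiting();
            cb.await();          // the third party trips the barrier: new generation
            int after = cb.getNumberWaiting();
            a.join();
            b.join();
            return new int[] {before, after};
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = observe();
        System.out.println(r[0] + " " + r[1]); // 2 0
    }
}
```

The second value is 0 even if the two earlier waiters have not yet returned from await(), because the count was reset when the barrier tripped.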

Allowing user to input number of threads and based on the number of threads distribute additions to each thread

I'm trying to create a program where the user enters a range of numbers, e.g. 1 and then 10, so that 1+2+3+4+5+6+7+8+9+10 is added up, not directly but split across threads. The user can also enter the number of threads; e.g. with 5 threads, thread 1 performs 1+2, thread 2 performs 3+4, and so on. The threads' results are then added together to produce a total.
I have the addition working, but I'm not sure how to split it across the number of threads the user enters.
You need to keep track of where each thread's share of the range starts. One way is to pass a parameter to run(), so that the first call starts from the first element and each later call continues from where the previous one stopped.
Try the following changes to the code:
for (int i=0; i<threadNo; i++)
{
    MultithreadingSum object = new MultithreadingSum(start,end,threadNo,noOfDigits);
    object.run(i*(end/threadNo)); //sending a parameter to run()
}
public void run(int time) //time is the offset at which this call to run() starts
{
    int total =0;
    System.out.println("Thread executed");
    for(a = i + time; a <= j/n+time ; a++){
        System.out.print(a +" ");
        total+=a;
    }
    System.out.println(getName() + " is "+ total);
}
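Note that calling run(int) directly executes it on the calling thread, so the snippet above still adds the numbers sequentially. A different way to split the range across real threads, sketched here as an assumption and not as the asker's MultithreadingSum class, is to submit one task per chunk to an ExecutorService and add up the partial sums. The class RangeSum and the method sum are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RangeSum {
    // Sums start..end (inclusive) using threadNo worker threads.
    static long sum(int start, int end, int threadNo) {
        ExecutorService pool = Executors.newFixedThreadPool(threadNo);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int count = end - start + 1;
            int chunk = (count + threadNo - 1) / threadNo; // ceiling division
            for (int t = 0; t < threadNo; t++) {
                final int lo = start + t * chunk;
                final int hi = Math.min(end, lo + chunk - 1);
                Callable<Long> task = () -> {
                    long total = 0;
                    for (int a = lo; a <= hi; a++) {
                        total += a; // empty when lo > hi (more threads than numbers)
                    }
                    return total;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 10, 5)); // 55
    }
}
```

With the range 1 to 10 and 5 threads, the chunks are 1-2, 3-4, 5-6, 7-8 and 9-10, which matches the split described in the question.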

Can this code have race condition?

public static void deleteLast(Vector list) {
    int lastIndex = list.size() - 1; // line 2
    list.remove(lastIndex); // line 3
}
I know Vector is thread-safe in Java,
but can this case happen?
Let's say list.size() = 10 in this case.
Thread A calls deleteLast and at line 2 gets lastIndex = 9. It stops for some reason.
Thread B calls deleteLast and at line 2 gets lastIndex = 9. It goes on to line 3, and now the list has 9 elements.
Thread A now wakes up and goes to line 3. It tries to remove the object at index 9, which no longer exists, and we get an exception.
Sure. You've correctly identified a race condition.
Yes, it can. And you should use a BlockingQueue rather than a List to avoid this scenario.
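One way to close the window while keeping the Vector, offered here as an assumption and not as what either answer prescribes, is client-side locking: Vector's own methods synchronize on the Vector instance, so holding that same lock around size() and remove() turns the two calls into one atomic step. The class SafeDeleteLast and the stress helper are hypothetical.

```java
import java.util.Vector;
import java.util.concurrent.atomic.AtomicInteger;

public class SafeDeleteLast {
    // Removes the last element if there is one; returns false on an empty list.
    static <T> boolean deleteLast(Vector<T> list) {
        synchronized (list) { // same lock that Vector's own methods use
            if (list.isEmpty()) {
                return false;
            }
            list.remove(list.size() - 1);
            return true;
        }
    }

    // Drains a Vector from several threads; true if no thread saw an
    // exception and the list ended up empty.
    static boolean stress(int elements, int threads) {
        Vector<Integer> list = new Vector<>();
        for (int i = 0; i < elements; i++) {
            list.add(i);
        }
        AtomicInteger failures = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                try {
                    while (deleteLast(list)) {
                        // keep removing until the list is empty
                    }
                } catch (RuntimeException e) {
                    failures.incrementAndGet();
                }
            });
            ts[t].start();
        }
        try {
            for (Thread th : ts) {
                th.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return failures.get() == 0 && list.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(stress(10_000, 8)); // true
    }
}
```

This only works if every compound operation on the list takes the same lock; a queue type designed for concurrent removal avoids that burden.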

Is id = 1 - id atomic?

From page 291 of OCP Java SE 6 Programmer Practice Exams, question 25:
public class Stone implements Runnable {
    static int id = 1;
    public void run() {
        id = 1 - id;
        if (id == 0)
            pick();
        else
            release();
    }
    private static synchronized void pick() {
        System.out.print("P ");
        System.out.print("Q ");
    }
    private synchronized void release() {
        System.out.print("R ");
        System.out.print("S ");
    }
    public static void main(String[] args) {
        Stone st = new Stone();
        new Thread(st).start();
        new Thread(st).start();
    }
}
One of the answers is:
The output could be P Q P Q
I marked this answer as correct. My reasoning:
We are starting two threads.
First one enters run().
According to JLS 15.26.1, it first evaluates 1 - id. The result is 0. It is stored on the thread's stack. We are just about to save that 0 to static id, but...
Boom, the scheduler chooses the second thread to run.
So, the second thread enters run(). Static id is still 1, so it executes method pick(). P Q is printed.
The scheduler chooses the first thread to run again. It takes 0 from its stack and saves it to static id. So, the first thread also executes pick() and prints P Q.
However, in the book it's written that this answer is incorrect:
It is incorrect because the line id = 1 - id swaps the value of id between 0 and 1. There is no chance for the same method to be executed twice.
I don't agree. I think there is some chance for the scenario I presented above. Such swap is not atomic. Am I wrong?
Am I wrong?
Nope, you're absolutely right - as is your example timeline.
In addition to it not being atomic, it's not guaranteed that the write to id will be picked up by the other thread anyway, given that there's no synchronization and the field isn't volatile.
It's somewhat disconcerting for reference material like this to be incorrect :(
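For comparison, the flip can be made atomic with AtomicInteger. This is a sketch, not the exam's code; the class AtomicToggle and the helper flipTwice are hypothetical names.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicToggle {
    static final AtomicInteger id = new AtomicInteger(1);

    // Atomically replaces id with 1 - id and returns the new value.
    static int flip() {
        return id.updateAndGet(v -> 1 - v);
    }

    // Flips once from each of two threads and returns the two results, sorted.
    static int[] flipTwice() {
        id.set(1);
        int[] seen = new int[2];
        Thread t1 = new Thread(() -> seen[0] = flip());
        Thread t2 = new Thread(() -> seen[1] = flip());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        Arrays.sort(seen);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(flipTwice())); // [0, 1]
    }
}
```

Each thread branches on the value returned by its own flip, so exactly one of them sees 0 and the other sees 1. With this change the timeline from the question, in which both threads observe 0, can no longer occur.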
In my opinion, the answer in the Practice Exams is correct. In this code, you are executing two threads which have access to the same static variable id. Static variables are stored on the heap in Java, not on the stack. The execution order of runnables is unpredictable.
However, in order to change the value of id, each thread:
makes a local copy of the value stored at id's memory address in a CPU register;
performs the operation 1 - id. Strictly speaking, two operations are performed here (-id and +1);
moves the result back to the memory space of id on the heap.
This means that although the id value can be changed concurrently by either of the two threads, only the initial and final values are mutable. Intermediate values will not be modified by one another.
Furthermore, analysis of the code shows that at any point in time, id can only be 0 or 1.
Proof:
Starting value id = 1;
One thread will change it to 0 ( id = 1 - id ). And the other thread will bring it back to 1.
Starting value id = 0;
One thread will change it to 1 ( id = 1 - id ). And the other thread will bring it back to 0.
Therefore, the state of id is discrete: either 0 or 1.
End of Proof.
There can be two possibilities for this code:
Possibility 1. Thread one accesses the variable id first. Then the value of id (id = 1 - id) changes to 0. Thereafter, only the method pick() will be executed, printing P Q. Thread two will evaluate id at that time, when id = 0; method release() will then be executed, printing R S. As a result, P Q R S will be printed.
Possibility 2. Thread two accesses the variable id first. Then the value of id (id = 1 - id) changes to 0. Thereafter, only the method pick() will be executed, printing P Q. Thread one will evaluate id at that time, when id = 0; method release() will then be executed, printing R S. As a result, P Q R S will be printed.
There are no other possibilities. However, it should be noted that variants of P Q R S such as P R Q S or R P Q S, etc. may be printed, because pick() is a static method and is therefore shared between the two threads. This leads to simultaneous execution of the methods, which could result in the letters being printed in a different order depending on your platform.
However, in any case, neither pick() nor release() will ever be executed twice, as they are mutually exclusive. Therefore P Q P Q will not be an output.
