main thread consumer and other threads producer - java

My question is this: I have a data set of 1000 records and I want 3 threads to process the data like this:
thread1 handles records 1 to 300, thread2 handles 301 to 600, and so on. Each thread makes a request that fetches 50 records at a time, creates an object and puts it in a queue.
The main thread simultaneously reads data from the queue.
The code is below. The problem I am facing is that the recordsRead variable tells a thread the starting point from which it should read records.
But how can I set a different value for each thread? For example, for thread1 recordsRead should be 0 and recordsToRead should be 300; for thread2 recordsRead should be 300 and recordsToRead should be 300+300=600; and for the last thread recordsRead should be 600 and it should read up to the end.
pagesize=50
pageSize, recordsRead and recordsToRead are all variables that belong to the main class and main thread.
ExecutorService service = Executors.newFixedThreadPool(nThreads);
while(nThreads > 0) {
nThreads--;
service.execute(new Runnable() {
@Override
public void run() {
// TODO Auto-generated method stub
do {
int respCode = 0;
int RecordsToRead = div;
JSONObject jsObj = new JSONObject();
jsObj.put("pagesize", pageSize);
jsObj.put("start", recordsRead);
jsObj.put("searchinternalid", searchInternalId);
try {
boolean status = req.invoke(jsObj);
respCode = req.getResponseCode();
} catch (Exception e) {
req.reset();
e.printStackTrace();
return; // run() is void, so it cannot return a value
}
JSONObject jsResp = req.getResponseJson();
//here jsResp will be added to ArrayBlockingQueue.
req.reset();
}while(!isError && !isMaxLimit && recordsRead < RecordsToRead);
}
});
}
After this loop comes the main thread's code that reads the queue.
How can I set recordsRead and recordsToRead for all threads?
And how can I make the main thread wait until at least one thread has inserted an object into the queue?

I see two problems in your description. The first is to perform the chunk computation in parallel, and the second is to build a continuous pipeline from it. Let's start with the first. For parallel computation over chunks of a predefined size, the best option from my point of view is the fork-join framework, not only for performance (work stealing is really effective) but also because the code is simpler. But since you are limited to 3 threads, using threads directly also seems valid to me. What you want can be implemented like this:
final int chunkSize = 300;
//you can also use total amount of job
//int totalWork = 1000 and chunk size equals totalWork/threadsNumber
final int threadsNumber = 3;
Thread[] threads = new Thread[threadsNumber];
for (int ii = 0; ii < threadsNumber; ii++) {
final int i = ii;
threads[ii] = new Thread(() -> {
//count your variable according the volume
// for example you can do so
int chunkStart = i * chunkSize;
int chunkEnd = chunkStart + chunkSize;
for(int j = chunkStart; j < chunkEnd; j++) {
//object creation with necessary properties
//offer to queue here
}
});
threads[ii].start();
}
//your code here
//take here
for (int ii = 0; ii < threadsNumber; ii++) {
try {
//this part is only an example
//you do not need it
//here, if you want, you can also wait for all threads to complete
threads[ii].join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
Now about the second problem, consuming. For this purpose you can use, for example, ConcurrentLinkedBlockingQueue (http://www.jgroups.org/javadoc/org/jgroups/util/ConcurrentLinkedBlockingQueue.html). Call offer in the producer threads and use the take method in main.
But to be honest, I still do not understand the reason for your problem. Do you want to create a continuous pipeline, or is it just a one-time computation?
I would also recommend taking this course: https://www.coursera.org/learn/parallel-programming-in-java/home/welcome.
It will help you with exactly this problem and provides various solutions. There are also courses on concurrent and distributed computing.
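The two parts of the answer above (per-thread ranges, then a queue drained by the main thread) can be combined in one sketch. It is only an illustration under assumptions: fetchPage is a hypothetical stand-in for the remote request, and the standard java.util.concurrent.LinkedBlockingQueue is swapped in for the JGroups queue mentioned above. Each producer computes its own start and end from its index, and the last producer takes whatever remains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ChunkedProducers {
    // Hypothetical stand-in for one remote request covering records [start, end).
    static String fetchPage(int start, int end) {
        return start + "-" + end;
    }

    // Starts threadCount producers and consumes every page on the calling thread.
    static List<String> readAll(int totalRecords, int threadCount, int chunkSize, int pageSize) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        int expectedPages = 0;
        for (int t = 0; t < threadCount; t++) {
            final int chunkStart = t * chunkSize;
            // The last thread reads up to the end of the data set.
            final int chunkEnd = (t == threadCount - 1) ? totalRecords : chunkStart + chunkSize;
            expectedPages += (chunkEnd - chunkStart + pageSize - 1) / pageSize; // ceiling division
            new Thread(() -> {
                for (int start = chunkStart; start < chunkEnd; start += pageSize) {
                    queue.add(fetchPage(start, Math.min(start + pageSize, chunkEnd)));
                }
            }).start();
        }
        List<String> pages = new ArrayList<>();
        try {
            for (int i = 0; i < expectedPages; i++) {
                pages.add(queue.take()); // blocks until some producer has added a page
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(readAll(1000, 3, 300, 50).size() + " pages read");
    }
}
```

Because take() blocks, the main thread needs no extra signalling to wait for the first object. Counting the expected pages up front is one way to know when to stop; a sentinel object per producer would be another.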

Related

Concurrency java on for loop

I have a method that runs with Selenium to create user accounts on a website quickly. Currently it processes one after the other, but I'm thinking if I can process 10 at once that would be better.
I have a for loop currently, which is used to tell the code within, which line of my 2D array to read the user information from. I am struggling with the concept of how to make any stream or thread use the correct value and fetch the correct user information.
Currently I have something similar to the below simplified:
I need to load a new page and driver every time this loops, and I need to send the value from the array to the web field. So basically I want this to go off and loop without waiting for the first iteration to finish before starting the next, but probably limited to 10 or so running at once.
for(i=0,i<myarray.length, i++)
{
Webdriver.start();
WebElement.findby.(By.name("field1").sendkeys(myArray[i][2]);
Webdriver.end();
}
As I said, this is not actual code; it is just to get my question across.
Hope that is clear.
I think you're saying, I am iterating through myArray and running my test once for each element in that array, but instead of running one test and waiting for it to finish before running the next, I want to run a whole bunch at a time.
You can do this pretty trivially with the Java 8 ForkJoinPool.
ForkJoinTask<?>[] tasks = new ForkJoinTask<?>[myarray.length];
for (int i = 0; i < myarray.length; i++)
{
int j = i; // need an effectively final copy of i
tasks[i] = ForkJoinPool.commonPool().submit(() -> {
Webdriver.start();
WebElement.findby.(By.name("field1").sendkeys(myArray[j][2]);
Webdriver.end();
});
}
for (int i = 0; i < myarray.length; i++) {
tasks[i].join();
}
The tests will run in parallel using threads from the "common" ForkJoinPool. If you want to adjust the number of threads that are used, create your own ForkJoinPool. (See this question for more information.)
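As a rough sketch of the "create your own ForkJoinPool" suggestion above, the pool below is built with an explicit parallelism level so that about that many tasks run at once. The createAccount method is a hypothetical placeholder for the Selenium work, and the rows are plain strings rather than the asker's real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

class LimitedParallelism {
    // Hypothetical stand-in for the per-row browser work; returns what it would have typed.
    static String createAccount(String[] row) {
        return "created:" + row[2];
    }

    static List<String> createAll(String[][] rows, int maxThreads) {
        ForkJoinPool pool = new ForkJoinPool(maxThreads); // parallelism level of maxThreads
        try {
            List<ForkJoinTask<String>> tasks = new ArrayList<>();
            for (String[] row : rows) {
                Callable<String> job = () -> createAccount(row);
                tasks.add(pool.submit(job));
            }
            List<String> results = new ArrayList<>();
            for (ForkJoinTask<String> task : tasks) {
                results.add(task.join()); // waits for this task; order matches rows
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[][] rows = { {"a", "b", "c"}, {"d", "e", "f"} };
        System.out.println(createAll(rows, 10));
    }
}
```

Each loop iteration captures its own row, so no task can pick up another iteration's index.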
I would explicitly start a separate Thread for every task, as most of the time will likely be spent waiting until the user account is created.
Please see the code snippet below for a rough example:
public void createAccounts() throws InterruptedException {
List<Thread> threadList = new ArrayList<>();
Object[][] myArray = new Object[1][1];
for(int i=0; i<myArray.length; i++) {
final int index = i;
//Add thread for user creation
threadList.add(new Thread(new Runnable() {
@Override
public void run() {
Webdriver.start();
WebElement.findby.(By.name("field1").sendkeys(myArray[index][2]);
Webdriver.end();
}
}));
}
//Start all threads
for (Thread thread : threadList) {
thread.start();
}
//Wait until all threads are finished
for (Thread thread : threadList) {
thread.join();
}
}

Concurrent checking if collection is empty

I have this piece of code:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
@Override
public void run(){
while(!intervals.isEmpty()){
//remove one interval
//do calculations
//add some intervals
}
}
This code is being executed by a specific number of threads at the same time. As you see, loop should go on until there are no more intervals left in the collection, but there is a problem. In the beginning of each iteration an interval gets removed from collection and in the end some number of intervals might get added back into same collection.
Problem is, that while one thread is inside the loop the collection might become empty, so other threads that are trying to enter the loop won't be able to do that and will finish their work prematurely, even though collection might be filled with values after the first thread will finish the iteration. I want the thread count to remain constant (or not more than some number n) until all work is really finished.
That means that no threads are currently working in the loop and there are no elements left in the collection. What are possible ways of accomplishing that? Any ideas are welcomed.
One way to solve this problem in my specific case is to give every thread a different piece of the original collection. But after one thread would finish its work it wouldn't be used by the program anymore, even though it could help other threads with their calculations, so I don't like this solution, because it's important to utilize all cores of the machine in my problem.
This is the simplest minimal working example I could come up with. It might be too lengthy.
public class Test{
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private int threadNumber;
private Thread[] threads;
private double result;
public Test(int threadNumber){
intervals.add(new Interval(0, 1));
this.threadNumber = threadNumber;
threads = new Thread[threadNumber];
}
public double find(){
for(int i = 0; i < threadNumber; i++){
threads[i] = new Thread(new Finder());
threads[i].start();
}
try{
for(int i = 0; i < threadNumber; i++){
threads[i].join();
}
}
catch(InterruptedException e){
System.err.println(e);
}
return result;
}
private class Finder implements Runnable{
@Override
public void run(){
while(!intervals.isEmpty()){
Interval interval = intervals.poll();
if(interval.high - interval.low > 1e-6){
double middle = (interval.high + interval.low) / 2;
boolean something = true;
if(something){
intervals.add(new Interval(interval.low + 0.1, middle - 0.1));
intervals.add(new Interval(middle + 0.1, interval.high - 0.1));
}
else{
intervals.add(new Interval(interval.low + 0.1, interval.high - 0.1));
}
}
}
}
}
private class Interval{
double low;
double high;
public Interval(double low, double high){
this.low = low;
this.high = high;
}
}
}
What you might need to know about the program: After every iteration interval should either disappear (because it's too small), become smaller or split into two smaller intervals. Work is finished after no intervals are left. Also, I should be able to limit number of threads that are doing this work with some number n. The actual program looks for a maximum value of some function by dividing the intervals and throwing away the parts of those intervals that can't contain the maximum value using some rules, but this shouldn't really be relevant to my problem.
The CompletableFuture class is also an interesting solution for this kind of task.
It automatically distributes workload over a number of worker threads.
static CompletableFuture<Integer> fibonacci(int n) {
if(n < 2) return CompletableFuture.completedFuture(n);
else {
return CompletableFuture.supplyAsync(() -> {
System.out.println(Thread.currentThread());
CompletableFuture<Integer> f1 = fibonacci(n - 1);
CompletableFuture<Integer> f2 = fibonacci(n - 2);
return f1.thenCombineAsync(f2, (a, b) -> a + b);
}).thenComposeAsync(f -> f);
}
}
public static void main(String[] args) throws Exception {
int fib = fibonacci(10).get();
System.out.println(fib);
}
You can use atomic flag, i.e.:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private AtomicBoolean inUse = new AtomicBoolean();
@Override
public void run() {
while (!intervals.isEmpty() && inUse.compareAndSet(false, true)) {
// work
inUse.set(false);
}
}
UPD
The question has been updated, so here is a better solution. It is a more "classic" solution using a blocking queue:
private BlockingQueue<Interval> intervals = new LinkedBlockingQueue<>();
private volatile boolean finished = false;
@Override
public void run() {
try {
while (!finished) {
Interval next = intervals.take();
// put work there
// after you decide work is finished just set finished = true
intervals.put(next); // anyway, return interval to queue
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
UPD2
Now it seems better to rewrite the solution and divide the range into sub-ranges for each thread.
Your problem looks like a recursive one - processing one task (interval) might produce some sub-tasks (sub intervals).
For that purpose I would use ForkJoinPool and RecursiveAction:
class Interval {
...
}
class IntervalAction extends RecursiveAction {
private Interval interval;
private IntervalAction(Interval interval) {
this.interval = interval;
}
@Override
protected void compute() {
if (...) {
// we need two sub-tasks
IntervalAction sub1 = new IntervalAction(new Interval(...));
IntervalAction sub2 = new IntervalAction(new Interval(...));
sub1.fork();
sub2.fork();
sub1.join();
sub2.join();
} else if (...) {
// we need just one sub-task
IntervalAction sub3 = new IntervalAction(new Interval(...));
sub3.fork();
sub3.join();
} else {
// current task doesn't need any sub-tasks, just return
}
}
}
public static void compute(Interval initial) {
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new IntervalAction(initial));
// invoke will return when all the processing is completed
}
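To make the outline above concrete, here is a minimal runnable sketch of the same RecursiveAction idea. The splitting rule is an assumption chosen for illustration (halve an interval until it is no wider than minWidth), not the asker's real pruning rule, and the leaf counter exists only so the result can be checked.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

class IntervalSplitting {
    static class SplitAction extends RecursiveAction {
        private final double low;
        private final double high;
        private final double minWidth;
        private final AtomicInteger leaves;

        SplitAction(double low, double high, double minWidth, AtomicInteger leaves) {
            this.low = low;
            this.high = high;
            this.minWidth = minWidth;
            this.leaves = leaves;
        }

        @Override
        protected void compute() {
            if (high - low > minWidth) {
                double middle = (low + high) / 2;
                // Forks both halves and returns once both have completed.
                invokeAll(new SplitAction(low, middle, minWidth, leaves),
                          new SplitAction(middle, high, minWidth, leaves));
            } else {
                leaves.incrementAndGet(); // interval is small enough; no sub-tasks
            }
        }
    }

    // Returns how many intervals were too small to split further.
    static int countLeaves(double low, double high, double minWidth, int threads) {
        AtomicInteger leaves = new AtomicInteger();
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            pool.invoke(new SplitAction(low, high, minWidth, leaves)); // blocks until all sub-tasks finish
        } finally {
            pool.shutdown();
        }
        return leaves.get();
    }

    public static void main(String[] args) {
        System.out.println(countLeaves(0, 1, 0.25, 4));
    }
}
```

The pool keeps at most the requested number of workers busy through work stealing, and invoke returns only when no task is left, which is the termination condition the question asks for.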
I had the same problem, and I tested the following solution.
In my test example I have a queue (the equivalent of your intervals) filled with integers. For the test, at each iteration one number is taken from the queue, incremented and placed back in the queue if the new value is below 7 (arbitrary). This has the same impact as your interval generation on the mechanism.
Here is an example working code (Note that I develop in java 1.8 and I use the Executor framework to handle my thread pool.) :
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
public class Test {
final int numberOfThreads;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
final BlockingQueue<Integer> sleepingThreadsTokens;
final ThreadPoolExecutor executor;
public static void main(String[] args) {
final Test test = new Test(2); // arbitrary number of thread => 2
test.launch();
}
private Test(int numberOfThreads){
this.numberOfThreads = numberOfThreads;
this.queue = new PriorityBlockingQueue<Integer>();
this.availableThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.sleepingThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numberOfThreads);
}
public void launch() {
// put some elements in queue at the beginning
queue.add(1);
queue.add(2);
queue.add(3);
for(int i = 0; i < numberOfThreads; i++){
availableThreadsTokens.add(1);
}
System.out.println("Start");
boolean algorithmIsFinished = false;
while(!algorithmIsFinished){
if(sleepingThreadsTokens.size() != numberOfThreads){
try {
availableThreadsTokens.take();
} catch (final InterruptedException e) {
e.printStackTrace();
// some treatment should be put there in case of failure
break;
}
if(!queue.isEmpty()){ // Continuation condition
sleepingThreadsTokens.drainTo(availableThreadsTokens);
executor.submit(new Loop(queue.poll(), queue, availableThreadsTokens));
}
else{
sleepingThreadsTokens.add(1);
}
}
else{
algorithmIsFinished = true;
}
}
executor.shutdown();
System.out.println("Finished");
}
public static class Loop implements Runnable{
int element;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
public Loop(Integer element, BlockingQueue<Integer> queue, BlockingQueue<Integer> availableThreadsTokens){
this.element = element;
this.queue = queue;
this.availableThreadsTokens = availableThreadsTokens;
}
@Override
public void run(){
System.out.println("taking element "+element);
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
if(element < 7){
this.queue.add(element+1);
System.out.println("Inserted element"+(element + 1));
}
else{
System.out.println("no insertion");
}
this.availableThreadsTokens.offer(1);
}
}
}
I ran this code as a check, and it seems to work properly. However, there are certainly some improvements that can be made:
sleepingThreadsTokens does not have to be a BlockingQueue, since only the main thread accesses it. I used this interface because it allowed a nice sleepingThreadsTokens.drainTo(availableThreadsTokens);
I'm not sure whether queue has to be blocking or not, since only main takes from it and does not wait for elements (it waits only for tokens).
...
The idea is that the main thread checks for termination, and for this it has to know how many threads are currently working (so that it does not prematurely stop the algorithm because the queue is empty). To do so, two specific queues are created: availableThreadsTokens and sleepingThreadsTokens. Each element in availableThreadsTokens symbolizes a thread that has finished an iteration and is waiting to be given another one. Each element in sleepingThreadsTokens symbolizes a thread that was available to take a new iteration, but the queue was empty, so it had no job and went to "sleep". So at each moment availableThreadsTokens.size() + sleepingThreadsTokens.size() = numberOfThreads - threadsExecutingIteration.
Note that the elements of availableThreadsTokens and sleepingThreadsTokens only symbolize thread activity; they are not threads, nor do they designate a specific thread.
Case of termination: let us suppose we have N threads (an arbitrary, fixed number). The N threads are waiting for work (N tokens in availableThreadsTokens), there is only 1 remaining element in the queue, and the treatment of this element won't generate any other element. Main takes the first token, finds that the queue is not empty, polls the element and sends the thread to work. The next N-1 tokens are consumed one by one, and since the queue is empty the tokens are moved into sleepingThreadsTokens one by one. Main knows that there is 1 thread working in the loop, since there is no token in availableThreadsTokens and only N-1 in sleepingThreadsTokens, so it waits (.take()). When the thread finishes and releases the token, Main consumes it, discovers that the queue is now empty and puts the last token in sleepingThreadsTokens. Since all tokens are now in sleepingThreadsTokens, Main knows that 1) all threads are inactive and 2) the queue is empty (otherwise the last token wouldn't have been transferred to sleepingThreadsTokens, since the thread would have taken the job).
Note that if the working thread finishes the treatment before all the availableThreadsTokens are moved to sleepingThreadsTokens, it makes no difference.
Now if we suppose that the treatment of the last element had generated M new elements in the queue, then Main would have put all the tokens from sleepingThreadsTokens back into availableThreadsTokens and started to assign them treatments again. We put all the tokens back even if M < N because we don't know how many elements will be inserted in the future, so we have to keep all the threads available.
I would suggest a master/worker approach then.
The master process goes through the intervals and assigns the calculations of that interval to a different process. It also removes/adds as necessary. This way, all the cores are utilized, and only when all intervals are finished, the process is done. This is also known as dynamic work allocation.
A possible example:
public void run(){
while(!intervals.isEmpty()){
//remove one interval
Thread t = new Thread(new Runnable()
{
//do calculations
});
t.start(); // start() runs the task on the new thread; run() would execute it on the calling thread
//add some intervals
}
}
The possible solution you provided is known as static allocation, and you're correct: it will finish only as fast as the slowest processor, whereas the dynamic approach keeps all cores busy.
I've run into this problem as well. The way I solved it was to use an AtomicInteger to track what is in the queue. Before each offer(), increment the integer. After each poll(), decrement the integer. The CLQ has no reliable isEmpty(), since it must look at the head/tail nodes and these can change concurrently (CAS).
This is not a 100% guarantee, because one thread may increment right after another thread decrements, so you need to check again before ending the thread. Still, it is better than relying on while(...isEmpty()).
Other than that, you may need to synchronize.
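A variation on the counter idea above can close the remaining gap. In this sketch the AtomicInteger counts items that are either queued or still being processed, and it is decremented only after any follow-up items have been added, so a value of zero means no work can appear any more. The integer items and the limit rule are borrowed from the test setup in an earlier answer and are purely illustrative.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class PendingCounterWorkers {
    // Processes the seeds; every item below limit produces item + 1. Returns the number of items processed.
    static int run(int threadCount, int limit, int... seeds) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger();   // items queued or currently in progress
        AtomicInteger processed = new AtomicInteger();
        for (int seed : seeds) {
            pending.incrementAndGet();
            queue.add(seed);
        }
        Runnable worker = () -> {
            while (pending.get() > 0) {
                Integer item = queue.poll();
                if (item == null) {
                    Thread.yield(); // queue is empty for now, but another worker may still add items
                    continue;
                }
                processed.incrementAndGet();
                if (item < limit) {
                    pending.incrementAndGet(); // register the new item before it becomes visible
                    queue.add(item + 1);
                }
                pending.decrementAndGet(); // this item is fully handled
            }
        };
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 7, 1, 2, 3));
    }
}
```

Idle workers spin with Thread.yield() here to keep the sketch short; a blocking queue or a Phaser would avoid the busy wait.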

ThreadPoolExecutor Utility methods

I am writing a thread pool utility in my multithreaded program. I just need to validate that the following methods are correct and return the right values. I am using a LinkedBlockingQueue with a size of 1. I also referred to the Javadoc, which always says these methods return an 'approximate' number, so I doubt whether the following conditions are correct.
public boolean isPoolIdle() {
return myThreadPool.getActiveCount() == 0;
}
public int getAcceptableTaskCount() {
//initially poolSize is 0 ( after pool executes something it started to change )
if (myThreadPool.getPoolSize() == 0) {
return myThreadPool.getCorePoolSize() - myThreadPool.getActiveCount();
}
return myThreadPool.getPoolSize() - myThreadPool.getActiveCount();
}
public boolean isPoolReadyToAcceptTasks(){
return myThreadPool.getActiveCount()<myThreadPool.getCorePoolSize();
}
Please let me know your thoughts and suggestions.
UPDATE
An interesting thing: if the getAcceptableTaskCount method tells me there are 3 threads available and I pass 3 tasks to the pool, sometimes one task is rejected and is handled by the RejectedExecutionHandler; sometimes the pool handles all the tasks I passed. I am wondering why the pool rejects tasks, since I am passing tasks according to the available thread count.
--------- implementation of the answer of gray---
class MyTask implements Runnable {
@Override
public void run() {
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("exec");
}
}
@Test
public void testTPool(){
ExecutorService pool = Executors.newFixedThreadPool(5);
List<Future<MyTask>> list = new ArrayList<Future<MyTask>>();
for (int i = 0; i < 5; i++) {
MyTask t = new MyTask();
list.add(pool.submit(t, t));
}
for (int i = 0; i < list.size(); i++) {
Future<MyTask> t = list.get(i);
System.out.println("Result -"+t.isDone());
MyTask m = new MyTask();
list.add(pool.submit(m,m));
}
}
This will print Result -false in the console meaning that task is not complete.
From your comments:
i need to know that if pool is idle or pool can accept the tasks. if pool can accept, i need to know how much free threads in the pool. if it is 5 i will send 5 tasks to the pool to do the processing.
I don't think that you should be doing the pool accounting yourself. For your thread pool if you use Executors.newFixedThreadPool(5) then you can submit as many tasks as you want and it will only run them in 5 threads.
so i get the first most 5 tasks from the vector and assign them to the pool.ignore the other tasks in the vector since they may be update / remove from a separate cycle
Ok, I see. So you want to maximize parallelization while at the same time not pre-loading jobs? I would think that something like the following pseudo code would work:
int numThreads = 5;
ExecutorService threadPool = Executors.newFixedThreadPool(numThreads);
List<Future<MyJob>> futures = new ArrayList<Future<MyJob>>();
// submit the initial jobs
for (int i = 0; i < numThreads; i++) {
MyJob myJob = getNextBestJob();
futures.add(threadPool.submit(myJob, myJob));
}
// the list is growing so we use for i
for (int i = 0; i < futures.size(); i++) {
// wait for a job to finish
MyJob myJob = futures.get(i).get();
// process the job somehow
// get the next best job now that the previous one finished
MyJob nextJob = getNextBestJob();
if (nextJob != null) {
// submit the next job unless we are done
futures.add(threadPool.submit(nextJob, nextJob));
}
}
However, I don't quite understand how the thread count would change. If you edit your question with some more details I can tweak my response.
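The earlier point, that a fixed pool accepts any number of tasks but runs them on a fixed number of threads, can be checked with a small sketch. The class name, the counters and the 10 ms sleep are illustrative assumptions; the sleep only stands in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolDemo {
    // Returns {number of completed tasks, peak number of tasks running at the same time}.
    static int[] run(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            futures.add(pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
                try {
                    Thread.sleep(10); // stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            }));
        }
        int completed = 0;
        try {
            for (Future<?> future : futures) {
                future.get(); // waits for the task to finish
                completed++;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return new int[] { completed, peak.get() };
    }

    public static void main(String[] args) {
        int[] result = run(5, 20);
        System.out.println("completed=" + result[0] + " peak=" + result[1]);
    }
}
```

Tasks beyond the pool size simply wait in the pool's unbounded queue, so nothing is rejected and no manual accounting of free threads is needed.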

Some problems with Threads

I'm having a bit of trouble with threads in Java. Basically I'm creating an array of threads and starting them. The point of the program is to simulate a race, total the time for each competitor (i.e. each thread) and pick the winner.
The competitor moves one space, waits (i.e. the thread sleeps for a random period of time between 5 and 6 seconds) and then continues. As expected, the threads don't complete in the order in which they started.
Now for the problem. I can get the total time it takes for a thread to complete; what I want is to store all the times from the threads in a single array and be able to calculate the fastest time.
To do this, should I place the array in the main class file? Would I be right in assuming so, because if it were placed in the Thread class it wouldn't work? Or should I create a third class?
I'm a little confused :/
It's fine to declare it in the method where you invoke the threads, with a few notes:
- Each thread should know its index in the array. Perhaps you should pass it in the constructor.
- Then you have three options for filling the array:
  - the array should be final, so that it can be used within anonymous classes
  - the array can be passed to each thread
  - the threads should notify a listener when they're done, which in turn will update the array.
- Consider using the Java 1.5 Executors framework for submitting Runnables, rather than working directly with threads.
EDIT: The solution below assumes you need the times only after all competitors have finished the race.
You can use a structure that looks like the one below (inside your main class). Typically you will want to add a lot of your own stuff; this is the main outline.
Note that concurrency is not an issue at all here because you get the value from the MyRunnable instance once its thread has finished running.
Note that using a separate thread for each competitor is probably not really necessary with a modified approach, but that would be a different issue.
public static void main(String[] args) {
MyRunnable[] runnables = new MyRunnable[NUM_THREADS];
Thread[] threads = new Thread[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++) {
runnables[i] = new MyRunnable();
threads[i] = new Thread(runnables[i]);
}
// start threads
for (Thread thread : threads) {
thread.start();
}
// wait for threads
for (Thread thread : threads) {
try {
thread.join();
} catch (InterruptedException e) {
// ignored
}
}
// get the times you calculated for each thread
for (int i = 0; i < NUM_THREADS; i++) {
int timeSpent = runnables[i].getTimeSpent();
// do something with the time spent
}
}
static class MyRunnable implements Runnable {
private int timeSpent;
public MyRunnable(...) {
// initialize
}
public void run() {
// whatever the thread should do
// finally set the time
timeSpent = ...;
}
public int getTimeSpent() {
return timeSpent;
}
}
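The Executors suggestion from the first answer can be sketched along the same lines. This is an illustration under assumptions: each competitor is a Callable that returns its own total, and the per-step waits are passed in as fixed values (hypothetical, in milliseconds) instead of being random, so that the outcome is predictable. At least one competitor is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RaceTimes {
    // Each row holds one competitor's waits in milliseconds; returns the index of the smallest total.
    static int winner(long[][] stepDelaysMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(stepDelaysMillis.length);
        try {
            List<Future<Long>> totals = new ArrayList<>();
            for (long[] delays : stepDelaysMillis) {
                Callable<Long> competitor = () -> {
                    long total = 0;
                    for (long delay : delays) {
                        Thread.sleep(delay); // wait after each move
                        total += delay;
                    }
                    return total;
                };
                totals.add(pool.submit(competitor));
            }
            int best = 0;
            long bestTime = Long.MAX_VALUE;
            for (int i = 0; i < totals.size(); i++) {
                long time = totals.get(i).get(); // blocks until competitor i has finished
                if (time < bestTime) {
                    bestTime = time;
                    best = i;
                }
            }
            return best;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[][] delays = { {5, 5}, {1, 2}, {4, 4} };
        System.out.println("winner index: " + winner(delays));
    }
}
```

The list of futures plays the role of the shared array: results come back in submission order, so no index has to be passed to the threads.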

Java multithreaded parser

I am writing a multithreaded parser.
Parser class is as follows.
public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {
private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
private boolean h2Tag = false;
private int count;
private static int threadCount = 0;
public static List<Item> parse() {
for (int i = 1; i <= 1000; i++) { //1000 of the same type of pages that need to parse
while (threadCount == 20) { //limit the number of simultaneous threads
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
Thread thread = new Thread(new Parser());
thread.setName(Integer.toString(i));
threadCount++; //increase the number of working threads
thread.start();
}
return itemList;
}
public void run() {
//Here is a piece of code responsible for creating links based on
//the thread name and passed as a parameter remained i,
//connection, start parsing, etc.
//In general, nothing special. Therefore, I won't paste it here.
threadCount--; //reduce the number of running threads when current stops
}
private static void addItem(Item item) {
itemList.add(item);
}
//This method retrieves the necessary information after the H2 tag is detected
@Override
public void handleText(char[] data, int pos) {
if (h2Tag) {
String itemName = new String(data).trim();
//Item - the item on which we receive information from a Web page
Item item = new Item();
item.setName(itemName);
item.setId(count);
addItem(item);
//Display information about an item in the console
System.out.println(count + " = " + itemName);
}
}
@Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = true;
}
}
@Override
public void handleEndTag(HTML.Tag t, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = false;
}
}
}
From another class parser runs as follows:
List<Item> list = Parser.parse();
All is good, but there is a problem. At the end of parsing, the final list itemList contains about 980 elements instead of 1000, but the console shows all 1000 elements (items). That is, it looks as if some threads did not call addItem from handleText.
I already tried changing the type of itemList to ArrayList, CopyOnWriteArrayList and Vector. I made the addItem method synchronized, and I wrapped its call in a synchronized block. All this only changes the number of elements a little, but the full thousand is never obtained.
I also tried parsing a smaller number of pages (ten). As a result the list is empty, but the console shows all 10.
If I remove multi-threading, then everything works fine but, of course, slowly. That's not good.
If I decrease the number of concurrent threads, the number of items in the list gets closer to the desired 1000; if I increase it, the number moves further from 1000. So I think there is contention for writing to the list. But then why is synchronization not working?
What's the problem?
After your parse() call returns, all of your 1000 threads have been started, but it is not guaranteed that they have finished. In fact, they haven't, and that is the problem you see. I would strongly recommend not writing this yourself but using the tools the SDK provides for this kind of job.
The Thread Pools documentation and ThreadPoolExecutor are, for example, a good starting point. Again, don't implement this yourself unless you are absolutely sure you have to, because writing such multi-threaded code is pure pain.
Your code should look something like this:
ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
f.get();
}
There is no problem with the code; it works as you have coded it. The problem is with the last batch. All earlier iterations work properly, but for the last batch, from 980 to 1000, the threads are created and the main thread does not wait for them to complete before returning the list. Therefore you get some number between 980 and 1000 if you are working with 20 threads at a time.
You could try adding Thread.sleep(50) before returning the list. In that case the main thread waits for some time, and the other threads may have finished processing by then.
Or you can use a synchronization API from Java. Instead of Thread.sleep(), use a CountDownLatch; it lets you wait for the threads to complete their processing, after which you can create new threads.
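The CountDownLatch suggestion above might look like the following sketch. The parsePage method is a hypothetical stand-in for downloading and parsing one page, and a fixed pool of maxThreads replaces the hand-written threadCount loop from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LatchedParsing {
    // Hypothetical stand-in for downloading and parsing one page.
    static String parsePage(int page) {
        return "item-" + page;
    }

    static List<String> parseAll(int pageCount, int maxThreads) {
        List<String> items = Collections.synchronizedList(new ArrayList<String>());
        CountDownLatch done = new CountDownLatch(pageCount);
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            for (int i = 1; i <= pageCount; i++) {
                final int page = i;
                pool.execute(() -> {
                    try {
                        items.add(parsePage(page));
                    } finally {
                        done.countDown(); // count down even if parsing throws
                    }
                });
            }
            done.await(); // returns only after every page has been handled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for parsers", e);
        } finally {
            pool.shutdown();
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(1000, 20).size());
    }
}
```

Because the list is returned only after await() completes, the caller sees every item, regardless of how many threads are used.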
