I have a class that needs to compute n tasks as quickly as possible (up to 625). Therefore, I want to utilize multithreading so that these computations are run in parallel. After some research, I found the fork/join framework but have not been able to figure out how to implement this.
For example, let there be some class Foo (which will be used as an object elsewhere) with some methods and variables:
public class Foo {
int n;
int[][] fooArray;
public Foo(int x) {
n = x;
fooArray = new int[n][];
}
public void fooFunction(int x, int y) {
//Assume (n > x >= 0).
fooArray[x] = new int[y];
}
//Implement multithreading here.
}
I read a basic tutorial on the Java documentation that uses ForkJoinPool to split a task into 2 parts and use recursion to pass them into the invokeAll method. Ideally, I want to do something similar except implement it as a subclass of Foo and split the task (in this case, running fooFunction) into n parts. How should I accomplish this?
After days of extensive trial-and-error, I finally figured out how to do this myself:
Suppose some class foo needs to run many similar (if not identical) tasks in parallel. Let n be the number of times the task should run, where n is greater than zero and less than the maximum number of threads that you can create.
public class foo {
//do normal class stuff.
public void fooFunction(int n) throws InterruptedException {
//do normal function things.
executeThreads(n);
}
public void executeThreads(int n) throws InterruptedException {
ExecutorService exec = Executors.newFixedThreadPool(n);
List<Callable<Object>> tasks = new ArrayList<Callable<Object>>();
for(int i = 0; i < n; i++)
tasks.add(Executors.callable(new Task(i)));
exec.invokeAll(tasks);
exec.shutdown();
}
public class Task implements Runnable {
int taskNumber;
public Task(int i) {
taskNumber = i;
}
public void run() {
try {
//this gets run in a thread
System.out.println("Thread number " + taskNumber);
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
This is almost certainly not the most efficient method: because the fixed pool is sized to n, it creates a thread for EVERY task that needs to be done. In other words, it does NOT reuse threads the way a thread pool normally would. Make sure that you do not create too many threads and that the tasks are large enough to justify running them in parallel. If there are better alternatives, please post an answer.
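One alternative that is often suggested is to size the pool by the number of available cores instead of the number of tasks, and let the pool queue the remaining work. The following is a minimal sketch under that assumption, not a definitive implementation; the class name BoundedPoolDemo, the method runTasks, and the squaring workload are hypothetical stand-ins for the real computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedPoolDemo {
    // Runs n independent tasks on a pool sized to the core count
    // and returns their results in task order.
    static int[] runTasks(int n) {
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int taskNumber = i;
                tasks.add(() -> taskNumber * taskNumber); // placeholder workload (assumption)
            }
            // invokeAll blocks until every task has completed and
            // returns the futures in the same order as the task list.
            List<Future<Integer>> futures = exec.invokeAll(tasks);
            int[] results = new int[n];
            for (int i = 0; i < n; i++) {
                results[i] = futures.get(i).get();
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] r = runTasks(625);
        System.out.println("Completed " + r.length + " tasks; last result = " + r[r.length - 1]);
    }
}
```

With this shape the 625 tasks share a handful of worker threads, so the thread count no longer grows with n.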
Hello, I've never tried using threads before. This is my first attempt, but it doesn't stop. The normal version works.
If I remove awaitTermination it looks like it works, but I need the method to finish when it's all sorted out (pun intended XD).
Can you tell me what I did wrong?
Thank you.
public class Sorting {
private Sorting() {};
private static Random r = new Random();
private static int cores = Runtime.getRuntime().availableProcessors();
private static ExecutorService executor = Executors.newFixedThreadPool(cores);
public static void qsortP(int[] a) {
qsortParallelo(a, 0, a.length - 1);
}
private static void qsortParallelo(int[] a, int first, int last) {
while (first < last) {
int p = first + r.nextInt(last - first + 1);
int px = a[p];
int i = first, j = last;
do {
while (a[i] < px)
i++;
while (a[j] > px)
j--;
if (i <= j) {
scambia(a, i++, j--);
}
} while (i <= j);
executor.submit(new qsortThread(a, first, j));
first = i;
}
try {
executor.awaitTermination(1, TimeUnit.DAYS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private static void scambia(int[] a, int x, int y) {
int temp = a[x];
a[x] = a[y];
a[y] = temp;
}
public static class qsortThread implements Runnable {
final int a[], first, last;
public qsortThread(int[] a, int first, int last) {
this.a = a;
this.first = first;
this.last = last;
}
public void run() {
qsortParallelo(a, first, last);
}
}
}
Instead of waiting for termination of the entire executor service (which probably isn't what you want at all), you should save all the Futures returned by executor.submit() and wait until they're all done (by calling get() on them, for example).
And though it's tempting to do this in the qsortParallelo() method, that would actually lead to a deadlock by exhaustion of the thread pool: parent tasks would hog the worker threads waiting for their child tasks to complete, but the child tasks would never be scheduled to run because there would be no available worker threads.
So you have to collect all the Future objects into a concurrent collection first, return the result to qsortP() and wait there for the Futures to finish.
Or use a ForkJoinPool, which was designed for exactly this kind of task and does all the donkey work for you. Recursively submitting tasks to an executor from application code is generally not a very good idea; it's very easy to get it wrong.
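As a rough illustration of the ForkJoinPool route, here is a minimal sketch of a fork/join quicksort. It reuses the partitioning scheme from the question; the class name ForkJoinQuickSort and the SEQUENTIAL_THRESHOLD cutoff are assumptions made for this example, not part of the original code.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.ThreadLocalRandom;

class ForkJoinQuickSort extends RecursiveAction {
    // Assumed cutoff below which a range is sorted on the current thread.
    private static final int SEQUENTIAL_THRESHOLD = 1 << 12;
    private final int[] a;
    private final int first;
    private final int last;

    ForkJoinQuickSort(int[] a, int first, int last) {
        this.a = a;
        this.first = first;
        this.last = last;
    }

    @Override
    protected void compute() {
        if (last - first < SEQUENTIAL_THRESHOLD) {
            Arrays.sort(a, first, last + 1); // small (or empty) range: no further splitting
            return;
        }
        int px = a[ThreadLocalRandom.current().nextInt(first, last + 1)];
        int i = first, j = last;
        do { // same partitioning scheme as in the question
            while (a[i] < px) i++;
            while (a[j] > px) j--;
            if (i <= j) {
                int tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                i++;
                j--;
            }
        } while (i <= j);
        // Fork both halves and wait for them. Unlike a plain executor,
        // a fork/join worker that waits here can run other queued subtasks.
        invokeAll(new ForkJoinQuickSort(a, first, j), new ForkJoinQuickSort(a, i, last));
    }

    static void sort(int[] a) {
        if (a.length > 1) {
            ForkJoinPool.commonPool().invoke(new ForkJoinQuickSort(a, 0, a.length - 1));
        }
    }

    public static void main(String[] args) {
        int[] data = ThreadLocalRandom.current().ints(1_000_000).toArray();
        sort(data);
        System.out.println("first = " + data[0] + ", last = " + data[data.length - 1]);
    }
}
```

The call to invoke() returns only after the whole array is sorted, so no awaitTermination or Future bookkeeping is needed.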
As an aside, the reason your code is deadlocked as it is is that every worker thread is stuck in executor.awaitTermination(), thereby preventing the termination of the executor service.
In general, the two most useful tools for designing and debugging multi-threaded applications are:
A thread dump. You can generate that with jstack, VisualVM or any other tool, but it's invaluable in deadlock situations, it gives you an accurate image of what's (not) going on with your threads.
A pen, a piece of paper and drawing a good old fashioned swimlane chart.
You are calling executor.awaitTermination inside a thread that was launched by your executor. The thread will not stop until the executor comes out of awaitTermination, and the executor will not come out of awaitTermination until the thread terminates. You need to move this code:
try {
executor.awaitTermination(1, TimeUnit.DAYS);
} catch (InterruptedException e) {
e.printStackTrace();
}
to the end of the qsortP method.
The mistake in this code is simply the while-loop in qsortParallelo. first and last are never modified. Apart from that, you don't need the while-loop, since the further sorting is already done in the executor. And you'll need to start another task for the second half of the array.
I have this piece of code:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
@Override
public void run(){
while(!intervals.isEmpty()){
//remove one interval
//do calculations
//add some intervals
}
}
This code is being executed by a specific number of threads at the same time. As you can see, the loop should go on until there are no more intervals left in the collection, but there is a problem. At the beginning of each iteration an interval gets removed from the collection, and at the end some number of intervals might get added back into the same collection.
The problem is that while one thread is inside the loop, the collection might become empty, so other threads trying to enter the loop won't be able to and will finish their work prematurely, even though the collection might be refilled once the first thread finishes its iteration. I want the thread count to remain constant (or not more than some number n) until all work is really finished.
That means that no threads are currently working in the loop and there are no elements left in the collection. What are possible ways of accomplishing that? Any ideas are welcomed.
One way to solve this problem in my specific case is to give every thread a different piece of the original collection. But after one thread would finish its work it wouldn't be used by the program anymore, even though it could help other threads with their calculations, so I don't like this solution, because it's important to utilize all cores of the machine in my problem.
This is the simplest minimal working example I could come up with. It might be too lengthy.
public class Test{
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private int threadNumber;
private Thread[] threads;
private double result;
public Test(int threadNumber){
intervals.add(new Interval(0, 1));
this.threadNumber = threadNumber;
threads = new Thread[threadNumber];
}
public double find(){
for(int i = 0; i < threadNumber; i++){
threads[i] = new Thread(new Finder());
threads[i].start();
}
try{
for(int i = 0; i < threadNumber; i++){
threads[i].join();
}
}
catch(InterruptedException e){
System.err.println(e);
}
return result;
}
private class Finder implements Runnable{
@Override
public void run(){
while(!intervals.isEmpty()){
Interval interval = intervals.poll();
if(interval.high - interval.low > 1e-6){
double middle = (interval.high + interval.low) / 2;
boolean something = true;
if(something){
intervals.add(new Interval(interval.low + 0.1, middle - 0.1));
intervals.add(new Interval(middle + 0.1, interval.high - 0.1));
}
else{
intervals.add(new Interval(interval.low + 0.1, interval.high - 0.1));
}
}
}
}
}
private class Interval{
double low;
double high;
public Interval(double low, double high){
this.low = low;
this.high = high;
}
}
}
What you might need to know about the program: after every iteration, an interval should either disappear (because it's too small), become smaller, or split into two smaller intervals. Work is finished when no intervals are left. Also, I should be able to limit the number of threads doing this work to some number n. The actual program looks for the maximum value of some function by dividing the intervals and throwing away the parts that can't contain the maximum according to some rules, but this shouldn't really be relevant to my problem.
The CompletableFuture class is also an interesting solution for this kind of task.
It automatically distributes workload over a number of worker threads.
static CompletableFuture<Integer> fibonacci(int n) {
if(n < 2) return CompletableFuture.completedFuture(n);
else {
return CompletableFuture.supplyAsync(() -> {
System.out.println(Thread.currentThread());
CompletableFuture<Integer> f1 = fibonacci(n - 1);
CompletableFuture<Integer> f2 = fibonacci(n - 2);
return f1.thenCombineAsync(f2, (a, b) -> a + b);
}).thenComposeAsync(f -> f);
}
}
public static void main(String[] args) throws Exception {
int fib = fibonacci(10).get();
System.out.println(fib);
}
You can use an atomic flag, i.e.:
private ConcurrentLinkedQueue<Interval> intervals = new ConcurrentLinkedQueue<>();
private AtomicBoolean inUse = new AtomicBoolean();
@Override
public void run() {
while (!intervals.isEmpty() && inUse.compareAndSet(false, true)) {
// work
inUse.set(false);
}
}
UPD
The question has been updated, so here is a better solution. It is a more "classic" solution using a blocking queue:
private BlockingQueue<Interval> intervals = new LinkedBlockingQueue<>(); // unbounded; an ArrayBlockingQueue would need an explicit capacity
private volatile boolean finished = false;
@Override
public void run() {
try {
while (!finished) {
Interval next = intervals.take();
// put work there
// after you decide work is finished just set finished = true
intervals.put(next); // in any case, return the interval to the queue
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
UPD2
Now it seems better to rewrite the solution and divide the range into sub-ranges, one for each thread.
Your problem looks like a recursive one - processing one task (interval) might produce some sub-tasks (sub intervals).
For that purpose I would use ForkJoinPool and RecursiveAction:
class Interval {
...
}
class IntervalAction extends RecursiveAction {
private Interval interval;
private IntervalAction(Interval interval) {
this.interval = interval;
}
@Override
protected void compute() {
if (...) {
// we need two sub-tasks
IntervalAction sub1 = new IntervalAction(new Interval(...));
IntervalAction sub2 = new IntervalAction(new Interval(...));
sub1.fork();
sub2.fork();
sub1.join();
sub2.join();
} else if (...) {
// we need just one sub-task
IntervalAction sub3 = new IntervalAction(new Interval(...));
sub3.fork();
sub3.join();
} else {
// current task doesn't need any sub-tasks, just return
}
}
}
public static void compute(Interval initial) {
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(new IntervalAction(initial));
// invoke will return when all the processing is completed
}
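To make the outline above concrete, here is a small runnable sketch. The splitting rule (halve an interval while it is wider than minWidth), the leaf counter, and the names IntervalSearch and countLeaves are assumptions chosen only for illustration; the real program would plug in its own rules. The parallelism argument caps the number of worker threads, which corresponds to the "no more than n threads" requirement in the question.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.LongAdder;

class IntervalSearch {
    static final class Interval {
        final double low;
        final double high;

        Interval(double low, double high) {
            this.low = low;
            this.high = high;
        }
    }

    static final class IntervalAction extends RecursiveAction {
        private final Interval interval;
        private final double minWidth;
        private final LongAdder leaves;

        IntervalAction(Interval interval, double minWidth, LongAdder leaves) {
            this.interval = interval;
            this.minWidth = minWidth;
            this.leaves = leaves;
        }

        @Override
        protected void compute() {
            if (interval.high - interval.low > minWidth) {
                // Hypothetical rule: split into two halves and process both.
                double middle = (interval.low + interval.high) / 2;
                invokeAll(
                    new IntervalAction(new Interval(interval.low, middle), minWidth, leaves),
                    new IntervalAction(new Interval(middle, interval.high), minWidth, leaves));
            } else {
                leaves.increment(); // small enough: no sub-tasks are created
            }
        }
    }

    // Returns the number of intervals that were not split any further.
    static long countLeaves(double low, double high, double minWidth, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            LongAdder leaves = new LongAdder();
            // invoke() returns only when the root task and all its sub-tasks are done.
            pool.invoke(new IntervalAction(new Interval(low, high), minWidth, leaves));
            return leaves.sum();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("leaves = " + countLeaves(0, 1, 1.0 / 1024, 4));
    }
}
```

Termination detection comes for free here: the pool tracks outstanding sub-tasks, so there is no shared queue to poll and no risk of a worker quitting early.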
I had the same problem, and I tested the following solution.
In my test example I have a queue (the equivalent of your intervals) filled with integers. For the test, at each iteration one number is taken from the queue, incremented and placed back in the queue if the new value is below 7 (arbitrary). This has the same impact as your interval generation on the mechanism.
Here is a working example (note that I develop in Java 1.8 and use the Executor framework to handle my thread pool):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
public class Test {
final int numberOfThreads;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
final BlockingQueue<Integer> sleepingThreadsTokens;
final ThreadPoolExecutor executor;
public static void main(String[] args) {
final Test test = new Test(2); // arbitrary number of thread => 2
test.launch();
}
private Test(int numberOfThreads){
this.numberOfThreads = numberOfThreads;
this.queue = new PriorityBlockingQueue<Integer>();
this.availableThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.sleepingThreadsTokens = new LinkedBlockingQueue<Integer>(numberOfThreads);
this.executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numberOfThreads);
}
public void launch() {
// put some elements in queue at the beginning
queue.add(1);
queue.add(2);
queue.add(3);
for(int i = 0; i < numberOfThreads; i++){
availableThreadsTokens.add(1);
}
System.out.println("Start");
boolean algorithmIsFinished = false;
while(!algorithmIsFinished){
if(sleepingThreadsTokens.size() != numberOfThreads){
try {
availableThreadsTokens.take();
} catch (final InterruptedException e) {
e.printStackTrace();
// some treatment should be put there in case of failure
break;
}
if(!queue.isEmpty()){ // Continuation condition
sleepingThreadsTokens.drainTo(availableThreadsTokens);
executor.submit(new Loop(queue.poll(), queue, availableThreadsTokens));
}
else{
sleepingThreadsTokens.add(1);
}
}
else{
algorithmIsFinished = true;
}
}
executor.shutdown();
System.out.println("Finished");
}
public static class Loop implements Runnable{
int element;
final BlockingQueue<Integer> queue;
final BlockingQueue<Integer> availableThreadsTokens;
public Loop(Integer element, BlockingQueue<Integer> queue, BlockingQueue<Integer> availableThreadsTokens){
this.element = element;
this.queue = queue;
this.availableThreadsTokens = availableThreadsTokens;
}
@Override
public void run(){
System.out.println("taking element "+element);
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
for(Long l = (long) 0; l < 500000000L; l++){
}
if(element < 7){
this.queue.add(element+1);
System.out.println("Inserted element"+(element + 1));
}
else{
System.out.println("no insertion");
}
this.availableThreadsTokens.offer(1);
}
}
}
I ran this code to check, and it seems to work properly. However, there are certainly some improvements that can be made:
sleepingThreadsTokens does not have to be a BlockingQueue, since only the main thread accesses it. I used this interface because it allowed a nice sleepingThreadsTokens.drainTo(availableThreadsTokens);
I'm not sure whether queue has to be blocking or not, since only the main thread takes from it and it does not wait for elements (it waits only for tokens).
...
The idea is that the main thread checks for termination, and for this it has to know how many threads are currently working (so that it does not prematurely stop the algorithm just because the queue is empty). To do so, two specific queues are created: availableThreadsTokens and sleepingThreadsTokens. Each element in availableThreadsTokens symbolizes a thread that has finished an iteration and is waiting to be given another one. Each element in sleepingThreadsTokens symbolizes a thread that was available to take a new iteration, but the queue was empty, so it had no job and went to "sleep". So at each moment availableThreadsTokens.size() + sleepingThreadsTokens.size() = numberOfThreads - threadsExecutingIteration.
Note that the elements in availableThreadsTokens and sleepingThreadsTokens only symbolize thread activity; they are not threads, nor do they designate a specific thread.
Case of termination: let's suppose we have N threads (an arbitrary, fixed number). The N threads are waiting for work (N tokens in availableThreadsTokens), there is only 1 remaining element in the queue, and the treatment of this element won't generate any other element. Main takes the first token, finds that the queue is not empty, polls the element and sends the thread to work. The next N-1 tokens are consumed one by one, and since the queue is empty, the tokens are moved into sleepingThreadsTokens one by one. Main knows that there is 1 thread working in the loop, since there is no token in availableThreadsTokens and only N-1 in sleepingThreadsTokens, so it waits (.take()). When the thread finishes and releases the token, Main consumes it, discovers that the queue is now empty, and puts the last token in sleepingThreadsTokens. Since all tokens are now in sleepingThreadsTokens, Main knows that 1) all threads are inactive and 2) the queue is empty (otherwise the last token wouldn't have been transferred to sleepingThreadsTokens, since the thread would have taken the job).
Note that if the working thread finishes the treatment before all the availableThreadsTokens are moved to sleepingThreadsTokens, it makes no difference.
Now suppose that the treatment of the last element had generated M new elements in the queue. Then Main would have put all the tokens from sleepingThreadsTokens back into availableThreadsTokens and started assigning them work again. We put all the tokens back even if M < N because we don't know how many elements will be inserted in the future, so we have to keep all the threads available.
I would suggest a master/worker approach then.
The master process goes through the intervals and assigns the calculations of that interval to a different process. It also removes/adds as necessary. This way, all the cores are utilized, and only when all intervals are finished, the process is done. This is also known as dynamic work allocation.
A possible example:
public void run(){
while(!intervals.isEmpty()){
//remove one interval
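Thread t = new Thread(new Runnable()
{
public void run() {
//do calculations
}
});
t.start(); // start() runs the calculations on the new thread; run() would execute them on the calling thread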
//add some intervals
}
}
The possible solution you provided is known as static allocation, and you're correct: it will only finish as fast as the slowest processor, whereas the dynamic approach will utilize all cores.
I've run into this problem as well. The way I solved it was to use an AtomicInteger to track what is in the queue. Before each offer(), increment the integer. After each poll(), decrement the integer. The CLQ has no reliable isEmpty(), since it must look at the head/tail nodes and these can change atomically (CAS).
This doesn't guarantee 100% that no thread will increment just after another thread has decrements the counter to zero, so you need to check again before ending the thread. It is still better than relying on while(...isEmpty()).
Other than that, you may need to synchronize.
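The counter idea can be sketched as follows. This is a variant, not the exact scheme described above: it increments before each offer() but decrements only after the polled item has been fully processed (including adding its children), so the counter reflects items that are queued or still in progress. That choice is an assumption made here to close the race mentioned above. The class name PendingCounterDemo and the "each item below the limit spawns two children" rule are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class PendingCounterDemo {
    // Returns how many items were processed in total.
    static long process(int threadCount, int start, int limit) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        AtomicInteger pending = new AtomicInteger(); // items queued or still being processed
        AtomicLong processed = new AtomicLong();

        pending.incrementAndGet(); // count the item before it becomes visible in the queue
        queue.offer(start);

        Runnable worker = () -> {
            while (pending.get() > 0) {
                Integer item = queue.poll();
                if (item == null) {
                    // Queue is momentarily empty, but another worker may still add items.
                    Thread.yield();
                    continue;
                }
                processed.incrementAndGet();
                if (item < limit) { // hypothetical rule: spawn two children
                    pending.addAndGet(2);
                    queue.offer(item + 1);
                    queue.offer(item + 1);
                }
                pending.decrementAndGet(); // this item is finished only now
            }
        };

        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("Processed " + process(4, 0, 10) + " items");
    }
}
```

Idle workers spin with Thread.yield() in this sketch, which keeps it short but burns CPU; a blocking queue or a fork/join pool avoids that cost.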
I need to do some computations/processing on a large set of ids (about 100k to 1 million). Since the number of ids is quite large and processing each one takes some time, I was thinking about implementing threads in my Java code.
Assuming we can't have 100K threads running at once, how do I implement threading in this case?
Note - The only solution I can think of is to have about 100 or more threads running, where each thread would process about 1000 or more IDs.
Use Java's built in thread pooling and executors.
ExecutorService foo = Executors.newFixedThreadPool(100);
foo.submit(new MyRunnable());
There are various thread pools you can create to tailor how many you want, if it's dynamic, etc.
Using ThreadPool:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class ThreadIDS implements Runnable
{
public static final int totalIDS = 1000000;
int start;
int range;
public ThreadIDS(int start, int range)
{
this.start=start;
this.range=range;
}
public static void main(String[] args) throws InterruptedException
{
int availableProcessors = Runtime.getRuntime().availableProcessors();
int eachThread = totalIDS/availableProcessors + 1;
ExecutorService threads = Executors.newFixedThreadPool(availableProcessors);
for(int i = 0 ; i < availableProcessors ; i++)
{
threads.submit(new ThreadIDS(i*eachThread, eachThread));
}
threads.shutdown(); // stop accepting new tasks; without this, awaitTermination never returns true
while(!threads.awaitTermination(1000, TimeUnit.MILLISECONDS))System.out.println("Waiting for threads to finish");
}
public void processID(int id)
{
}
public void run()
{
for(int i = start ; i < Math.min(start+range, totalIDS) ; i++)
{
processID(i);
}
}
}
Edited the run method. Since we add 1 when dividing to avoid integer division making us miss ids, we could potentially run over the totalIDS limit. The Math.min avoids that.
If you don't want to use ThreadPools, then change the main to:
public static void main(String[] args)
{
int availableProcessors = Runtime.getRuntime().availableProcessors();
int eachThread = totalIDS/availableProcessors + 1;
for(int i = 0 ; i < availableProcessors ; i++)
{
new Thread(new ThreadIDS(i * eachThread, eachThread)).start();
}
}
Run as many threads as you have CPU cores (Runtime.getRuntime().availableProcessors()). Let each thread run a loop like this:
public void run() {
Id id;
while ((id = ids.poll()) != null) { // exact access method depends on how your set of ids is organized; poll() returns null once the queue is empty
processId(id);
}
}
Compared to using a thread pool, this is simpler and requires less memory (no need to create a Runnable for each id).
Splitting your work into 4 Runnables (1 per core) is probably not the best idea if there is any variation in processing time for a given ID. A better solution would be to split your work up into small chunks so that one core doesn't get stuck with all the "hard" work while the other 3 cores plow through theirs and then do nothing.
You could split your tasks into small chunks in advance and submit them to a ThreadPoolExecutor, but it might be better to use the Fork/Join framework. It's designed to handle this type of thing very efficiently.
Something like this would make sure all 4 cores stayed busy until all the work was done:
public class Test
{
public void workTest()
{
ForkJoinPool pool = new ForkJoinPool(); //Defaults to # of cores
List<ObjectThatWeProcess> work = getWork(); //Get IDs or whatever
FJAction action = new FJAction(work);
pool.invoke(action);
}
public static class FJAction extends RecursiveAction
{
private static final int workSize = 1000; //Only do work if 1000 objects or less
List<ObjectThatWeProcess> work;
FJAction(List<ObjectThatWeProcess> work)
{
this.work = work;
}
public void compute()
{
if(work.size() > workSize)
{
invokeAll(new FJAction(work.subList(0,work.size()/2)),
new FJAction(work.subList(work.size()/2,work.size())));
}
else
processWork();
}
private void processWork()
{
//do something
}
}
}
You could also extend RecursiveTask<T> if the "work" returned a value that was relevant to you.
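For instance, a RecursiveTask<Long> version might look like the sketch below. Summing integer ids stands in for whatever value the real processing would return; the class name SumTask and the WORK_SIZE cutoff are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int WORK_SIZE = 1000; // process directly at 1000 items or fewer
    private final List<Integer> work;

    SumTask(List<Integer> work) {
        this.work = work;
    }

    @Override
    protected Long compute() {
        if (work.size() > WORK_SIZE) {
            int mid = work.size() / 2;
            SumTask left = new SumTask(work.subList(0, mid));
            SumTask right = new SumTask(work.subList(mid, work.size()));
            left.fork();                     // schedule the left half asynchronously
            long rightSum = right.compute(); // process the right half on this thread
            return left.join() + rightSum;   // combine both partial results
        }
        long sum = 0;
        for (int id : work) {
            sum += id; // placeholder for the real per-item computation
        }
        return sum;
    }

    static long sum(List<Integer> ids) {
        return ForkJoinPool.commonPool().invoke(new SumTask(ids));
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 100_000; i++) {
            ids.add(i);
        }
        System.out.println("sum = " + sum(ids));
    }
}
```

The only structural difference from the RecursiveAction version is that compute() returns a value and the parent combines the results of its sub-tasks.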
I want to have two separate threads running two different instances of different classes and I want them to execute the run command at the same time.
I've made a practice class to demonstrate the problem I'm having.
One racer counts forwards, the other counts backwards.
public class testCount {
public static void main(String args[]) {
testCount countCompetition = new testCount();
countCompetition.run();
}
public void run() {
(new Thread(new racer1())).start();
(new Thread(new racer2())).start();
}
public class racer1 implements Runnable {
public void run() {
for(int x = 0; x < 100; x++) {
System.out.println(x);
}
}
}
public class racer2 implements Runnable {
public void run() {
for(int y = 100; y > 0; y--) {
System.out.println(y);
}
}
}
}
My results
1
2
... All the way to 100
100
100
99
... All the way back down
1
What I want
1
100
2
99
3
98
They don't need to be taking turns like that, but they do need to be working at the same time, instead of one after the other.
Any hints, advice or code snippets would be greatly appreciated.
I think all the answers so far are missing the point.
Your existing logic does enable your two threads to both execute concurrently, but this is not evident because your numbers only go up to 100, and the execution will usually stay with a specific thread for more than 1 instruction at a time, otherwise there would be a large amount of overhead in switching between the currently executing thread all the time. In your case, the JVM is deciding to execute your first thread long enough for it to print out 100 numbers before "context switching" to the 2nd thread. The JVM might choose to execute the threads differently, so the result you are seeing is not guaranteed to be the same every time.
If you increase your numbers even to 1000 you will (probably) see the two threads interleaving somewhat. You will still have large runs where one thread prints out a lot of numbers in a row because it is more efficient for the JVM to execute one thread for a while before switching, instead of context switching between every instruction.
Adding Thread.sleep(1) is not a good solution, as you are adding an unnecessary delay. Sure, for 100 numbers this might not be noticeable, but for 10000 numbers you would have a delay of 10 seconds.
Is there any reason that you would require them to interleave to a higher degree than they already do? If there is then your simple model of running two threads concurrently is not sufficient. If not then just let the JVM decide the best order to run your threads in (which in the simple example you have given, means they probably won't interleave most of the time).
Just add Thread.sleep(1); in each racer class after System.out.println().
i.e. it will look like this:
public class racer1 implements Runnable {
public void run() {
for(int x = 0; x < 100; x++) {
System.out.println(x);
try {
Thread.sleep(1);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
}
}
You need to write a basic wait and notify system. One task needs to notify the other that it has finished its work. The basic idea can be derived from the code below. Create 2 tasks, one to count forward and one to count backward.
Runnable task = new Runnable() {
public void run() {
System.out.println("woohooTwo");
synchronized (t) {
while (true) {
System.out.println("---" + Thread.currentThread().getName() + "--" + t.i.getAndIncrement());
t.notifyAll();
try {
Thread.sleep(1000);
t.wait();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
};
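A more complete version of the same wait/notify idea, applied to the two racers, could look like the following sketch. It forces strict turn-taking through a shared monitor; the class name TurnTaking, the turn field, and collecting the values in a list instead of printing them directly are assumptions made so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

class TurnTaking {
    private final Object lock = new Object();
    private int turn = 0; // 0: forward counter's turn, 1: backward counter's turn
    private final List<Integer> output = new ArrayList<>(); // guarded by lock

    // Waits until it is the caller's turn, records the value, then hands over the turn.
    private void emit(int myTurn, int value) throws InterruptedException {
        synchronized (lock) {
            while (turn != myTurn) {
                lock.wait(); // releases the lock while waiting; the loop guards against spurious wakeups
            }
            output.add(value);
            turn = 1 - myTurn;
            lock.notifyAll();
        }
    }

    // Runs both counters from 1..n and n..1 and returns the interleaved values.
    static List<Integer> race(int n) {
        TurnTaking t = new TurnTaking();
        Thread forward = new Thread(() -> {
            try {
                for (int x = 1; x <= n; x++) t.emit(0, x);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread backward = new Thread(() -> {
            try {
                for (int y = n; y >= 1; y--) t.emit(1, y);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        forward.start();
        backward.start();
        try {
            forward.join();
            backward.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for racers", e);
        }
        synchronized (t.lock) {
            return new ArrayList<>(t.output);
        }
    }

    public static void main(String[] args) {
        System.out.println(race(5));
    }
}
```

Note that strict alternation like this serializes the two threads completely, so it only makes sense when the ordering itself is the requirement.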
I am writing a multithreaded parser.
Parser class is as follows.
public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {
private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
private boolean h2Tag = false;
private int count;
private static int threadCount = 0;
public static List<Item> parse() {
for (int i = 1; i <= 1000; i++) { //1000 of the same type of pages that need to parse
while (threadCount == 20) { //limit the number of simultaneous threads
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
Thread thread = new Thread(new Parser());
thread.setName(Integer.toString(i));
threadCount++; //increase the number of working threads
thread.start();
}
return itemList;
}
public void run() {
//Here is a piece of code responsible for creating links based on
//the thread name and passed as a parameter remained i,
//connection, start parsing, etc.
//In general, nothing special. Therefore, I won't paste it here.
threadCount--; //reduce the number of running threads when current stops
}
private static void addItem(Item item) {
itemList.add(item);
}
//This method retrieves the necessary information after the H2 tag is detected
@Override
public void handleText(char[] data, int pos) {
if (h2Tag) {
String itemName = new String(data).trim();
//Item - the item on which we receive information from a Web page
Item item = new Item();
item.setName(itemName);
item.setId(count);
addItem(item);
//Display information about an item in the console
System.out.println(count + " = " + itemName);
}
}
@Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = true;
}
}
@Override
public void handleEndTag(HTML.Tag t, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = false;
}
}
}
From another class parser runs as follows:
List<Item> list = Parser.parse();
All is good, but there is a problem. At the end of parsing, the final list itemList contains 980 elements instead of 1000, although the console shows all 1000 elements (items). That is, some threads for some reason did not call the addItem method from handleText.
I already tried changing the type of itemList to ArrayList, CopyOnWriteArrayList, and Vector. I made the addItem method synchronized, and I also wrapped its call in a synchronized block. All this only changes the number of elements a little, but the full thousand cannot be obtained.
I also tried to parse a smaller number of pages (ten). As a result, the list is empty, but the console shows all 10.
If I remove multi-threading, then everything works fine, but, of course, slowly. That's not good.
If I decrease the number of concurrent threads, the number of items in the list gets closer to the desired 1000; if I increase it, the number moves a little further from 1000. So I think there is contention for writing to the list. But then why is synchronization not working?
What's the problem?
After your parse() call returns, all of your 1000 threads have been started, but it is not guaranteed that they have finished. In fact, they haven't; that's the problem you see. I would strongly recommend not writing this yourself but using the tools provided for this kind of job by the SDK.
The documentation on Thread Pools and the ThreadPoolExecutor is, for example, a good starting point. Again, don't implement this yourself if you are not absolutely sure you have to, because writing such multi-threading code is pure pain.
Your code should look something like this:
ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
f.get();
}
There is no problem with the code; it is working as you have coded it. The problem is with the last iteration. All the other iterations work properly, but during the last one, from 980 to 1000, the threads are created and the main thread does not wait for them to complete before returning the list. Therefore you will get some odd number between 980 and 1000 if you are working with 20 threads at a time.
You can try adding Thread.sleep(50) before returning the list; in that case your main thread will wait for some time, and by then the other threads may have finished processing.
Or you can use a synchronization API from Java. Instead of sleeping, use a CountDownLatch; this lets you wait for the threads to complete their processing, and then you can create new threads.
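A minimal sketch of the CountDownLatch approach is shown below. The class name LatchDemo, the parseAll method, and the "item-N" strings are placeholders for the real page parsing; only the waiting pattern is the point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LatchDemo {
    // Processes the given number of pages on a bounded pool and returns
    // the collected items only after every task has finished.
    static List<String> parseAll(int pages, int threads) {
        List<String> items = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(pages);
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 1; i <= pages; i++) {
                final int page = i;
                executor.execute(() -> {
                    try {
                        items.add("item-" + page); // stand-in for parsing one page
                    } finally {
                        done.countDown(); // count down even if parsing fails
                    }
                });
            }
            done.await(); // block until every task has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for parsers", e);
        } finally {
            executor.shutdown();
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println("items collected: " + parseAll(1000, 20).size());
    }
}
```

The fixed pool also replaces the manual threadCount bookkeeping from the question, since the executor never runs more than the configured number of tasks at once.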