Java - Shutting down concurrent threads on request [duplicate]

This question already has answers here:
How to shutdown an ExecutorService?
For a school project, I am trying to build a program that attempts to decrypt a file that has been encrypted with AES. I have a list of ~100,000 English words and want to use multithreading to reduce the time taken to attempt decryption with every word in the list.
I am having a problem when trying to stop the rest of the dictionary being searched in the event of the decryption completing successfully - "Attempting shutdown" is being printed to the console, but it appears that the threads continue working through the rest of the dictionary before the executor stops allocating new threads.
In my main program, the threads are run using this method:
private void startThreads(){
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
System.out.println("Maximum threads inside pool " + executor.getMaximumPoolSize());
for (int i = 0; i < dict.size(); i++) {
String word = dict.get(i);
Grafter grafter = new Grafter("Grafter " + i,word);
grafter.registerWorkerListener(thread -> {
List results = thread.getResults();
for (Iterator iter = results.iterator(); iter.hasNext();) {
found = (boolean) iter.next();
if(found){
System.out.println("THE WORD HAS BEEN FOUND!! Attempting shutdown");
executor.shutdown();
}
}
});
// Start the worker thread
Thread thread = new Thread(grafter);
thread.start();
}
if(!executor.isShutdown()) {
executor.shutdown();
}
}
And the implementation of the 'grafter' runnable class is as follows:
public class Grafter implements Runnable{
private String NAME;
private final String WORD;
private List listeners = new ArrayList();
private List results;
public Grafter(String name, String word){
NAME = name;
WORD = word;
}
public String getName(){
return NAME;
}
@Override
public void run() {
if (tryToDecrypt(WORD) == true){
System.out.println("Thread: '" + NAME + "' successfully decrypted using word: \"" + WORD + "\".");
results = new ArrayList();
results.add(true);
// Work done, notify listeners
notifyListeners();
}else{
results = new ArrayList();
results.add(false);
// Work done, notify listeners
notifyListeners();
}
}
private void notifyListeners() {
for (Iterator iter = listeners.iterator(); iter.hasNext();) {
GrafterListener listener = (GrafterListener) iter.next();
listener.workDone(this);
}
}
public void registerWorkerListener(GrafterListener listener) {
listeners.add(listener);
}
public List getResults() {
return results;
}
private boolean tryToDecrypt(String word){
//Decryption performed, returning true if successfully decrypted,
//Returns false if not
}
}
The correct word is right at the top of the dictionary, so success is found early in the program's execution. However, there is a long pause (as the rest of the dictionary is worked through) before the program finishes.
I am looking for help on the positioning of executor.shutdown(), and how to stop the remainder of the dictionary being parsed after the decryption successfully completes.

Your main problem is that you're not actually submitting your runnables to the executor. Thus calling shutdown on the executor has no effect on all the threads you've spawned.
Instead of creating a new thread, do something like:
executor.submit(grafter)
This should get you most of the way, but if you want the service to shut down promptly and cleanly there's a bit more you can do. The link provided in the comments by shmosel should help you with that.
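A minimal sketch of that advice, not the answerer's own code: it assumes tryToDecrypt can be passed in as a java.util.function.Predicate<String>, and the names DictionarySearch and findKey are hypothetical. Each word is submitted to the executor instead of a raw Thread, and the first success calls shutdownNow(), because plain shutdown() still runs every task that was already submitted (the long pause described in the question). shutdownNow() only interrupts running tasks, so a decryption routine that never checks for interruption will simply finish its current word.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

public class DictionarySearch {
    /**
     * Submits one task per word. The first task that succeeds records its word
     * and calls shutdownNow(), which discards tasks still waiting in the pool's
     * queue and interrupts the ones currently running.
     */
    public static String findKey(List<String> dict, Predicate<String> tryToDecrypt)
            throws InterruptedException {
        ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        AtomicReference<String> found = new AtomicReference<>();
        try {
            for (String word : dict) {
                if (executor.isShutdown()) break; // a task has already succeeded
                executor.submit(() -> {
                    if (found.get() != null) return; // cheap early exit for late starters
                    if (tryToDecrypt.test(word) && found.compareAndSet(null, word)) {
                        executor.shutdownNow();
                    }
                });
            }
        } catch (RejectedExecutionException e) {
            // shutdownNow() raced with submit(); nothing more needs submitting
        }
        executor.shutdown(); // no-op if shutdownNow() already ran
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return found.get();
    }
}
```

With the correct word near the top of the dictionary, the tasks queued behind it are discarded instead of being worked through.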
As an aside, I don't think the way you're doing this is going to be very efficient. Essentially you're creating a new task for every word in your dictionary, which means that you have a large number of tasks (100K in your case). The overhead of managing and scheduling all those tasks is likely to be a significant portion of the work performed by your program. Instead, you may want to break the list of words up into a number of sublists, each containing an equal number of words, and then make each runnable process only its own sublist.
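The sublist idea might look like the sketch below, again assuming a Predicate<String> stand-in for tryToDecrypt; ChunkedDictionarySearch is a hypothetical name. Each task walks one contiguous slice and checks a shared AtomicReference before every word, so all slices stop soon after any one of them succeeds, without needing shutdownNow().

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

public class ChunkedDictionarySearch {
    /** Splits dict into at most `chunks` contiguous slices and gives each slice to one task. */
    public static String findKey(List<String> dict, Predicate<String> tryToDecrypt, int chunks)
            throws InterruptedException {
        if (chunks < 1) throw new IllegalArgumentException("chunks must be >= 1");
        ExecutorService executor = Executors.newFixedThreadPool(chunks);
        AtomicReference<String> found = new AtomicReference<>();
        int n = dict.size();
        int sliceSize = (n + chunks - 1) / chunks; // ceiling division so no word is left out
        for (int start = 0; start < n; start += sliceSize) {
            List<String> slice = dict.subList(start, Math.min(n, start + sliceSize));
            executor.submit(() -> {
                for (String word : slice) {
                    if (found.get() != null) return; // another slice already found it
                    if (tryToDecrypt.test(word)) {
                        found.compareAndSet(null, word);
                        return;
                    }
                }
            });
        }
        executor.shutdown(); // accept no new tasks; running slices see the flag and exit
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return found.get();
    }
}
```

Using one slice per available core (Runtime.getRuntime().availableProcessors()) keeps the number of tasks equal to the number of threads, so the scheduling overhead becomes negligible.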

Related

Multithreaded program using newFixedThreadPool doesn't run as expected when the thread pool size is less than the number of tasks executed

package com.playground.concurrency;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
public class MyRunnable implements Runnable {
private String taskName;
public String getTaskName() {
return taskName;
}
public void setTaskName(String taskName) {
this.taskName = taskName;
}
private int processed = 0;
public MyRunnable(String name) {
this.taskName = name;
}
private boolean keepRunning = true;
public boolean isKeepRunning() {
return keepRunning;
}
public void setKeepRunning(boolean keepRunning) {
this.keepRunning = keepRunning;
}
private BlockingQueue<Integer> elements = new LinkedBlockingQueue<Integer>(10);
public BlockingQueue<Integer> getElements() {
return elements;
}
public void setElements(BlockingQueue<Integer> elements) {
this.elements = elements;
}
@Override
public void run() {
while (keepRunning || !elements.isEmpty()) {
try {
Integer element = elements.take();
Thread.sleep(10);
System.out.println(taskName +" :: "+elements.size());
System.out.println("Got :: " + element);
processed++;
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
System.out.println("Exiting thread");
}
public int getProcessed() {
return processed;
}
public void setProcessed(int processed) {
this.processed = processed;
}
}
package com.playground.concurrency.service;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.playground.concurrency.MyRunnable;
public class TestService {
public static void main(String[] args) throws InterruptedException {
int roundRobinIndex = 0;
int noOfProcess = 10;
List<MyRunnable> processes = new ArrayList<MyRunnable>();
for (int i = 0; i < noOfProcess; i++) {
processes.add(new MyRunnable("Task : " + i));
}
ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(5);
for (MyRunnable process : processes) {
threadPoolExecutor.execute(process);
}
int totalMessages = 1000;
long start = System.currentTimeMillis();
for (int i = 1; i <= totalMessages; i++) {
processes.get(roundRobinIndex++).getElements().put(i);
if (roundRobinIndex == noOfProcess) {
roundRobinIndex = 0;
}
}
System.out.println("Done putting all the elements");
for (MyRunnable process : processes) {
process.setKeepRunning(false);
}
threadPoolExecutor.shutdown();
try {
threadPoolExecutor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
long totalProcessed = 0;
for (MyRunnable process : processes) {
System.out.println("task " + process.getTaskName() + " processd " + process.getProcessed());
totalProcessed += process.getProcessed();
}
long end = System.currentTimeMillis();
System.out.println("total time" + (end - start));
}
}
I have a simple task that reads elements from a LinkedBlockingQueue. I create multiple instances of this task and execute them with an ExecutorService. This program works as expected when noOfProcess and the thread pool size are the same (for example: noOfProcess=10 and thread pool size=10).
However, if noOfProcess=10 and thread pool size=5, then the main thread keeps waiting at the line below after processing a few items.
processes.get(roundRobinIndex++).getElements().put(i);
What am I doing wrong here?
Ah yes. The good old deadlock.
What happens is: you submit 10 tasks to the ExecutorService and then send jobs via .put(i). This blocks for Task 5, as expected, once its queue is full. Task 5 is not currently being executed, and in fact never will be, since Tasks 0 to 4 are still clogging up your FixedThreadPool, blocking at .take() in the run() method while waiting for new jobs from .put(i), which they will never get.
This error is a fundamental design flaw in your code, and there are many ways to fix it, one of which is increasing the thread pool size.
My suggestion is that you go back to the drawing board and rethink the structure in the main Method.
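One way to restructure main, sketched under the assumption that any consumer may handle any message (the class name SharedQueueDemo and the sentinel value -1 are hypothetical): give all consumers a single shared bounded queue instead of one queue each, and stop them with one "poison pill" per consumer. Consumers still waiting for a pool thread can no longer block the producer, because whichever consumers are running drain the shared queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedQueueDemo {
    private static final Integer POISON = -1; // hypothetical sentinel; real messages are positive

    /** Returns how many real messages were consumed. */
    public static int run(int poolSize, int consumers, int totalMessages) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(10); // bounded, like the original
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Integer element = queue.take();
                        if (element.equals(POISON)) return; // one pill stops one consumer
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (int i = 1; i <= totalMessages; i++) queue.put(i); // any running consumer can take it
        for (int c = 0; c < consumers; c++) queue.put(POISON);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }
}
```

With 10 consumers and a pool of 5, the five running consumers process all messages (the queue is FIFO, so every message precedes the pills), each then takes one pill and exits, and the five queued consumers start, take the remaining pills and exit immediately.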
And since you posted your code, have some tips:
1.:
Posting your entire code can be interpreted as a call to 'pls fix my code', and you are encouraged to omit all unnecessary details (like all those getters and setters). Maybe check https://stackoverflow.com/help/minimal-reproducible-example
2.:
Posting two classes in the same body made things kinda complicated. Split it next time.
3.: (nitpick)
processes.get(roundRobinIndex++).getElements().put(i);
Combining two operations like you did here is bad style since it makes your code less readable for others. You could just have written:
processes.get(i % noOfProcess).getElements().put(i);
To fix the behavior, you need to do one of the following:
have enough Runnables, each with enough queue capacity to take all 1,000 messages (for example: 100 Runnables with capacity 10 or more; or 10 Runnables with capacity 100 or more), or
have a thread pool that is large enough to accommodate all of your Runnables so that each of them can start running.
Without one of those, the ExecutorService will not start the extra Runnables. The main thread will continue adding items to each queue, including those of Runnables that are not running, until it encounters a queue that is full, at which point it blocks. With 10 Runnables and thread pool size 5, the first queue to fill up will be the 6th Runnable's. This is the same if you had just 6 Runnables. The significant point is that you have at least one more Runnable than you have room for in your thread pool.
From newFixedThreadPool() Javadoc:
If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available.
Consider a simpler example of 2 processes and thread pool size of 1. You'll be allowed to create the first process and submit it to the ExecutorService (so the ExecutorService will start and run it). The second process however, will not be allowed to run by the ExecutorService. Your main thread does not pay attention to this, however, and it will continue putting elements into the queue for the second process even though nothing is consuming it.
Your code is ok with noOfProcess=10 and thread pool size=5 – if you also change your queue size to 100, like this: new LinkedBlockingQueue<>(100).
You can observe this behavior – where the queue of a non-running Runnable fills up – if you change this line:
processes.get(roundRobinIndex++).getElements().put(i);
to this (which is the same logical code, but has object references saved for use inside the println() output):
MyRunnable runnable = processes.get(roundRobinIndex++);
BlockingQueue<Integer> elements = runnable.getElements();
System.out.println("attempt to put() for " + runnable.getTaskName() + " with " + elements.size() + " elements");
elements.put(i);

ForkJoinPool - Why program is throwing OutOfMemoryError?

I wanted to try out ForkJoinPool in Java 8, so I wrote a small program that searches a given directory for all files whose name contains a specific keyword.
Program:
public class DirectoryService {
public static void main(String[] args) {
FileSearchRecursiveTask task = new FileSearchRecursiveTask("./DIR");
ForkJoinPool pool = (ForkJoinPool) Executors.newWorkStealingPool();
List<String> files = pool.invoke(task);
pool.shutdown();
System.out.println("Total no of files with hello" + files.size());
}
}
class FileSearchRecursiveTask extends RecursiveTask<List<String>> {
private String path;
public FileSearchRecursiveTask(String path) {
this.path = path;
}
@Override
protected List<String> compute() {
File mainDirectory = new File(path);
List<String> filetedFileList = new ArrayList<>();
List<FileSearchRecursiveTask> recursiveTasks = new ArrayList<>();
if(mainDirectory.isDirectory()) {
System.out.println(Thread.currentThread() + " - Directory is " + mainDirectory.getName());
if(mainDirectory.canRead()) {
File[] fileList = mainDirectory.listFiles();
for(File file : fileList) {
System.out.println(Thread.currentThread() + "Looking into:" + file.getAbsolutePath());
if(file.isDirectory()) {
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file.getAbsolutePath());
recursiveTasks.add(task);
task.fork();
} else {
if (file.getName().contains("hello")) {
System.out.println(file.getName());
filetedFileList.add(file.getName());
}
}
}
}
for(FileSearchRecursiveTask task : recursiveTasks) {
filetedFileList.addAll(task.join());
}
}
return filetedFileList;
}
}
This program works fine when the directory doesn't have many subdirectories and files, but if it's really big, it throws an OutOfMemoryError.
My understanding is that the maximum number of threads (including compensation threads) is bounded, so why is there this error? Am I missing anything in my program?
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
at java.util.concurrent.ForkJoinPool.tryCompensate(ForkJoinPool.java:2020)
at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2057)
at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:390)
at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:719)
at FileSearchRecursiveTask.compute(DirectoryService.java:51)
at FileSearchRecursiveTask.compute(DirectoryService.java:20)
at java.util.concurrent.RecursiveTask.exec(RecursiveTask.java:94)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1107)
at java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:2046)
at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:390)
at java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:719)
at FileSearchRecursiveTask.compute(DirectoryService.java:51)
at FileSearchRecursiveTask.compute(DirectoryService.java:20)
at java.util.concurrent.RecursiveTask.exec(RecursiveTask.java:94)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
You should not fork new tasks without limit. Basically, you should fork only as long as there's a chance that another worker thread will pick up the forked job; otherwise, evaluate it locally. Then, once you have forked a task, don't call join() right afterwards. The underlying framework will start compensation threads to ensure that your jobs proceed instead of having all threads blocked waiting for a sub-task, but this can create a large number of threads that may exceed the system's capabilities.
Here is a revised version of your code:
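import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.RecursiveTask;
public class DirectoryService {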
public static void main(String[] args) {
FileSearchRecursiveTask task = new FileSearchRecursiveTask(new File("./DIR"));
List<String> files = task.invoke();
System.out.println("Total no of files with hello " + files.size());
}
}
class FileSearchRecursiveTask extends RecursiveTask<List<String>> {
private static final int TARGET_SURPLUS = 3;
private File path;
public FileSearchRecursiveTask(File file) {
this.path = file;
}
@Override
protected List<String> compute() {
File directory = path;
if(directory.isDirectory() && directory.canRead()) {
System.out.println(Thread.currentThread() + " - Directory is " + directory.getName());
return scan(directory);
}
return Collections.emptyList();
}
private List<String> scan(File directory)
{
File[] fileList = directory.listFiles();
if(fileList == null || fileList.length == 0) return Collections.emptyList();
List<FileSearchRecursiveTask> recursiveTasks = new ArrayList<>();
List<String> filteredFileList = new ArrayList<>();
for(File file: fileList) {
System.out.println(Thread.currentThread() + "Looking into:" + file.getAbsolutePath());
if(file.isDirectory())
{
if(getSurplusQueuedTaskCount() < TARGET_SURPLUS)
{
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file);
recursiveTasks.add(task);
task.fork();
}
else filteredFileList.addAll(scan(file));
}
else if(file.getName().contains("hello")) {
filteredFileList.add(file.getAbsolutePath());
}
}
for(int ix = recursiveTasks.size() - 1; ix >= 0; ix--) {
FileSearchRecursiveTask task = recursiveTasks.get(ix);
if(task.tryUnfork()) task.complete(scan(task.path));
}
for(FileSearchRecursiveTask task: recursiveTasks) {
filteredFileList.addAll(task.join());
}
return filteredFileList;
}
}
The processing has been factored out into a method that receives the directory as a parameter, so we can use it locally for arbitrary directories that are not necessarily associated with a FileSearchRecursiveTask instance.
Then, the method uses getSurplusQueuedTaskCount() to determine the number of locally enqueued tasks which have not been picked up by other worker threads. Ensuring that there are some helps work balancing. But if this number exceeds the threshold, the processing will be done locally without forking more jobs.
After the local processing, it iterates over the tasks and uses tryUnfork() to identify jobs which have not been stolen by other worker threads and process them locally. Iterating backwards to start this with the youngest jobs raises the chances to find some.
Only afterwards does it join() all sub-jobs, which are now either completed or currently being processed by another worker thread.
Note that I changed the initiating code to use the default (common) pool. This uses one worker thread fewer than the number of CPU cores, plus the initiating thread, i.e. the main thread in this example.
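To try the surplus-based forking without a large directory tree, the same pattern can be run on an in-memory tree. This is a hypothetical demo: Node, CountTask and buildTree stand in for File and FileSearchRecursiveTask, the pool parallelism of 4 is arbitrary, and it counts names containing "hello" rather than collecting them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SurplusForkDemo {
    /** Hypothetical in-memory stand-in for a directory entry. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static final class CountTask extends RecursiveTask<Integer> {
        private static final int TARGET_SURPLUS = 3;
        private final Node node;

        CountTask(Node node) { this.node = node; }

        @Override
        protected Integer compute() { return scan(node); }

        private int scan(Node n) {
            int count = n.name.contains("hello") ? 1 : 0;
            List<CountTask> forked = new ArrayList<>();
            for (Node child : n.children) {
                // Fork only while few of our own queued tasks are still waiting to be stolen
                if (getSurplusQueuedTaskCount() < TARGET_SURPLUS) {
                    CountTask t = new CountTask(child);
                    forked.add(t);
                    t.fork();
                } else {
                    count += scan(child); // otherwise recurse locally without a new task
                }
            }
            // Youngest first: run locally any forked task that nobody has stolen
            for (int i = forked.size() - 1; i >= 0; i--) {
                CountTask t = forked.get(i);
                if (t.tryUnfork()) t.complete(scan(t.node));
            }
            for (CountTask t : forked) count += t.join(); // done, or being run by another worker
            return count;
        }
    }

    /** Every node at an even remaining depth gets "hello" in its name. */
    static Node buildTree(int depth, int fanout, String name) {
        Node n = new Node(depth % 2 == 0 ? name + "-hello" : name);
        if (depth > 0) {
            for (int i = 0; i < fanout; i++) n.children.add(buildTree(depth - 1, fanout, name + "/" + i));
        }
        return n;
    }

    static int countHello(Node root, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new CountTask(root));
        } finally {
            pool.shutdown();
        }
    }
}
```

For buildTree(4, 3, "root") the matching nodes are the root, the 9 nodes two levels down and the 81 leaves, so countHello should return 91 regardless of how the work was split.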
Just a minor change is required.
You need to specify the parallelism for newWorkStealingPool as follows:
ForkJoinPool pool = (ForkJoinPool) Executors.newWorkStealingPool(5);
As per its documentation:
newWorkStealingPool(int parallelism) -> Creates a thread pool that maintains enough threads to support the given parallelism level, and may use multiple queues to reduce contention. The parallelism level corresponds to the maximum number of threads actively engaged in, or available to engage in, task processing. The actual number of threads may grow and shrink dynamically. A work-stealing pool makes no guarantees about the order in which submitted tasks are executed.
As per the attached Java Visual VM screenshot, this parallelism allows the program to work within the memory specified and never goes out of memory.
And one more thing (not sure if it will have any effect):
Change the order in which fork is called and the task is added to list. That is, change
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file.getAbsolutePath());
recursiveTasks.add(task);
task.fork();
to
FileSearchRecursiveTask task = new FileSearchRecursiveTask(file.getAbsolutePath());
task.fork();
recursiveTasks.add(task);

Rescheduling a created thread and stopping it after a fixed delay

I am facing a problem with rescheduling an existing thread. Basically, I want to create a new thread with some identifier in a function call and stop it after some specified period of time, let's say 5 min. Now two cases can happen:
If the same function is called again with the same identifier within 5 min, the thread has to reschedule its stop time to 5 more minutes from the current time.
If the same function is called again with the same identifier after 5 min, a new thread will be created, which will stop after 5 min.
How can I implement this?
This is close to being too broad, but here is a simple recipe how to approach such problems:
you create a holder for a timestamp which can be read and written in a thread-safe manner
when that new thread starts, it simply writes the current time into that holder
when the function is called while the thread is still running, that "other" code simply updates the timestamp to the current time
that thread should check the timestamp regularly; when it finds now >= timestamp + 5 min, you know that the 5 minutes are over and the thread can stop.
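A minimal sketch of that recipe for a single worker, assuming periodic checking is acceptable (IdleTimeoutWorker, touch() and the check-interval parameter are hypothetical names). An AtomicLong serves as the thread-safe timestamp holder, and System.nanoTime() is used because it is not affected by wall-clock changes.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class IdleTimeoutWorker implements Runnable {
    private final AtomicLong lastTouched = new AtomicLong(); // thread-safe timestamp holder
    private final long timeoutNanos;
    private final long checkIntervalMillis;

    public IdleTimeoutWorker(long timeout, TimeUnit unit, long checkIntervalMillis) {
        this.timeoutNanos = unit.toNanos(timeout);
        this.checkIntervalMillis = checkIntervalMillis;
    }

    /** Called by other code to push the stop time forward to "now + timeout". */
    public void touch() {
        lastTouched.set(System.nanoTime());
    }

    @Override
    public void run() {
        touch(); // the new thread writes its own start time into the holder
        try {
            // Keep running until `timeout` has elapsed since the most recent touch
            while (System.nanoTime() - lastTouched.get() < timeoutNanos) {
                // periodic work for this identifier would go here
                Thread.sleep(checkIntervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }
}
```

Keeping one such worker per identifier in a ConcurrentHashMap would cover both cases in the question, although a touch() that arrives just as a worker decides to exit needs extra care.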
I got my answer. What I have done is basically create a map that stores a String as key and a ScheduledFuture object as value. I used the executor.schedule() method to create a thread with a delay of 5 min. Within the update method, the map is checked for whether the thread exists or not. If it does not, the method creates a new thread with a 5 min delay and adds it to the betaMap object. If the update method is called within 5 minutes with the same thread name, the thread is cancelled by calling t.cancel(false), and a new thread with a 5 min delay is created and stored in betaMap. After 5 minutes the run method is called and the thread stops automatically, but in the run method I need to delete the currently executing thread from the map.
I am not sure my explanation is clear, so I will put the code below, which may help you understand.
Code I have written:
public class BetaUserServiceImpl{
public static ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(15);
static ScheduledFuture<?> t,t1;
public static ConcurrentHashMap<String, ScheduledFuture<?>> betaMap = new ConcurrentHashMap<>();
int timeDelay = 5;
public void update(String threadname)
{
synchronized (betaMap) {
if (betaMap.containsKey(threadname))
{
t = betaMap.get(threadname);
t.cancel(false);
Iterator it = betaMap.entrySet().iterator();
while (it.hasNext())
{
Entry item = (Entry) it.next();
if(item.getKey().equals(threadname))
{
betaMap.remove(item.getKey());
}
}
t1=executor.schedule(new BetaUserThreadSchedular(threadname), // was "thread1": run() must remove this same key
timeDelay,TimeUnit.SECONDS);
betaMap.put(threadname, t1);
}
else
{
t=executor.schedule(new BetaUserThreadSchedular(threadname),timeDelay,TimeUnit.SECONDS);
betaMap.put(threadname, t);
}
}
}
}
Thread class
public class BetaUserThreadSchedular implements Runnable {
private Thread thread;
private String deviceId;
public BetaUserThreadSchedular(String deviceId) {
this.deviceId = deviceId;
}
public void run() {
synchronized (BetaUserServiceImpl.betaMap) {
try {
Iterator it = BetaUserServiceImpl.betaMap.entrySet().iterator();
while (it.hasNext())
{
Entry item = (Entry) it.next();
if(item.getKey().equals(deviceId) )
{
BetaUserServiceImpl.betaMap.remove(item.getKey());
}
}
}catch (Exception e) {
logger.info(e);
}
}
}
public void start() {
thread = new Thread(this);
thread.start();
}
}
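For comparison, here is a sketch of the same cancel-and-reschedule idea built on ConcurrentHashMap.compute(), which makes the check, cancel and reschedule steps atomic per key without an outer synchronized block. ExpiringTimers, touch() and the Pending token class are hypothetical names. Each scheduled expiry removes its entry only if it is still the current one (remove(key, value)), so a stale expiry can never delete a newer timer, which can happen when run() removes entries by key alone.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExpiringTimers {
    /** Identity token for one scheduled expiry; keeps its future so a later touch can cancel it. */
    private static final class Pending {
        ScheduledFuture<?> future;
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ConcurrentHashMap<String, Pending> timers = new ConcurrentHashMap<>();
    private final long delay;
    private final TimeUnit unit;
    final AtomicInteger expirations = new AtomicInteger(); // for demonstration only

    public ExpiringTimers(long delay, TimeUnit unit) {
        this.delay = delay;
        this.unit = unit;
    }

    /** Starts a timer for id, or pushes an existing timer's deadline forward. */
    public void touch(String id) {
        timers.compute(id, (key, existing) -> {
            if (existing != null) {
                existing.future.cancel(false); // the old deadline no longer applies
            }
            Pending p = new Pending();
            p.future = scheduler.schedule(() -> expire(key, p), delay, unit);
            return p;
        });
    }

    private void expire(String id, Pending p) {
        // Remove only if p is still the current timer for id; a newer touch() wins
        if (timers.remove(id, p)) {
            expirations.incrementAndGet();
            // the real "stop" work for id would go here
        }
    }

    public boolean isActive(String id) {
        return timers.containsKey(id);
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```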

Multithreading arraylist of objects

My program has an ArrayList of websites; for each one it does I/O with image processing, scrapes data from the site, and updates/inserts into a database. Right now it is slow because of all the I/O being done. I would like to speed this up by running it with threads. Nothing is ever removed from the list and every website in the list is independent of the others, so it seems okay to me to have several threads looping through the list at the same time.
Let's say my list is 10 websites, right now of course it's looping through position 0 through 9 until my program is done processing for all websites.
And let's say I want to have 3 threads looping through this list of 10 websites at once doing all the I/O and database updates in their own separate space at the same time but using the same list.
website.get(0) // thread1
website.get(1) // thread2
website.get(2) // thread3
Then say thread2 reaches the end of the loop first; it comes back and works on the next position
website.get(3) // thread2
Then thread3 completes and gets the next position
website.get(4) // thread3
and then thread1 finally completes and works on the next position
website.get(5) // thread1
etc until it's done. Is this easy to set up? Is there somewhere I can find a good example of it being done? I've looked online to try to find somewhere else talking about my scenario but I haven't found it.
In my app, I use ExecutorService like this, and it works well:
Main code:
ExecutorService pool = Executors.newFixedThreadPool(3); //number of concurrent threads
for (String name : website) { //Your ArrayList
pool.submit(new DownloadTask(name, toPath));
}
pool.shutdown();
pool.awaitTermination(5, TimeUnit.SECONDS); //Wait for all the threads to finish, adjust as needed.
The actual class where you do the work:
private static class DownloadTask implements Runnable {
private String name;
private final String toPath;
public DownloadTask(String name, String toPath) {
this.name = name;
this.toPath = toPath;
}
@Override
public void run() {
//Do your parsing / downloading / etc. here.
}
}
Some cautions:
If you are using a database, you have to ensure that you don't have two threads writing to that database at the same time.
See here for more info.
As mentioned in other comments/answer you just need a thread pool executor with fixed size (say 3 as per your example) which runs 3 threads which iterate over the same list without picking up duplicate websites.
So apart from the thread pool executor, you would probably also need to work out the next index correctly in each thread, so that no two threads pick the same element from the list and no element is missed.
Hence I think you can use a BlockingQueue instead of a list, which eliminates the index calculation and guarantees that each element is taken from the collection exactly once.
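import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
public class WebsitesHandler {
public static void main(String[] args) {
BlockingQueue<Object> websites = new LinkedBlockingQueue<>();
// Fill the queue with your websites here, before the workers start
ExecutorService executorService = Executors.newFixedThreadPool(3);
Worker[] workers = new Worker[3];
for (int i = 0; i < workers.length; i++) {
workers[i] = new Worker(websites);
}
try {
// invokeAll blocks until every worker has returned
executorService.invokeAll(Arrays.asList(workers));
} catch (InterruptedException e) {
e.printStackTrace();
} finally {
executorService.shutdown(); // let the pool threads exit so the JVM can terminate
}
}
private static class Worker implements Callable<String> {
private final BlockingQueue<Object> websites;
public Worker(BlockingQueue<Object> websites) {
this.websites = websites;
}
@Override
public String call() {
try {
Object website;
// poll() hands each element to exactly one worker; give up after 1 s with nothing to take
while ((website = websites.poll(1, TimeUnit.SECONDS)) != null) {
// execute the task
}
} catch (InterruptedException e) {
e.printStackTrace();
}
return "done";
}
}
}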
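The index bookkeeping that the BlockingQueue avoids is also manageable with a shared AtomicInteger over the original list, as sketched below (SharedIndexWorkers and processAll are hypothetical names; the handler stands in for the per-website I/O and database work). getAndIncrement() hands each position to exactly one thread, so no website is processed twice or skipped, which matches the thread1/thread2/thread3 behaviour described in the question.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class SharedIndexWorkers {
    public static void processAll(List<String> websites, int threads, Consumer<String> handler)
            throws InterruptedException {
        AtomicInteger next = new AtomicInteger(0); // next unclaimed position in the list
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                int i;
                // Each call claims a distinct index; a thread that finishes early simply claims the next one
                while ((i = next.getAndIncrement()) < websites.size()) {
                    handler.accept(websites.get(i));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // generous bound for slow I/O
    }
}
```

This relies on the list not being modified while the workers run, which the question guarantees ("Nothing is ever removed from the list").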
I think you need to update to a recent version of Java, i.e. Java 8,
and study the Streams API. That should solve your problem.
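With Java 8 streams, the loop could be as short as the sketch below (ParallelStreamSketch is a hypothetical name and the handler stands in for the per-website work). A caveat: parallel streams run on the common ForkJoinPool, which by default has one thread fewer than the number of CPU cores and is intended for CPU-bound work, so for I/O-heavy jobs a dedicated ExecutorService sized for the I/O may give a larger speed-up.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelStreamSketch {
    /** Applies handler to every website in parallel and returns the results in list order. */
    public static <R> List<R> processAll(List<String> websites, Function<String, R> handler) {
        return websites.parallelStream()
                .map(handler)                  // executed on the common ForkJoinPool
                .collect(Collectors.toList()); // encounter order of the list is preserved
    }
}
```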

Java multithreaded parser

I am writing a multithreaded parser.
Parser class is as follows.
public class Parser extends HTMLEditorKit.ParserCallback implements Runnable {
private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>());
private boolean h2Tag = false;
private int count;
private static int threadCount = 0;
public static List<Item> parse() {
for (int i = 1; i <= 1000; i++) { //1000 of the same type of pages that need to parse
while (threadCount == 20) { //limit the number of simultaneous threads
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
Thread thread = new Thread(new Parser());
thread.setName(Integer.toString(i));
threadCount++; //increase the number of working threads
thread.start();
}
return itemList;
}
public void run() {
//Here is a piece of code responsible for creating links based on
//the thread name and passed as a parameter remained i,
//connection, start parsing, etc.
//In general, nothing special. Therefore, I won't paste it here.
threadCount--; //reduce the number of running threads when current stops
}
private static void addItem(Item item) {
itemList.add(item);
}
//This method retrieves the necessary information after the H2 tag is detected
@Override
public void handleText(char[] data, int pos) {
if (h2Tag) {
String itemName = new String(data).trim();
//Item - the item on which we receive information from a Web page
Item item = new Item();
item.setName(itemName);
item.setId(count);
addItem(item);
//Display information about an item in the console
System.out.println(count + " = " + itemName);
}
}
@Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = true;
}
}
@Override
public void handleEndTag(HTML.Tag t, int pos) {
if (HTML.Tag.H2 == t) {
h2Tag = false;
}
}
}
From another class parser runs as follows:
List<Item> list = Parser.parse();
All is good, but there is a problem. At the end of parsing, the final list itemList contains 980 elements instead of 1000, yet the console shows all 1000 elements (items). That is, for some reason some threads did not call the addItem method from handleText.
I already tried changing the type of itemList to ArrayList, CopyOnWriteArrayList and Vector. I made the addItem method synchronized, and I wrapped its call in a synchronized block. All this only changes the number of elements a little; I never get the full thousand.
I also tried to parse a smaller number of pages (ten). As a result the list is empty, but the console shows all 10.
If I remove multithreading, everything works fine, but of course slowly. That's not good.
If I decrease the number of concurrent threads, the number of items in the list gets closer to the desired 1000; if I increase it, the count drifts a little further from 1000. So I think there is contention for writing to the list. But then why isn't synchronization working?
What's the problem?
After your parse() call returns, all of your 1000 Threads have been started, but it is not guaranteed that they have finished. In fact, some of them haven't, and that's the problem you see. I strongly recommend not writing this yourself; use the tools the JDK provides for this kind of job instead.
The Thread Pools tutorial and the ThreadPoolExecutor documentation are a good starting point. Again, don't implement this yourself unless you are absolutely sure you have to, because writing such multithreading code is pure pain.
Your code should look something like this:
ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<Future<?>>(1000);
for (int i = 0; i < 1000; i++) {
futures.add(executor.submit(new Runnable() {...}));
}
for (Future<?> f : futures) {
f.get();
}
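A fuller version of that outline, with parsePage as a hypothetical stand-in for the real download-and-parse step: each task returns its own items instead of appending to a shared static list, and the caller collects them via Future.get(), which blocks until that page is done. Because results are gathered only after every future has completed, parseAll cannot return early the way parse() does in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureCollectSketch {
    // Hypothetical stand-in for parsing page `page`; returns the items found on it
    static List<String> parsePage(int page) {
        return Collections.singletonList("item-" + page);
    }

    public static List<String> parseAll(int pages, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(threads); // at most `threads` at once
        try {
            List<Future<List<String>>> futures = new ArrayList<>(pages);
            for (int i = 1; i <= pages; i++) {
                final int page = i;
                futures.add(executor.submit(() -> parsePage(page)));
            }
            List<String> items = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                items.addAll(f.get()); // blocks until this page has been parsed
            }
            return items;
        } finally {
            executor.shutdown();
        }
    }
}
```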
There is no problem with the code as such; it is working as you coded it. The problem is with the last iterations. All the other iterations work properly, but during the last ones, from 980 to 1000, the threads are created, yet the main thread does not wait for them to complete before returning the list. Therefore you will get some number between 980 and 1000 if you are running 20 threads at a time.
You could try adding Thread.sleep(50) before returning the list; then your main thread will wait a while, and maybe by then the other threads will have finished processing.
Or you can use one of Java's synchronization APIs. Instead of Thread.sleep(), use CountDownLatch; this lets you wait for the threads to complete their processing before you continue.
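A sketch of the CountDownLatch approach that keeps the question's one-thread-per-page structure (LatchSketch and the item strings are hypothetical stand-ins for the real parsing). It also replaces the non-thread-safe threadCount++ / threadCount-- counter with a Semaphore that limits the number of simultaneously running threads; maxConcurrent is an illustrative parameter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

public class LatchSketch {
    public static List<String> parseAll(int pages, int maxConcurrent) throws InterruptedException {
        List<String> items = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(pages);  // one count per page
        Semaphore permits = new Semaphore(maxConcurrent); // caps the running threads
        for (int i = 1; i <= pages; i++) {
            final int page = i;
            permits.acquire(); // wait until fewer than maxConcurrent threads are running
            Thread thread = new Thread(() -> {
                try {
                    items.add("item-" + page); // hypothetical stand-in for parsing page i
                } finally {
                    permits.release();
                    done.countDown(); // always count down, even if parsing throws
                }
            });
            thread.setName(Integer.toString(page));
            thread.start();
        }
        done.await(); // block until every page's thread has finished
        return items;
    }
}
```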
