Java: sharing workers in thread pool for several recursive tasks

There is one fixed thread pool (let it be with size=100), that I want to use for all tasks across my app.
It is used to limit server load.
Task = web crawler, that submits first job to thread pool.
That job can generate more jobs, and so on.
One job = one HTTP I/O request.
Problem
Suppose there is only one executing task, which has generated 10000 jobs.
Those jobs are now queued in thread pool queue, and all 100 threads are used for their execution.
Suppose that I now submit a second task.
The first job of the second task is 10001st in the queue.
It will be executed only after the 10000 jobs that the first task queued up.
So, this is a problem - I don't want the second task to wait so long to start its first job.
Idea
The first idea on my mind is to create a custom BlockingQueue and pass it to the thread pool constructor.
That queue will hold several blocking queues, one for each task.
Its take method will then choose a random queue and take an item from it.
My problem with this is that I don't see how to remove an empty queue from this list when its task is finished. This would mean some or all workers could get blocked on the take method, waiting for jobs from tasks that are finished.
Is this the best way to solve this problem?
I was unable to find any patterns for it in books or on the Internet :(
Thank you!

I would use multiple queues and draw from a random one of the queues that contain items. Alternatively, you could assign priorities to the queues and always draw from the highest-priority queue that has items.

I would suggest using a single PriorityBlockingQueue and using the 'depth' of the recursive tasks to compute the priority. With a single queue, workers get blocked when the queue is empty and there is no need for randomization logic around the multiple queues.
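For illustration, a minimal sketch of that approach (class and field names are made up here, not from the question): each job carries its recursion depth and implements Comparable so shallower jobs are taken first, and the pool is built on a PriorityBlockingQueue. One caveat: jobs must go in via execute(), because submit() wraps them in a FutureTask, which is not Comparable.

import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PriorityPoolSketch {

    // Hypothetical job type: lower depth sorts first, so a new task's first job
    // (depth 0) jumps ahead of another task's deep backlog.
    static class CrawlJob implements Runnable, Comparable<CrawlJob> {
        final int depth;

        CrawlJob(int depth) {
            this.depth = depth;
        }

        @Override
        public int compareTo(CrawlJob other) {
            return Integer.compare(this.depth, other.depth);
        }

        @Override
        public void run() {
            // perform this job's HTTP request here; child jobs would be
            // created with depth + 1 and handed back to the same pool
        }
    }

    public static void main(String[] args) {
        // 100 workers drawing from a priority queue instead of the default FIFO queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                100, 100, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<Runnable>());

        // Use execute(), not submit(): submit() wraps the job in a FutureTask,
        // which is not Comparable and would make the priority queue throw.
        pool.execute(new CrawlJob(0));
        pool.shutdown();
    }
}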

Related

Executor pool limit number of threads at a time

I have a situation in which I have to run some 10,000 threads. Obviously, one machine cannot run this many threads in parallel. Is there any way to ask the thread pool to run only a specific number of threads at the beginning and, as soon as one thread finishes, start processing the threads that are left?
Executors.newFixedThreadPool(nThreads) is most likely what you are looking for. There will only be as many threads running at one time as the number specified. And yes, one machine cannot run 10,000 threads at once in parallel, but it will be able to run them concurrently. Depending on how resource-intensive each thread is, it may be more efficient in your case to use Executors.newCachedThreadPool(), wherein as many threads are created as needed and threads that have finished are reused.
Using Executors.newFixedThreadPool(10000) with invokeAll will throw an OutOfMemoryError with that many threads. You could still use it by submitting tasks to it instead of invoking all tasks at the same time; I would say that is safer than invokeAll.
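A minimal sketch of that safer pattern (names and sizes are illustrative): a fixed pool whose size bounds the concurrency, with the 10,000 work items submitted one by one instead of handed over in a single invokeAll call.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FixedPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        // Only 100 threads ever run at once; the remaining tasks wait in the pool's queue.
        ExecutorService pool = Executors.newFixedThreadPool(100);

        for (int i = 0; i < 10_000; i++) {
            pool.submit(() -> {
                // do the real work for one item here
            });
        }

        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for the queued tasks to drain
    }
}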
For this use case, you can use a ThreadPoolExecutor with a BlockingQueue. This tutorial explains it very well: http://howtodoinjava.com/core-java/multi-threading/how-to-use-blockingqueue-and-threadpoolexecutor-in-java/
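For illustration, a ThreadPoolExecutor wired to a bounded BlockingQueue might look roughly like this (the sizes and the CallerRunsPolicy are assumptions, not taken from the linked tutorial): the bounded queue caps how much work can pile up, and the rejection policy throttles the submitter when it is full.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10,                                     // core pool size
                10,                                     // maximum pool size
                0L, TimeUnit.MILLISECONDS,              // keep-alive for excess threads
                new ArrayBlockingQueue<Runnable>(1000), // bounded work queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // submitter runs the task if the queue is full

        pool.execute(() -> System.out.println("ran on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}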
It sounds like you want to run 10,000 tasks on a group of threads. A relatively simple approach is to create a List and add all the tasks to it, wrapping each in a Runnable. Then create a class that takes the list in its constructor, pops a Runnable off the list, and runs it; this pop must be synchronized in some manner, and the class exits when the list is empty. Start some number of threads using this class: they will burn down the list and then stop, and your main thread can monitor the length of the list, as in the sketch below.
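A rough sketch of that approach (class names are invented for the example): a thread-safe queue plays the role of the synchronized list, each worker pops and runs jobs until none are left, and the main thread can watch the remaining count.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ListBurnerSketch {
    public static void main(String[] args) throws InterruptedException {
        // Shared work list; the concurrent queue makes the "pop" step thread-safe.
        Queue<Runnable> work = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 10_000; i++) {
            work.add(() -> { /* process one input element here */ });
        }

        // Each worker pops a Runnable off the list and runs it, exiting when the list is empty.
        Runnable worker = () -> {
            Runnable job;
            while ((job = work.poll()) != null) {
                job.run();
            }
        };

        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }

        // The main thread can monitor progress via work.size().
        for (Thread t : threads) {
            t.join();
        }
    }
}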

Manage Queue System in multiple Threads in java

I have a queue of requests and two threads: in one thread I am adding items to the queue, and the second thread takes requests from the queue and executes them, so the second thread has to wait for the first thread to put a request in the list. I am currently doing this in a busy while loop, which I don't think is the best approach since it is CPU-intensive. I could notify the second thread whenever I add a request, but there is a complication: a request may fail to execute, in which case I have to ask the second thread to execute it again.
So, is there any approach you can think of that will work?
Use one of the available blocking queues in Java: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html
The busy waiting is indeed not recommended (unless you want to use your computer for heating).
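A minimal producer/consumer sketch along those lines (the Request type and its retry handling are illustrative): take() blocks the worker until something is available, so there is no busy waiting, and a request that fails can simply be put back on the queue.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RequestWorkerSketch {
    // Hypothetical request type.
    static class Request {
        boolean execute() { /* do the real work */ return true; }
    }

    public static void main(String[] args) {
        BlockingQueue<Request> queue = new LinkedBlockingQueue<>();

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Request r = queue.take();      // blocks until a request is available
                    if (!r.execute()) {
                        queue.put(r);              // failed: re-queue it for another attempt
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit cleanly on shutdown
            }
        });
        worker.start();

        // Producer side: just add requests; no explicit notification is needed.
        queue.add(new Request());
    }
}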
You can make use of Semaphores to solve this problem.
The second thread, which is the worker thread, will wait on the semaphore. Every time the first thread pushes new task info onto the queue structure, it will also post to the semaphore, so the second thread can then safely go and execute.
This may also need some synchronization along the way if there are multiple reader/writer threads.
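A rough sketch of the semaphore variant (names are illustrative): the producer adds to a concurrent queue and releases one permit per job; the worker acquires a permit before polling, so it sleeps whenever no jobs are available.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

public class SemaphoreWorkerSketch {
    private final Queue<Runnable> jobs = new ConcurrentLinkedQueue<>();
    private final Semaphore available = new Semaphore(0); // one permit per queued job

    // Called by the producer thread.
    public void submit(Runnable job) {
        jobs.add(job);
        available.release();           // wake the worker
    }

    // Body of the single worker thread.
    public void workerLoop() throws InterruptedException {
        while (true) {
            available.acquire();       // sleeps until a job has been submitted
            Runnable job = jobs.poll();
            if (job != null) {
                job.run();
            }
        }
    }
}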

Determine number of completed tasks in ExecutorCompletionService queue

I'm working on a project where there is a large input of data elements that need to be processed. The processing of each is independent of the others, and I need to return a result from each. What I'm doing now is creating a Callable task for each element to do the processing and using ExecutorCompletionService to collect the Future results as the threads complete.
I then have another thread that is pulling the Future objects from the ExecutorCompletionService queue. This thread just spins in an infinite while loop and calls take() which blocks until a Future shows up in the queue.
What I'm trying to do is avoid the scenario where the queue of Future objects grows faster than I pull them off the queue so I'd like to sleep the process that's creating tasks if I get behind on processing the Future results.
The problem I'm running into is that I'm not able to find a way to see how many Future objects are in the ExecutorCompletionService queue. Is there a way to do this?
I could probably keep an external counter that I increment when a new task is created and decrement when a Future is processed but this only gets me to the number of outstanding tasks, not the number that are actually done. Any thoughts on the best way to tackle this?
You can pass the queues an executor uses via the overloaded constructors. Since BlockingQueue implements Collection, you can just call .size() on them. You will have one queue for the completed results and another queue for the executor that the ExecutorCompletionService uses, so between those two you can tell how many tasks are submitted and how many are completed.
You'll just need to hold on to those queues after you create them and pass them to whatever is watching their size.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorCompletionService.html shows the overloaded constructor
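A sketch of that setup (pool size and names are illustrative): both queues are created explicitly, handed to the ThreadPoolExecutor and the ExecutorCompletionService, and kept around so their size() can be checked later.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CompletionQueueSketch {
    public static void main(String[] args) {
        // Work queue: tasks that have been submitted but not yet started.
        BlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, workQueue);

        // Completion queue: finished Futures, passed via the overloaded constructor.
        BlockingQueue<Future<String>> doneQueue = new LinkedBlockingQueue<>();
        ExecutorCompletionService<String> completionService =
                new ExecutorCompletionService<>(executor, doneQueue);

        completionService.submit(() -> "result");

        // How many tasks are still waiting to run, and how many results are waiting to be taken.
        System.out.println("pending: " + workQueue.size()
                + ", completed but not yet consumed: " + doneQueue.size());

        executor.shutdown();
    }
}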

ScheduledThreadPoolExecutor - single task/thread running often or multiple threads running less often

I'm trying to set up a job that will run every x minutes/seconds/milliseconds/whatever and poll an Amazon SQS queue for messages to process. My question is what the best approach would be for this. Should I create a ScheduledThreadPoolExecutor with x number of threads and schedule a single task with scheduleAtFixedRate method and just run it very often (like 10 ms) so that multiple threads will be used when needed, or, as I am proposing to colleagues, create a ScheduledThreadPoolExecutor with x number of threads and then create multiple scheduled tasks at slightly offset intervals but running less often. This to me sounds like how the STPE was meant to be used.
Typically I use Spring/Quartz for this type of thing, but that's not an option at this point.
So what are your thoughts?
I recommend that you use long polling on SQS, which makes your ReceiveMessage calls behave more like calls to take on a BlockingQueue (which means that you won't need to use a scheduled task to poll from the queue - you just need a single thread that polls in an infinite loop, retrying if the connection times out)
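As a sketch of that single-thread loop (receiveWithLongPolling and handle are hypothetical stand-ins for the SQS receive call with long polling enabled and for your message processing; they are not real SDK methods):

import java.util.Collections;
import java.util.List;

public class SqsPollerSketch {
    public static void main(String[] args) {
        // One dedicated thread is enough: the long-poll call itself blocks,
        // much like take() on a BlockingQueue.
        Thread poller = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    for (String message : receiveWithLongPolling()) { // blocks for up to ~20s
                        handle(message);
                    }
                } catch (RuntimeException e) {
                    // e.g. a connection timeout: log it and retry on the next loop iteration
                }
            }
        });
        poller.start();
    }

    // Hypothetical stand-in for the SQS ReceiveMessage call with long polling enabled.
    static List<String> receiveWithLongPolling() {
        return Collections.emptyList();
    }

    // Hypothetical message handler.
    static void handle(String message) {
    }
}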
Well, it depends on the frequency of tasks. If you just have to poll on a timed interval and the interval is not very small, then ScheduledThreadPoolExecutor with scheduleAtFixedRate is a good alternative.
Otherwise I would recommend using Netty's HashedWheelTimer. Under heavy task loads it gives the best performance; Akka and Play use it for scheduling. This is because adding a task to the STPE takes O(log n), whereas the HWT takes O(1).
If you have to use the STPE, I would recommend one task at a fixed rate; otherwise it results in excess resource use.
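For reference, a minimal sketch with Netty's HashedWheelTimer (assuming io.netty.util.HashedWheelTimer from Netty 4; its timeouts are one-shot, so the task re-arms itself for the next poll):

import io.netty.util.HashedWheelTimer;
import io.netty.util.Timeout;
import io.netty.util.TimerTask;

import java.util.concurrent.TimeUnit;

public class WheelTimerSketch {
    public static void main(String[] args) {
        HashedWheelTimer timer = new HashedWheelTimer();

        TimerTask poll = new TimerTask() {
            @Override
            public void run(Timeout timeout) {
                // poll the SQS queue here
                // timeouts are one-shot, so schedule the next run explicitly
                timer.newTimeout(this, 5, TimeUnit.SECONDS);
            }
        };

        timer.newTimeout(poll, 5, TimeUnit.SECONDS);
    }
}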
Long polling is like a blocking queue, but only for a maximum of 20 seconds, after which the call returns. Long polling is sufficient if that is the maximum delay required between poll cycles; beyond that you will need a ScheduledExecutor.
The number of threads really depends on how fast you can process the received messages. If you can process a message really fast, you need only a single thread. I have a setup as follows:
A SingleThreadScheduledExecutor with scheduleWithFixedDelay executes 5 minutes after the previous completion.
In each execution, messages are retrieved in batches from SQS until there are no more messages to process (remember each batch receives a maximum of 10 messages).
The messages are processed and then deleted from the queue.
For my scenario a single thread is sufficient. If the backlog is increasing (for example, a network operation is required for each message, which may involve waits), you might want to use multiple threads. If one processing node becomes resource-constrained, you could always start another instance (EC2 perhaps) to add more capacity.
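A sketch of that setup (drainQueue is a hypothetical stand-in for the batch receive/process/delete cycle against SQS):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedDelayPollerSketch {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Runs 5 minutes after the previous run *finishes*, so slow runs never overlap.
        scheduler.scheduleWithFixedDelay(
                FixedDelayPollerSketch::drainQueue, 0, 5, TimeUnit.MINUTES);
    }

    // Hypothetical stand-in: receive messages in batches of up to 10,
    // process each one, delete it from the queue, and stop when the queue is empty.
    static void drainQueue() {
    }
}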

Any available design pattern for a thread that is capable of executing a specific job sent by other threads?

I'm working on a project where execution time is critical. In one of the algorithms I have, I need to save some data into a database.
What I did is call a method that does that. It fires a new thread every time it's called. I ran into an OutOfMemoryError since more than 20,000 threads were created ...
My question now is: I want to start only one thread; when the method is called, it adds the job to a queue and notifies the thread, and the thread sleeps when no jobs are available, and so on. Are there any design patterns or examples available online?
Run, do not walk to your friendly Javadocs and look up ExecutorService, especially Executors.newSingleThreadExecutor().
ExecutorService myXS = Executors.newSingleThreadExecutor();
// then, as needed...
myXS.submit(myRunnable);
And it will handle the rest.
Yes, you want a worker thread or thread pool pattern.
http://en.wikipedia.org/wiki/Thread_pool_pattern
See http://www.ibm.com/developerworks/library/j-jtp0730/index.html for Java examples
I believe the pattern you're looking for is called producer-consumer. In Java, you can use the blocking methods on a BlockingQueue to pass tasks from the producers (which create the jobs) to the consumer (the single worker thread). This makes the worker thread sleep automatically when no jobs are available in the queue and wake up when one is added. The concurrent collections also handle the case of multiple worker threads.
Are you looking for java.util.concurrent.Executor?
That said, if you have 20,000 concurrent inserts into the database, using a thread pool will probably not save you: if the database can't keep up, the queue will get longer and longer until you run out of memory again. Also, note that an executor's queue lives only in memory, i.e. if the server crashes, the data in it will be gone.
