Concurrent and scalable data structure in Java to handle tasks?

For my current project, I have many threads (producers) that create Tasks and many threads (consumers) that consume these Tasks.
Each producer is identified by a unique name. A Task is made of:
the name of its Producers
a name
data
My question concerns the data structure shared by the producers and the consumers.
Concurrent Queue?
Naively, we could imagine that producers populate a concurrent queue with Tasks and consumers read and consume the Tasks stored in it.
I think this solution would scale rather well, but one case is problematic: if a producer very quickly creates two Tasks with the same name but different data (T1 and T2 share a name, but T1 has data D1 and T2 has data D2), it is theoretically possible that they are consumed in the order T2 then T1!
Task Map + Queue?
Now, I imagine creating my own data structure (let's say MyQueue) based on a Map plus a Queue. Like a queue, it would have a pop() and a push() method.
The pop() method would be quite simple
The push() method would:
Check whether a matching Task is already present in MyQueue (doing a find() in the Map)
if found: data stored in the Task to-be-inserted would be merged with data stored in the found Task
if not found: the Task would be inserted in the Map and an entry would be added in the Queue
Of course, I'll have to make it safe for concurrent access... and that will certainly be my problem; I am almost sure that this solution won't scale.
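A minimal sketch of the Map + Queue idea described above, assuming merging is expressed as a function over the data. The class name MergingQueue and the merger parameter are hypothetical, not part of the question. ConcurrentHashMap.compute makes the "find, then merge or insert" step atomic per name, so no global lock is needed:

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BinaryOperator;

// Hypothetical sketch: names and the merge function are illustrative assumptions.
final class MergingQueue<K, V> {
    // Data waiting to be consumed, keyed by task name.
    private final ConcurrentHashMap<K, V> pending = new ConcurrentHashMap<>();
    // Names in the order their first pending task arrived.
    private final BlockingQueue<K> order = new LinkedBlockingQueue<>();
    private final BinaryOperator<V> merger;

    MergingQueue(BinaryOperator<V> merger) {
        this.merger = merger;
    }

    // data must be non-null: a null result would remove the mapping.
    void push(K name, V data) {
        boolean[] inserted = {false};
        // compute() runs atomically for this key: merge if present, insert otherwise.
        pending.compute(name, (k, existing) -> {
            if (existing == null) {
                inserted[0] = true;
                return data;
            }
            return merger.apply(existing, data);
        });
        if (inserted[0]) {
            order.add(name); // only the first push for a name enqueues it
        }
    }

    // Blocks until a name is available, then returns it with its merged data.
    Map.Entry<K, V> pop() throws InterruptedException {
        K name = order.take();
        V data = pending.remove(name);
        return new AbstractMap.SimpleImmutableEntry<>(name, data);
    }
}
```

Note that this only merges tasks that are still waiting. Once a task has been popped, a later task with the same name can still be processed concurrently with it by another consumer.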
So what?
So my question is: what is the best data structure to use in order to fulfill these requirements?

Heinz Kabutz's Striped Executor Service is a possible candidate.
This magical thread pool would ensure that all Runnables with the same stripeClass would be executed in the order they were submitted, but StripedRunners with different stripedClasses could still execute independently.
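The ordering guarantee quoted above can be illustrated without that library. The following is only a sketch of the general idea, not Kabutz's implementation: the class KeyedSerialExecutor and its submit method are hypothetical names. Each key keeps the "tail" of its chain of tasks, and a new task is chained behind it, so same-key tasks run in submission order while different keys run independently on the shared pool:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Hypothetical sketch of per-key ordering; not the StripedExecutorService API.
final class KeyedSerialExecutor {
    // Last submitted task per key; new tasks are chained after it.
    private final ConcurrentHashMap<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();
    private final Executor pool;

    KeyedSerialExecutor(Executor pool) {
        this.pool = pool;
    }

    CompletableFuture<Void> submit(String key, Runnable task) {
        // compute() is atomic per key, so the chain order matches submission order.
        return tails.compute(key, (k, tail) -> {
            CompletableFuture<Void> previous =
                    (tail == null) ? CompletableFuture.<Void>completedFuture(null) : tail;
            // handleAsync runs the task even if the previous one failed.
            return previous.<Void>handleAsync((ignored, error) -> {
                task.run();
                return null;
            }, pool);
        });
    }
}
```

This sketch never removes keys from the map, so with an unbounded set of task names it would need some cleanup of completed tails.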

Instead of making a data structure safe for concurrent access, why not opt out of concurrency and go parallel?
Functional programming models such as MapReduce are a very scalable way to solve this kind of problem.
I understand that D1 and D2 can be analyzed either together or in isolation, and that the only constraint is that they must not be analyzed in the wrong order (I am making some assumptions here). But if the real problem is only the way the results are combined, there might be an easy solution.
You could remove the constraint altogether, allowing them to be analyzed separately, and then have a reduce function that recombines them in a sensible way.
In this case you'd have the first step as map and the second as reduce.
Even if the computation is more efficient if done in a single go, a big part of scaling, especially scaling out is accomplished by denormalization.
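As a rough illustration of that map-then-reduce split, the sketch below assumes each task carries a sequence number assigned by its producer (the seq field is an assumption; the question does not mention one). The map step can then run on any thread in any order, and the reduce step restores the per-name order when combining results. The analysis itself is a placeholder:

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; Result, seq and the placeholder analysis are assumptions.
final class MapReduceSketch {
    static final class Result {
        final String name;
        final long seq;     // assumed per-producer counter, assigned at creation
        final String value;

        Result(String name, long seq, String value) {
            this.name = name;
            this.seq = seq;
            this.value = value;
        }
    }

    // "map": each task is analyzed on its own, on any thread, in any order.
    static Result analyze(String name, long seq, String data) {
        return new Result(name, seq, data.toLowerCase()); // placeholder analysis
    }

    // "reduce": regroup by task name and restore submission order via seq.
    static Map<String, String> combine(Collection<Result> results) {
        return results.stream()
                .sorted(Comparator.comparingLong((Result r) -> r.seq))
                .collect(Collectors.groupingBy(
                        (Result r) -> r.name,
                        Collectors.mapping((Result r) -> r.value, Collectors.joining(","))));
    }
}
```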

If consumers are running in parallel, I doubt there is a way to make them execute tasks with the same name sequentially.
In your example (from comments):
BlockingQueue can really be a problem (unfortunately) if a Producer
"P1" adds a first task "T" with data D1 and quickly a second task "T"
with data D2. In this case, the first task can be handled by a thread
and the second task by another thread; If the threads handling the
first task is interrupted, the thread handling the second one can
complete first
It makes no difference if P1 submits D2 less quickly. Consumer 1 could still be too slow, so consumer 2 would be able to finish first. Here is an example of such a scenario:
P1: submit D1
C1: read D1
P1: submit D2
C2: read D2
C2: process D2
C1: process D1
To solve it, you will have to introduce some kind of completion detection, which I believe will overcomplicate things.
If you have enough load and tasks with different names do not need to be processed sequentially, then you can use a queue per consumer and put same-named tasks in the same queue.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ParallelQueue {
    // One queue per consumer; tasks with the same name always land in the same queue.
    private final List<BlockingQueue<Task<?>>> queues;
    private final int consumersCount;

    public ParallelQueue(int consumersCount) {
        this.consumersCount = consumersCount;
        queues = new ArrayList<>(consumersCount);
        for (int i = 0; i < consumersCount; i++) {
            queues.add(new LinkedBlockingQueue<>());
        }
    }

    public void push(Task<?> task) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(task.name.hashCode(), consumersCount);
        queues.get(index).add(task);
    }

    public Task<?> pop(int consumerId) throws InterruptedException {
        int index = Math.floorMod(consumerId, consumersCount);
        return queues.get(index).take();
    }

    public static final class Task<T> {
        private final String name;
        private final T data;

        public Task(String name, T data) {
            this.name = name;
            this.data = data;
        }
    }
}

Related

Processing sub-streams of a stream in Java using executors

I have a program that processes a huge stream (not in the sense of java.util.stream, but rather InputStream) of data coming in through the network. The stream consists of objects, each having a sort of sub-stream identifier. Right now the whole processing is done in a single thread, but it takes a lot of CPU time and each sub-stream can easily be processed independently, so I'm thinking of multi-threading it.
However, each sub-stream requires to keep a lot of bulky state, including various buffers, hash maps and such. There is no particular reason to make it concurrent or synchronized since sub-streams are independent of each other. Moreover, each sub-stream requires that its objects are processed in the order they arrive, which means that probably there should be a single thread for each sub-stream (but possibly one thread processing multiple sub-streams).
I'm thinking of several approaches to this, but they are not quite elegant.
1. Create a single ThreadPoolExecutor for all tasks. Each task will contain the next object to process and the reference to a Processor instance which keeps all the state. That would ensure the necessary happens-before relationship thus ensuring that the processing thread will see the up-to-date state for this sub-stream. This approach has no way to make sure that the next object of the same sub-stream will be processed in the same thread, as far as I can see. Moreover, it needs some guarantee that objects will be processed in the order they come in, which will require additional synchronization of the Processor objects, introducing unnecessary delays.
2. Create multiple single-thread executors manually and a sort of hash-map that maps sub-stream identifiers to executor. This approach requires manual management of executors, creating or shutting down them as new sub-streams begin or end, and distributing the tasks between them accordingly.
3. Create a custom executor that processes a special subclass of tasks each having a sub-stream ID. This executor would use it as a hint to use the same thread for executing this task as the previous one with the same ID. However, I don't see an easy way to implement such executor. Unfortunately, it doesn't seem possible to extend any of the existing executor classes, and implementing an executor from scratch is kind of overkill.
4. Create a single ThreadPoolExecutor, but instead of creating a task for each incoming object, create a single long-running task for each sub-stream that would block in a concurrent queue, waiting for the next object. Then put objects in queues according to their sub-stream IDs. This approach needs as many threads as there are sub-streams because the tasks will be blocked. The expected number of sub-streams is about 30-60, so that may be acceptable.
5. Alternatively, proceed as in 4, but limit the number of threads, assigning multiple sub-streams to a single task. This is sort of a hybrid between 2 and 4. As far as I can see, this is the best approach of these, but it still requires some sort of manual sub-stream distribution between tasks and some way to shut the extra tasks down as sub-streams end.
What would be the best way to ensure that each sub-stream is processed in its own thread without a lot of error-prone code? So that the following pseudo-code will work:
// loop {
Item next = stream.read();
int id = next.getSubstreamID();
Processor processor = getProcessor(id);
SubstreamTask task = new SubstreamTask(processor, next, id);
executor.submit(task); // This makes sure that the task will
// be executed in the same thread as the
// previous task with the same ID.
// } // loop

I suggest having an array of single threaded executors. If you can devise a consistent hashing strategy for sub-streams, you can map sub-streams to individual threads. e.g.
final ExecutorService[] es = ...
public void submit(int id, Runnable run) {
    // Mask off the sign bit so the index is never negative
    es[(id & 0x7FFFFFFF) % es.length].submit(run);
}
The key could be a String or a long, as long as it is some way to identify the sub-stream. If you know a particular sub-stream is very expensive, you could assign it a dedicated thread.

The solution I finally chose looks like this:
private final Executor[] streamThreads
= new Executor[Runtime.getRuntime().availableProcessors()];
{
for (int i = 0; i < streamThreads.length; ++i) {
streamThreads[i] = Executors.newSingleThreadExecutor();
}
}
private final ConcurrentHashMap<SubstreamId, Integer>
threadById = new ConcurrentHashMap<>();
This code determines which executor to use:
Message msg = in.readNext();
SubstreamId msgSubstream = msg.getSubstreamId();
int exe = threadById.computeIfAbsent(msgSubstream,
id -> findBestExecutor());
streamThreads[exe].execute(() -> {
// processing goes here
});
And the findBestExecutor() function is this:
private int findBestExecutor() {
// Thread index -> substream count mapping:
final int[] loads = new int[streamThreads.length];
for (int thread : threadById.values()) {
++loads[thread];
}
// return the index of the minimum load
return IntStream.range(0, streamThreads.length)
.reduce((i, j) -> loads[i] <= loads[j] ? i : j)
.orElse(0);
}
This is, of course, not very efficient, but note that this function is only called when a new sub-stream shows up (which happens several times every few hours, so it's not a big deal in my case). My real code looks a bit more complicated because I have a way to determine whether two sub-streams are likely to finish simultaneously, and if they are, I try to assign them to different threads in order to maintain even load after they do finish. But since I never mentioned this detail in the question, I guess it doesn't belong to the answer either.

Dynamic Scheduled Concurrent Task Execution in Java

I'm trying to implement an application that programs tasks based on some user input. The users can put a number of IPs with telnet commands associated with them (one to one relationship), a frequency of execution, and 2 groups (cluster, objectClass).
The user should be able to add/remove IPs, Clusters, commands, etc, at runtime. They should also be able to interrupt the executions.
This application should be able to send the telnet commands to the IPs, wait for a response and save the response in a database based on the frequency. The problem I'm having is trying to make all of this multithreaded, because there are at least 60,000 IPs to telnet, and doing it in a single thread would take too much time. One thread should process a group of IPs in the same cluster with the same objectClass.
I've looked at Quartz to schedule the jobs. With Quartz I tried to make a dynamic job that took a list of IPs (with commands), processed them and saved the result to database. But then I ran into the problem of the different timers that users gave. The examples on the Quartz webpage are incomplete and don't go too much into detail.
Then I tried to do it the old-fashioned way, using Java Threads, but I need exception handling and parameter passing, and Threads don't provide that. Then I discovered Callables and Executors, but I can't schedule tasks with Callables.
So now I'm stumped. What do I do?

OK, here are some ideas. Take with the requisite grain of salt.
First, create a list of all of the work that you need to do. I assume you have this in tables somewhere and you can make a join that looks like this:
cluster | objectClass | ip-address | command | frequency | last-run-time
this represents all of the work your system needs to do. For the sake of explanation, I'll say frequency can take the form of "1 per day", "1 per hour", "4 per hour", "every minute". This table has one row per (cluster,objectClass,ip-address,command). Assume a different table has a history of runs, with error messages and other things.
Now what you need to do is read that table, and schedule the work. For scheduling use one of these:
ScheduledExecutorService exec = Executors...
When you schedule something, you need to tell it how often to run (easy enough with the frequencies we've given) and an initial delay. If something is to run every minute and it last ran 4 min 30 seconds ago, the initial delay is zero. If something is to run each hour, then the initial delay is (60 min - 4.5 min = 55.5 min).
ScheduledFuture<?> handle = exec.scheduleAtFixedRate(...);
More complex types of scheduling are why things like Quartz exist, but basically you just need a way to resolve, given(schedule, last-run) an elapsed time to the next execution. If you can do that, then instead of scheduleAtFixedRate(...) you can use schedule(...) and then schedule the next run of a task as that task completes.
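The delay arithmetic above is small enough to write down. A minimal sketch using java.time follows; the class and method names (Delays, initialDelay) are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: time left until the next run, never negative.
final class Delays {
    static Duration initialDelay(Duration period, Instant lastRun, Instant now) {
        Duration elapsed = Duration.between(lastRun, now);
        Duration remaining = period.minus(elapsed);
        // An overdue task should run immediately.
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

With a last run 4 min 30 s ago, a one-minute period gives a delay of zero and a one-hour period gives 55 min 30 s, matching the figures above. The result can be passed on as, for example, exec.scheduleAtFixedRate(check, delay.toMillis(), period.toMillis(), TimeUnit.MILLISECONDS).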
Anyway, when you schedule something, you'll get a handle back to it
ScheduledFuture<?> handle = exec.scheduleAtFixedRate(...);
Hold this handle in something that's accessible. For the sake of argument let's say it's a map by TaskKey. TaskKey is (cluster | objectClass | ip-address | command) together as an object.
Map<TaskKey,ScheduledFuture<?>> tasks = ...;
You can use that handle to cancel and schedule new jobs.
void cancelForCustomer(CustomerId id) {
    List<TaskKey> keys = db.findAllTasksOwnedByCustomer(id);
    for (TaskKey key : keys) {
        ScheduledFuture<?> f = tasks.get(key);
        // false: let a run that is already in progress finish
        if (f != null) f.cancel(false);
    }
}
For parameter passing, create an object to represent your work. Create one of these with all the parameters you need.
class HostCheck implements Runnable {
    private final Address host;
    private final String command;
    private final int something;

    public HostCheck(Address host, String command, int something) {
        this.host = host; this.command = command; this.something = something;
    }
    ....
}
For exception handling, localize that all into your object
class HostCheck implements Runnable {
    ...
    public void run() {
        try {
            check();
            scheduleNextRun(); // optionally, if fixed-rate doesn't work
        } catch (Exception e) {
            db.markFailure(task); // or however.
            // Point is tell somebody about the failure.
            // You can use this to decide to stop scheduling checks for the host
            // or whatever, but just record the info now and use it to influence
            // future behavior in, er, the future.
        }
    }
}
OK, so up to this point I think we're in pretty good shape. Lots of detail to fill in but it feels manageable. Now we get to some complexity, and that's the requirement that execution of "cluster/objectClass" pairs are serial.
There are a couple of ways to handle this.
If the number of unique pairs are low, you can just make Map<ClusterObjectClassPair,ScheduledExecutorService>, making sure to create single-threaded executor services (e.g., Executors.newSingleThreadScheduledExecutor()). So instead of a single scheduling service (exec, above), you have a bunch. Simple enough.
If you need to control the amount of work you attempt concurrently, then you can have each HostCheck acquire a permit before execution. Have some global permit object
public static final Semaphore permits = new java.util.concurrent.Semaphore(30);
And then
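class HostCheck implements Runnable {
    ...
    public void run() {
        try {
            permits.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and skip this run
            return;
        }
        try {
            check();
            scheduleNextRun();
        } catch (Exception e) {
            // regular handling
        } finally {
            permits.release(); // only reached if the permit was acquired
        }
    }
}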
You only have one thread per ClusterObjectClassPair, which serializes that work, and then permits just limit how many ClusterObjectClassPair you can talk to at a time.
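The map of single-threaded schedulers mentioned above could look roughly like this. It is a sketch under assumptions: the class PairSchedulers is hypothetical, and a plain "cluster/objectClass" String stands in for the ClusterObjectClassPair key:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical sketch: one single-threaded scheduler per cluster/objectClass pair.
final class PairSchedulers {
    private final ConcurrentHashMap<String, ScheduledExecutorService> byPair =
            new ConcurrentHashMap<>();

    // Returns the scheduler for the pair, creating it on first use.
    ScheduledExecutorService forPair(String cluster, String objectClass) {
        String key = cluster + "/" + objectClass; // stand-in for ClusterObjectClassPair
        return byPair.computeIfAbsent(key, k -> Executors.newSingleThreadScheduledExecutor());
    }

    // Stops accepting new work on every scheduler; running checks finish first.
    void shutdownAll() {
        byPair.values().forEach(ExecutorService::shutdown);
    }
}
```

Every check for the same pair is scheduled on the same single thread, which is what serializes the work. The semaphore then only caps how many pairs are active at once.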
I guess this turned into quite a long answer. Good luck.

Two threads transferring data both ways between two ConcurrentLinkedQueues results in one empty queue while the other "steals" everything

Everyone!
I've written a class (InAndOut) that extends Thread. Its constructor receives two ConcurrentLinkedQueues, entrance and exit, and its run method transfers objects from entrance to exit.
In my main method, I instantiate two ConcurrentLinkedQueues, myQueue1 and myQueue2, with some values in each. Then I instantiate two InAndOut objects, one receiving myQueue1 (entrance) and myQueue2 (exit) and the other receiving myQueue2 (entrance) and myQueue1 (exit). Then I call the start method of both instances.
The result, after some iterations, is that all objects are transferred from one queue to the other; in other words, myQueue1 becomes empty and myQueue2 "steals" all the objects. But if I add a sleep call in each iteration (something like 100 ms), then the behavior is what I expected (the number of elements in both queues stays balanced).
Why is this happening, and how can I fix it? Is there a way to avoid the sleep call in my run method? Am I doing something wrong?
Here is my source code:
import java.util.concurrent.ConcurrentLinkedQueue;
class InAndOut extends Thread {
ConcurrentLinkedQueue<String> entrance;
ConcurrentLinkedQueue<String> exit;
String name;
public InAndOut(String name, ConcurrentLinkedQueue<String> entrance, ConcurrentLinkedQueue<String> exit){
this.entrance = entrance;
this.exit = exit;
this.name = name;
}
public void run() {
int it = 0;
while(it < 3000){
String value = entrance.poll();
if(value != null){
exit.offer(value);
System.err.println(this.name + " / entrance: " + entrance.size() + " / exit: " + exit.size());
}
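//THIS IS THE SLEEP CALL THAT MAKES THE CODE WORK AS EXPECTED
try{
Thread.sleep(100); // sleep is static, so call it on Thread rather than on this
} catch (InterruptedException ex){
Thread.currentThread().interrupt(); // keep the interrupt flag set
}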
it++;
}
}
}
public class Main {
public static void main(String[] args) {
ConcurrentLinkedQueue<String> myQueue1 = new ConcurrentLinkedQueue<String>();
ConcurrentLinkedQueue<String> myQueue2 = new ConcurrentLinkedQueue<String>();
myQueue1.offer("a");
myQueue1.offer("b");
myQueue1.offer("c");
myQueue1.offer("d");
myQueue1.offer("e");
myQueue1.offer("f");
myQueue1.offer("g");
myQueue1.offer("h");
myQueue1.offer("i");
myQueue1.offer("j");
myQueue1.offer("k");
myQueue1.offer("l");
myQueue2.offer("m");
myQueue2.offer("n");
myQueue2.offer("o");
myQueue2.offer("p");
myQueue2.offer("q");
myQueue2.offer("r");
myQueue2.offer("s");
myQueue2.offer("t");
myQueue2.offer("u");
myQueue2.offer("v");
myQueue2.offer("w");
InAndOut es = new InAndOut("First", myQueue1, myQueue2);
InAndOut es2 = new InAndOut("Second", myQueue2, myQueue1);
es.start();
es2.start();
}
}
Thanks in advance!

Even if thread scheduling were deterministic, the observed behavior would remain plausible. As long as both threads perform the same task they might run balanced, though you cannot rely on it. But as soon as one queue runs empty, the tasks are no longer balanced. Compare:
Thread one polls from a queue which has items. The poll method will modify the source queue's state to reflect the removal, your code inserts the received item into the other queue, creating an internal list node object and modifying the target queue’s state to reflect the insertion. All modifications are performed in a way visible to other threads.
Thread two polls from an empty queue. The poll method checks a reference and finds null and that’s all. No other action is performed.
I think it should be obvious that one thread has far more to do than the other once one queue went empty. More precisely, one thread can finish its 3000 loop iterations (it could even do 300000) in a time that is not enough for the other to perform even a single iteration.
So once one queue is empty, one thread finishes its loop almost immediately and after that the other thread will transfer all items from one queue to the other and finish afterwards too.
So even with an almost deterministic scheduling behavior the balance would always bear the risk of tilting once one queue happens to get empty.
You can raise the chance for a balanced run by adding far more items to the queue to reduce the likelihood of one queue running empty. You can raise the number of iterations (to far bigger than a million) to avoid a thread exiting immediately when the queue runs empty or increment the counter only if a non-null item has been seen. You can use a CountDownLatch to let both threads wait before entering the loop compensating the thread startup overhead to have them run as synchronous as possible.
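The CountDownLatch suggestion can be sketched as a start gate: every thread is created and started first, and all of them are released at the same moment. The class GatedThreads and its runTogether method are hypothetical names:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of a start gate; it compensates for thread startup overhead only.
final class GatedThreads {
    static void runTogether(Runnable... bodies) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[bodies.length];
        for (int i = 0; i < bodies.length; i++) {
            Runnable body = bodies[i];
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // park until the gate opens
                    body.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        start.countDown(); // open the gate for all threads at once
        for (Thread t : threads) {
            t.join();
        }
    }
}
```

This narrows the head start of the first thread but does not make the scheduling deterministic.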
However, keep in mind that it still remains non-deterministic and that polling loops waste CPU resources. But it's OK to try and learn.

The order of execution with threads is undefined, so anything could happen. However since you do not start both threads simultaneously, you can make some assumptions on what might happen:
es is started first, so given a fast enough CPU, it has already pushed everything from queue1 into queue2 before the start of es2, then goes to sleep on take.
es2 starts and puts 1 element from queue2 back to queue1.
es wakes up at the same time and puts the element back.
Since both threads should work at about the same speed, one likely result is that there is only one or no element in es and all the remaining ones in es2.

jtahlborn is exactly right when he says that multithreading is non-deterministic and as such I would suggest you read more into what your expectations are in this application because it isn't quite clear and it is functioning as I would expect it (based on how it's coded).
With that said, you may be looking for a BlockingQueue and not a ConcurrentLinkedQueue. A blocking queue will suspend the thread if it is empty and wait for it to have an item in it before continuing. Swap out ConcurrentLinkedQueue for LinkedBlockingQueue, and use take() rather than poll(), since poll() without a timeout still returns null immediately.
The difference between the two is that if ConcurrentLinkedQueue doesn't have an item it will return quickly with a null value so it can finish through the 3000 iterations very very quickly.
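A minimal sketch of the blocking variant, assuming a timed poll is acceptable so that a thread can give up when nothing arrives. The class BlockingTransfer, the transfer method and the 100 ms timeout are illustrative choices, not from the question:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: move up to maxItems from entrance to exit, waiting for items.
final class BlockingTransfer {
    static int transfer(BlockingQueue<String> entrance, BlockingQueue<String> exit, int maxItems)
            throws InterruptedException {
        int moved = 0;
        while (moved < maxItems) {
            // Waits up to 100 ms instead of spinning on an empty queue.
            String value = entrance.poll(100, TimeUnit.MILLISECONDS);
            if (value == null) {
                break; // nothing arrived in time; stop instead of busy-waiting
            }
            exit.put(value);
            moved++;
        }
        return moved;
    }
}
```

Using take() instead of the timed poll would wait indefinitely, which is fine when another thread is guaranteed to keep supplying items.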

Weak performance of CyclicBarrier with many threads: Would a tree-like synchronization structure be an alternative?

Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help to build a synchronization tree? Thread 1 waits for 2 and 3, which in turn wait for 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, threads 4 and 5 wait for thread 2, etc.
1
| \
2   3
|\  |\
4 5 6 7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.

You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.
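One possible reading of the waitable counter, sketched with plain monitor methods. The class WorkCounter and its method names are hypothetical, and this version only covers waiting until all outstanding work has completed:

```java
// Hypothetical sketch of a shared, waitable counter of outstanding work items.
final class WorkCounter {
    private int outstanding;

    // Call when a new piece of work is created.
    synchronized void workAdded() {
        outstanding++;
    }

    // Call when a piece of work completes; wakes waiters when nothing is left.
    synchronized void workDone() {
        if (outstanding == 0) {
            throw new IllegalStateException("workDone() without matching workAdded()");
        }
        if (--outstanding == 0) {
            notifyAll();
        }
    }

    // Blocks until every piece of added work has completed.
    synchronized void awaitIdle() throws InterruptedException {
        while (outstanding > 0) {
            wait();
        }
    }

    synchronized int outstanding() {
        return outstanding;
    }
}
```

The point of the design is that callers wait on the state of the work rather than on any particular thread.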

If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
thread.start();
}
for (CustomThread thread : threads) {
thread.join(); // Blocks until thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
Edit: I was under the impression that you wouldn't be creating that many threads, so this is better for your scenario. The example won't be horribly short, but it goes along the same vein as the discussion you're having in the other answer, that you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {
    private final Input input;

    // The constructor name must match the class name
    public SubJob(Input input) {
        this.input = input;
    }

    public Result call() {
        // Actually process input here and return a result
        return JobWorker.processInput(input);
    }
}
Then to use it, create an ExecutorService with a fixed-size thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
import java.util.*;
import java.util.concurrent.*;

public class MainJob extends Thread {
    // Adjust the pool to the appropriate number of concurrent
    // threads you want running at the same time
    private static final ExecutorService pool = Executors.newFixedThreadPool(30);
    private final List<Input> inputs;

    public MainJob(List<Input> inputs) {
        super("MainJob");
        this.inputs = new ArrayList<Input>(inputs);
    }

    @Override
    public void run() {
        CompletionService<Result> compService = new ExecutorCompletionService<Result>(pool);
        List<Result> results = new ArrayList<Result>();
        int submittedJobs = inputs.size();
        for (Input input : inputs) {
            // Starts the job when a thread is available
            compService.submit(new SubJob(input));
        }
        try {
            for (int i = 0; i < submittedJobs; i++) {
                // take() blocks until a job is completed; get() unwraps its result
                results.add(compService.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting if interrupted
            return;
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // a sub-job failed
        }
        // Do something with results
    }
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse threads if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there's so many jobs in the fixed thread pool waiting that sub jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shutdown the ExecutorService manually, perhaps via a shutdown hook, or the threads that it contains will not allow the JVM to terminate.
Hope that helps/makes sense.

If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and wasted time/memory from allocating hundreds of threads when the physical CPU is bounded.
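A minimal sketch of the invokeAll approach for generational work, assuming each task of a generation is a Callable. The class Generations and the method runGeneration are hypothetical names. invokeAll returns only after every task has completed, so it acts as the barrier between generations:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch: run one generation to completion on a shared pool.
final class Generations {
    static <T> List<T> runGeneration(ExecutorService pool, List<Callable<T>> tasks)
            throws InterruptedException, ExecutionException {
        List<T> results = new ArrayList<>();
        // invokeAll blocks until all tasks are done; futures come back in task order.
        for (Future<T> future : pool.invokeAll(tasks)) {
            results.add(future.get());
        }
        return results;
    }
}
```

The next generation can then be built from the returned results and submitted to the same pool, so the number of threads stays fixed regardless of the number of tasks.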
If your tasks are not generational but instead more of a tree-like structure in which only a subset need to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool and you don't need Java 7 to do that. You can get a reference implementation for Java 6. This would be found under whatever JSR introduced the ForkJoinPool library code.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {
int n;
Executor exec;
Fib(final int n, final Executor exec) {
this.n = n;
this.exec = exec;
}
/**
 * {@inheritDoc}
 */
@Override
public Integer call() throws Exception {
if (n == 0 || n == 1) {
return n;
}
//Divide the problem
final Fib n1 = new Fib(n - 1, exec);
final Fib n2 = new Fib(n - 2, exec);
//FutureTask only allows run to complete once
final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
//Ask the Executor for help
exec.execute(n2Task);
//Do half the work ourselves
final int partialResult = n1.call();
//Do the other half of the work if the Executor hasn't
n2Task.run();
//Return the combined result
return partialResult + n2Task.get();
}
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will be negative performance impacts. For example, the above code is a terribly slow way to solve Fibonacci.

Java Multi threading - Avoid duplicate request processing

I have the following multithreaded scenario: requests arrive at a method, and I want to avoid processing duplicate concurrent requests, since multiple similar requests might be blocked waiting to be processed. I used a Hashtable to keep track of processed requests, but it creates memory leaks. So how should I keep track of processed requests and avoid processing the same requests again when they may be in a blocked state?
How can I check that a waiting/blocked incoming request is not one that is already being processed by a current thread?

Okay, I think I kinda understand what you want.
You can use a ConcurrentSkipListSet as a queue. Implement your queued elements like this:
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

class Element implements Comparable<Element> {
    // To FIFOnize
    private static final AtomicLong SEQ = new AtomicLong();
    private final long id = SEQ.incrementAndGet();
    // Can only be executed once.
    private final Semaphore execPermission = new Semaphore(1);

    public int compareTo(Element e) {
        // If element e1 exists on the queue such that
        // e.compareTo(e1) == 0, that element will not
        // be placed on the queue.
        if (this.equals(e)) {
            return 0;
        } else {
            // This will enforce FIFO.
            return Long.compare(this.id, e.id);
        }
    }

    // implement both equals and hashCode

    public boolean tryAcquire() {
        return execPermission.tryAcquire();
    }
}
Now your threads should,
while(!Thread.currentThread().isInterrupted()){
//Iterates from head, therefore simulates FIFO
for(Element e : queue){
if(e.tryAcquire()){
execute(e); //synchronous
queue.remove(e);
}
}
}
You can also use a blocking variant of this solution (have a bounded SortedSet and let worker threads block if there are no elements etc).

If the memory leak is the problem have a look at WeakHashMap to keep your request during processing.
Another solution would be to use a memory bound cache...

There is no inherent reason why keeping track of requests in a HashMap (or any other way you might choose) would lead to memory leaks. All that's needed is a way for entries to be removed once they have been processed.
This could mean having your request processing threads:
directly remove the entry;
communicate back to the dispatcher; or
mark the request as processed, so that the dispatcher can remove the entries.
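The first option, where the processing thread removes its own entry, might look like the sketch below. It assumes requests can be identified by a String id; the class InFlightRequests and its methods are hypothetical. The entry is removed in a finally block, so the set only ever holds requests that are currently being processed and cannot grow without bound:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: track only the requests that are being processed right now.
final class InFlightRequests {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    // Runs work unless the same request is already in progress; returns whether it ran.
    boolean process(String requestId, Runnable work) {
        if (!inFlight.add(requestId)) {
            return false; // duplicate of a request that is still being processed
        }
        try {
            work.run();
            return true;
        } finally {
            inFlight.remove(requestId); // always clean up, even if work throws
        }
    }

    int inFlightCount() {
        return inFlight.size();
    }
}
```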
