I have the following multi-threaded scenario: requests arrive at a method, and I want to avoid duplicate processing of concurrent requests, since multiple similar requests might be waiting in a blocked state. I used a Hashtable to keep track of processed requests, but it creates a memory leak. How should I keep track of processed requests and prevent requests that may currently be blocked from being processed again?
How can I check that a waiting/blocked incoming request is not one that is already being processed by a current thread?
Okay, I think I kinda understand what you want.
You can use a ConcurrentSkipListSet as a queue. Implement your queued elements like this:
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

class Element implements Comparable<Element> {
    // To FIFOnize
    private static final AtomicLong SEQ = new AtomicLong();
    private final long id = SEQ.incrementAndGet();

    // Can only be executed once.
    private final Semaphore execPermission = new Semaphore(1);

    public int compareTo(Element e) {
        // If an element e1 exists on the queue such that
        // e.compareTo(e1) == 0, that element will not
        // be placed on the queue.
        if (this.equals(e)) {
            return 0;
        } else {
            // This will enforce FIFO.
            return Long.compare(this.id, e.id);
        }
    }

    // implement both equals and hashCode

    public boolean tryAcquire() {
        return execPermission.tryAcquire();
    }
}
Now your worker threads should run:
while (!Thread.currentThread().isInterrupted()) {
    // Iterates from the head, therefore simulates FIFO.
    for (Element e : queue) {
        if (e.tryAcquire()) {
            execute(e); // synchronous
            queue.remove(e);
        }
    }
}
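For reference, the shared queue those worker threads iterate over would be the ConcurrentSkipListSet itself; a minimal declaration (assuming the Element class above) might be:

import java.util.concurrent.ConcurrentSkipListSet;

// Producers add Elements here; the worker loop above iterates it in FIFO order.
ConcurrentSkipListSet<Element> queue = new ConcurrentSkipListSet<>();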
You can also use a blocking variant of this solution (have a bounded SortedSet and let worker threads block if there are no elements etc).
If the memory leak is the problem, have a look at WeakHashMap to hold your requests during processing.
Another solution would be to use a memory-bounded cache...
There is no inherent reason why keeping track of requests in a HashMap (or any other way you might choose) would lead to memory leaks. All that's needed is a way for entries to be removed once they have been processed.
This could mean having your request processing threads:
directly remove the entry;
communicate back to the dispatcher; or
mark the request as processed, so that the dispatcher can remove the entries.
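For instance, here is a minimal sketch of the first option, using a concurrent set of in-flight request keys; requestKey and handle() are hypothetical names for illustration:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RequestDeduplicator {
    // Keys of requests currently being processed; entries are removed when
    // processing finishes, so the set cannot grow without bound.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    public boolean processOnce(String requestKey) {
        if (!inFlight.add(requestKey)) {
            return false; // an identical request is already being processed
        }
        try {
            handle(requestKey); // hypothetical: the actual processing
            return true;
        } finally {
            inFlight.remove(requestKey); // always cleaned up, so no leak
        }
    }

    private void handle(String requestKey) { /* ... */ }
}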
Related
I have a blocking queue of objects.
I want to write a thread that blocks till there is a object on the queue. Similar to the functionality provided by BlockingQueue.take().
However, since I do not know if I will be able to process the object successfully, I want to just peek() and not remove the object. I want to remove the object only if I am able to process it successfully.
So, I would like a blocking peek() function. Currently, peek() just returns if the queue is empty as per the javadocs.
Am I missing something? Is there another way to achieve this functionality?
EDIT:
Any thoughts on whether I could just use a thread-safe queue, and peek and sleep instead?
public void run() {
    try {
        while (!exit) {
            while (queue.size() != 0) {
                Object o = queue.peek();
                if (o != null) {
                    if (consume(o)) {
                        queue.remove();
                    } else {
                        Thread.sleep(10000); // need to back off (60s) and try again
                    }
                }
            }
            Thread.sleep(1000); // wait 1s for an object on the queue
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
Note that I only have one consumer thread and one (separate) producer thread. I guess this isn't as efficient as using a BlockingQueue... Any comments appreciated.
You could use a LinkedBlockingDeque and physically remove the item from the queue (using takeLast()) but replace it again at the end of the queue if processing fails using putLast(E e). Meanwhile your "producers" would add elements to the front of the queue using putFirst(E e).
You could always encapsulate this behaviour within your own Queue implementation and provide a blockingPeek() method that performs takeLast() followed by putLast() behind the scenes on the underlying LinkedBlockingDeque. Hence from the calling client's perspective the element is never removed from your queue.
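A minimal sketch of such a wrapper, assuming a single consumer (see the caveat about multiple consumers below); blockingPeek() and the class name are made up for illustration:

import java.util.concurrent.LinkedBlockingDeque;

class PeekableQueue<E> {
    private final LinkedBlockingDeque<E> deque = new LinkedBlockingDeque<>();

    // Producers add to the front of the deque.
    public void put(E e) throws InterruptedException {
        deque.putFirst(e);
    }

    // Blocks until an element is available, then puts it straight back,
    // so from the caller's perspective nothing was removed.
    public E blockingPeek() throws InterruptedException {
        E e = deque.takeLast();
        deque.putLast(e);
        return e;
    }

    // Remove the element for real once it has been processed successfully.
    public E take() throws InterruptedException {
        return deque.takeLast();
    }
}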
However, since I do not know if I will be able to process the object successfully, I want to just peek() and not remove the object. I want to remove the object only if I am able to process it successfully.
In general, it is not thread-safe. What if, after you peek() and determine that the object can be processed successfully, but before you take() it to remove and process, another thread takes that object?
Could you also just add an event-listener queue to your blocking queue, so that when something is added to the (blocking) queue, an event is sent to your listeners? You could have your thread block until its actionPerformed method is called.
The only thing I'm aware of that does this is BlockingBuffer in Apache Commons Collections:
If either get or remove is called on an empty Buffer, the calling thread waits for notification that an add or addAll operation has completed.
get() is equivalent to peek(), and a Buffer can be made to act like a BlockingQueue by decorating an UnboundedFifoBuffer with a BlockingBuffer.
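A rough sketch of that decoration, using the raw (pre-generics) Commons Collections 3.x API:

import org.apache.commons.collections.Buffer;
import org.apache.commons.collections.buffer.BlockingBuffer;
import org.apache.commons.collections.buffer.UnboundedFifoBuffer;

Buffer buffer = BlockingBuffer.decorate(new UnboundedFifoBuffer());

buffer.add("item");         // producer side
Object head = buffer.get(); // blocks until an element exists; does not remove it
// ... try to process head ...
buffer.remove();            // take it out only after successful processing

(The same multiple-consumer race discussed below applies to this approach too.)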
The quick answer is: no, there's not really a way to have a blocking peek, bar implementing a blocking queue with a blocking peek() yourself.
Am I missing something?
peek() can be troublesome with concurrency:
If you can't process your peek()'d message, it'll be left in the queue unless you have multiple consumers. Who is going to get that object out of the queue if you can't process it?
If you have multiple consumers, you get a race condition between your peek() and another thread also processing items, resulting in duplicate processing or worse.
Sounds like you might be better off actually removing the item and processing it using a Chain-of-Responsibility pattern.
Edit, re: your last example: if you have only one consumer, you will never get rid of the object on the queue, unless it's updated in the meantime, in which case you'd better be very, very careful about thread safety and probably shouldn't have put the item in the queue anyway.
Not an answer per se, but: JDK-6653412 claims this is not a valid use case.
Looks like BlockingQueue itself doesn't have the functionality you're specifying.
I might try to re-frame the problem a little though: what would you do with objects you can't "process correctly"? If you're just leaving them in the queue, you'll have to pull them out at some point and deal with them. I'd recommend either figuring out how to process them (commonly, if a queue read gives any sort of invalid or bad value, you're probably OK to just drop it on the floor) or choosing a different data structure than a FIFO.
The 'simplest' solution
Do not process the next element until the previous element is processed successfully.
public void run() {
    Object pendingElement = null; // element that failed processing and must be retried
    try {
        while (!exit) {
            Object obj = pendingElement == null ? queue.take() : pendingElement; // take() blocks
            boolean successful = process(obj);
            if (!successful) {
                pendingElement = obj;
            } else {
                pendingElement = null;
            }
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
Calling peek() and checking if the value is null is not CPU efficient.
I have seen CPU usage going to 10% on my system when the queue is empty for the following program.
while (true) {
    Object o = queue.peek();
    if (o == null) continue;
    // omitted for the sake of brevity
}
Adding sleep() adds slowness.
Adding it back to the queue using putLast will disturb the order. Moreover, it is a blocking operation which requires locks.
To implement the producer/consumer pattern, I have used a LinkedTransferQueue.
Check the code below:
while (true) {
    String tmp = randomString(); // randomString() produces the next item
    if (linkedTransferQueueString.size() < 10000) {
        linkedTransferQueueString.add(tmp);
    }
}
The docs state that size() is an O(n) operation :(. So to add an element, it has to traverse the whole collection.
Is there any other concurrent queue that has a size restriction?
I could not find any in the standard Java collections or in Apache's concurrent collections.
@OP: You had already accepted the answer, and it is correct as well, but you still raised the bounty, so I am assuming you are more interested in the concept; I will just shed some light on that part.
Now, your issue is that you are not happy with O(n) for the size operation, which means your solution needs one of the following:
the data structure should be able to tell you that the queue is full, or
the size operation should return its result in constant time.
It is not common for a size operation to be O(n), but in the case of LinkedTransferQueue the implementation is asynchronous, so the complete queue has to be traversed to count the elements. Most other queue implementations give you the size in constant time, but you really don't need this size check at all; please keep reading.
If you have a hard dependency on the purpose of LinkedTransferQueue, i.e. you want to dequeue based on how long an element has been on the queue for some producer, then I don't think there is any alternative, except that you can do something dirty like extending LinkedTransferQueue and tracking the number of elements yourself. That can quickly become a mess, though, and will only give you an approximate rather than an accurate result.
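For illustration only, such a counting wrapper might look like the sketch below; the counter only tracks the methods that are overridden, so it can drift if any other mutating method is used (the class and method names are made up):

import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Approximate-size wrapper around LinkedTransferQueue. Only add() and poll()
// maintain the counter, so any other mutating method (offer, put, remove,
// drainTo, ...) makes it drift: treat the result as a hint, not a guarantee.
class CountingTransferQueue<E> extends LinkedTransferQueue<E> {
    private final AtomicInteger count = new AtomicInteger();

    @Override
    public boolean add(E e) {
        boolean added = super.add(e);
        if (added) {
            count.incrementAndGet();
        }
        return added;
    }

    @Override
    public E poll() {
        E e = super.poll();
        if (e != null) {
            count.decrementAndGet();
        }
        return e;
    }

    public int approximateSize() {
        return count.get(); // O(1), but only approximate
    }
}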
If you do not have a hard dependency on LinkedTransferQueue, then you can use some flavor of BlockingQueue, many of which give you a "bounded" queue (a bounded queue is what you need) in one way or another. For example, ArrayBlockingQueue is implicitly bounded, and you can create a bounded LinkedBlockingQueue like this: new LinkedBlockingQueue(100). You can check the documentation for the other queues.
Then you can use the queue's offer method, which returns false if the queue is full. So you need not do an explicit size check: simply put the element in the queue using offer, and it returns a boolean indicating whether the element was successfully placed in the queue.
A BlockingQueue is (quoting the Javadoc):
"BlockingQueue implementations are thread-safe [...] A BlockingQueue may be capacity bounded."
and an ArrayBlockingQueue is
"A bounded blocking queue backed by an array."
Here's how you would write your example with it:
BlockingQueue<String> queue = new ArrayBlockingQueue<>(10000);
while (true) {
    String tmp = randomString();
    if (!queue.offer(tmp)) {
        // the limit was reached, the item was not added
    }
}
Or for a simple producer/consumer example
public static void main(String[] args) {
    // using a low limit so it doesn't take too long for the queue to fill
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

    Runnable producer = () -> {
        if (!queue.offer(randomString())) {
            System.out.println("queue was full!");
        }
    };

    Runnable consumer = () -> {
        try {
            queue.take();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    };

    ScheduledExecutorService executor = Executors.newScheduledThreadPool(4);
    // produce faster than consume so the queue becomes full eventually
    executor.scheduleAtFixedRate(producer, 0, 100, TimeUnit.MILLISECONDS);
    executor.scheduleAtFixedRate(consumer, 0, 200, TimeUnit.MILLISECONDS);
}
Have you tried ArrayBlockingQueue?
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ArrayBlockingQueue.html
It has size restriction and concurrency.
Also, its size() is O(1):
public int size() {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        return count;
    } finally {
        lock.unlock();
    }
}
Could you please look at BlockingQueue?
Here is the best link I found on the internet: BlockingQueue. BlockingQueue is an interface in the java.util.concurrent package, and it has multiple implementations:
ArrayBlockingQueue
DelayQueue
LinkedBlockingQueue
PriorityBlockingQueue
SynchronousQueue
For my current development I have many threads (Producers) that create Tasks and many threads that consume these Tasks (Consumers).
Each Producer is identified by a unique name; a Task is made of:
the name of its Producer
a name
data
My question concerns the data structure used by the Producers and the Consumers.
Concurrent Queue?
Naively, we could imagine that the Producers populate a concurrent queue with Tasks and the Consumers read/consume the Tasks stored in that queue.
I think this solution would scale rather well, but one case is problematic: if a Producer very quickly creates two Tasks with the same name but different data (tasks T1 and T2 have the same name, but T1 has data D1 and T2 has data D2), it is theoretically possible that they are consumed in the order T2 then T1!
Task Map + Queue?
Now, I imagine creating my own data structure (let's say MyQueue) based on a Map plus a Queue. Like a queue, it would have a pop() and a push() method.
The pop() method would be quite simple.
The push() method would:
check whether a Task with the same name is already in MyQueue (doing a find() in the Map);
if found: the data stored in the Task to be inserted would be merged with the data stored in the found Task;
if not found: the Task would be inserted in the Map and an entry would be added to the Queue.
Of course, I'll have to make it safe for concurrent access... and that will certainly be my problem; I am almost sure that this solution won't scale. A sketch of the idea follows.
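Something like this minimal sketch, where a Task with getName() and merge() is hypothetical; the atomicity comes from doing the find-then-merge inside ConcurrentHashMap.compute (side effects inside compute are generally discouraged, so treat this as an illustration, not a vetted implementation):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical Task: has a name and knows how to merge another Task's data.
interface Task<T> {
    String getName();
    Task<T> merge(Task<T> other);
}

class MyQueue<T> {
    private final ConcurrentMap<String, Task<T>> pending = new ConcurrentHashMap<>();
    private final BlockingQueue<String> order = new LinkedBlockingQueue<>();

    public void push(Task<T> task) {
        // compute() runs atomically per key, so the check-and-merge is race-free.
        pending.compute(task.getName(), (name, existing) -> {
            if (existing == null) {
                order.add(name);         // first Task with this name: remember its position
                return task;
            }
            return existing.merge(task); // same name already queued: merge the data
        });
    }

    public Task<T> pop() throws InterruptedException {
        String name = order.take();  // blocks until some Task name is queued
        return pending.remove(name); // hands back the (possibly merged) Task
    }
}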
So What?
So my question now is: what is the best data structure to use in order to fulfill my requirements?
You could try Heinz Kabutz's StripedExecutorService as a possible candidate.
This magical thread pool would ensure that all Runnables with the same stripeClass are executed in the order they were submitted, while StripedRunners with different stripeClasses can still execute independently.
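If you don't want the extra dependency, a rough home-grown approximation is sketched below. This is not the library's API, and unlike the real StripedExecutorService it creates one thread per distinct name, so it only suits a small, bounded set of producer names:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One single-threaded executor per task name: tasks with the same name run
// in submission order, while tasks with different names run concurrently.
class NameStripedExecutor {
    private final ConcurrentMap<String, ExecutorService> stripes = new ConcurrentHashMap<>();

    public void submit(String taskName, Runnable work) {
        stripes.computeIfAbsent(taskName, n -> Executors.newSingleThreadExecutor())
               .execute(work);
    }

    public void shutdownAll() {
        stripes.values().forEach(ExecutorService::shutdown);
    }
}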
Instead of making a data structure safe for concurrent access, why not avoid shared mutable state and go parallel?
Functional programming models such as MapReduce are a very scalable way to solve this kind of problem.
I understand that D1 and D2 can be analyzed either together or in isolation, and the only constraint is that they shouldn't be analyzed in the wrong order (I'm making some assumptions here). But if the real problem is only the way the results are combined, there might be an easy solution.
You could remove the constraint altogether, allowing them to be analyzed separately, and then have a reduce function that is able to recombine them in a sensible way.
In this case you'd have the first step as map and the second as reduce.
Even if the computation is more efficient when done in a single pass, a big part of scaling, especially scaling out, is accomplished by denormalization.
If consumers are running in parallel, I doubt there is a way to make them execute tasks with the same name sequentially.
In your example (from comments):
BlockingQueue can really be a problem (unfortunately) if a Producer "P1" adds a first task "T" with data D1 and quickly a second task "T" with data D2. In this case, the first task can be handled by one thread and the second task by another thread; if the thread handling the first task is interrupted, the thread handling the second one can complete first.
There is no difference if P1 submits D2 not so quickly. Consumer 1 could still be too slow, so consumer 2 would be able to finish first. Here is an example of such a scenario:
P1: submit D1
C1: read D1
P2: submit D2
C2: read D2
C2: process D2
C1: process D1
To solve it, you will have to introduce some kind of completion detection, which I believe will overcomplicate things.
If you have enough load and can process tasks with different names out of order, then you can use one queue per consumer and put same-named tasks onto the same queue.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ParallelQueue {
    private final BlockingQueue<Task<?>>[] queues;
    private final int consumersCount;

    @SuppressWarnings("unchecked")
    public ParallelQueue(int consumersCount) {
        this.consumersCount = consumersCount;
        queues = new BlockingQueue[consumersCount];
        for (int i = 0; i < consumersCount; i++) {
            queues[i] = new LinkedBlockingQueue<>();
        }
    }

    public void push(Task<?> task) {
        // floorMod avoids a negative index when hashCode() is negative
        int index = Math.floorMod(task.name.hashCode(), consumersCount);
        queues[index].add(task);
    }

    public Task<?> pop(int consumerId) throws InterruptedException {
        int index = consumerId % consumersCount;
        return queues[index].take();
    }

    private static final class Task<T> {
        private final String name;
        private final T data;

        private Task(String name, T data) {
            this.name = name;
            this.data = data;
        }
    }
}
I have a LinkedList of objects that I want to process. Objects get added to it from another thread, but only one thread removes/reads from it.
private LinkedList<MyObject> queue = new LinkedList<>();

new Thread()
{
    @Override
    public void run()
    {
        while (!Thread.interrupted())
        {
            if (!queue.isEmpty())
            {
                MyObject first = queue.removeFirst();
                // do sth..
            }
        }
    }
}.start();
In another thread I add objects to the queue:
queue.add(new MyObject());
Sometimes this code leads to an exception though, which I can't really explain:
Exception in thread "" java.util.NoSuchElementException
at java.util.LinkedList.removeFirst(LinkedList.java:270)
I don't get why I get this exception, since it should only try to remove an object if one exists.
As Nicolas has already mentioned, you need a thread-safe implementation. I would recommend using LinkedBlockingQueue.
You can add to it using the offer method and remove using take, which will also resolve your "busy waiting" problem.
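For example, a minimal sketch of that approach, reusing the MyObject type from the question (take() blocks, so the consumer no longer spins on isEmpty()):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

BlockingQueue<MyObject> queue = new LinkedBlockingQueue<>();

// Consumer thread: take() blocks while the queue is empty, so no busy loop.
new Thread(() -> {
    try {
        while (!Thread.currentThread().isInterrupted()) {
            MyObject first = queue.take();
            // do sth..
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag and exit
    }
}).start();

// Producer thread: offer() never blocks on an unbounded LinkedBlockingQueue.
queue.offer(new MyObject());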
A LinkedList is not thread-safe, so you can't share it between several threads the way you currently do; otherwise you will face unpredictable bugs like this one, caused by concurrent modifications that leave the list in an inconsistent state. Use a thread-safe deque instead, such as ConcurrentLinkedDeque.
Although I think a few good solutions have been offered for resolving the problem, none of the answers explained why @BluE sees the NoSuchElementException. So here is what I think could be happening.
Since LinkedList access is not synchronized it is possible that:
The producer thread adds an element to the queue
Two consumer threads concurrently check if (!queue.isEmpty()) and both see that the queue is not empty.
Both consumer threads go ahead and try to take an element from the queue by invoking MyObject first = queue.removeFirst();
One of the threads succeeds, and the other one fails with a NoSuchElementException since there are no more elements in the queue.
UPDATE:
Provided you have only one producer and one consumer, I think the Java Memory Model specification could explain the behaviour you see.
Long story short: since access to the LinkedList is not synchronized, the JVM offers no data-visibility guarantees. Let's have a look at the implementations of the isEmpty and removeFirst methods:
From LinkedList
transient int size = 0;
transient Node<E> first;

// ...

public int size() {
    return size;
}

// ...

public E removeFirst() {
    final Node<E> f = first;
    if (f == null)
        throw new NoSuchElementException();
    return unlinkFirst(f);
}
From AbstractCollection
public boolean isEmpty() {
    return size() == 0;
}
As you can see, the size and the elements are stored in different variables. So it is possible that the consumer thread sees an update to "size" without seeing the corresponding update to "first".
What you could do is use some technique to coordinate the threads, such as a mutex, semaphore, monitor, or mailbox.
I've written a class (InAndOut) that extends Thread. Its constructor receives two ConcurrentLinkedQueues, entrance and exit, and my run method transfers the objects from entrance to exit.
In my main method, I instantiate two ConcurrentLinkedQueues, myQueue1 and myQueue2, with some values in each. Then I instantiate two InAndOuts, one receiving myQueue1 (entrance) and myQueue2 (exit) and another receiving myQueue2 (entrance) and myQueue1 (exit). Then I call the start method of both instances.
The result, after some iterations, is that all objects are transferred from one queue to the other; in other words, myQueue1 becomes empty and myQueue2 "steals" all the objects. But if I add a sleep call in each iteration (something like 100 ms), then the behavior is what I expected (an equilibrium between the number of elements in both queues).
Why is this happening, and how can I fix it? Is there a way to avoid the sleep call in my run method? Am I doing something wrong?
Here is my source code:
import java.util.concurrent.ConcurrentLinkedQueue;
class InAndOut extends Thread {
    ConcurrentLinkedQueue<String> entrance;
    ConcurrentLinkedQueue<String> exit;
    String name;

    public InAndOut(String name, ConcurrentLinkedQueue<String> entrance, ConcurrentLinkedQueue<String> exit) {
        this.entrance = entrance;
        this.exit = exit;
        this.name = name;
    }

    public void run() {
        int it = 0;
        while (it < 3000) {
            String value = entrance.poll();
            if (value != null) {
                exit.offer(value);
                System.err.println(this.name + " / entrance: " + entrance.size() + " / exit: " + exit.size());
            }

            // THIS IS THE SLEEP CALL THAT MAKES THE CODE WORK AS EXPECTED
            try {
                Thread.sleep(100); // sleep() is static; calling it via "this" is misleading
            } catch (Exception ex) {
            }

            it++;
        }
    }
}
public class Main {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> myQueue1 = new ConcurrentLinkedQueue<String>();
        ConcurrentLinkedQueue<String> myQueue2 = new ConcurrentLinkedQueue<String>();

        myQueue1.offer("a");
        myQueue1.offer("b");
        myQueue1.offer("c");
        myQueue1.offer("d");
        myQueue1.offer("e");
        myQueue1.offer("f");
        myQueue1.offer("g");
        myQueue1.offer("h");
        myQueue1.offer("i");
        myQueue1.offer("j");
        myQueue1.offer("k");
        myQueue1.offer("l");

        myQueue2.offer("m");
        myQueue2.offer("n");
        myQueue2.offer("o");
        myQueue2.offer("p");
        myQueue2.offer("q");
        myQueue2.offer("r");
        myQueue2.offer("s");
        myQueue2.offer("t");
        myQueue2.offer("u");
        myQueue2.offer("v");
        myQueue2.offer("w");

        InAndOut es = new InAndOut("First", myQueue1, myQueue2);
        InAndOut es2 = new InAndOut("Second", myQueue2, myQueue1);

        es.start();
        es2.start();
    }
}
Thanks in advance!
Even if thread scheduling were deterministic, the observed behavior would remain plausible. As long as both threads perform the same amount of work they might run balanced, though you cannot rely on that. But as soon as one queue runs empty, the work is no longer balanced. Compare:
Thread one polls from a queue which has items. The poll method modifies the source queue's state to reflect the removal; your code then inserts the received item into the other queue, creating an internal list-node object and modifying the target queue's state to reflect the insertion. All modifications are performed in a way visible to other threads.
Thread two polls from an empty queue. The poll method checks a reference, finds null, and that's all. No other action is performed.
I think it should be obvious that one thread has far more to do than the other once one queue went empty. More precisely, one thread can finish its 3000 loop iterations (it could even do 300000) in a time that is not enough for the other to perform even a single iteration.
So once one queue is empty, one thread finishes its loop almost immediately and after that the other thread will transfer all items from one queue to the other and finish afterwards too.
So even with an almost deterministic scheduling behavior the balance would always bear the risk of tilting once one queue happens to get empty.
You can raise the chance of a balanced run by adding far more items to the queues, to reduce the likelihood of one queue running empty. You can raise the number of iterations (to far more than a million) to keep a thread from exiting almost immediately when a queue runs empty, or increment the counter only when a non-null item has been seen. And you can use a CountDownLatch to make both threads wait before entering the loop, compensating for the thread-startup overhead, so that they run as synchronously as possible.
However, keep in mind that it still remains non-deterministic and that polling loops waste CPU resources. But it's fine for trying things out and learning.
The order of execution with threads is undefined, so anything could happen. However, since you do not start both threads simultaneously, you can make some assumptions about what might happen:
es is started first, so given a fast enough CPU, it may already have pushed everything from myQueue1 into myQueue2 before es2 starts, and then it sleeps.
es2 starts and puts one element from myQueue2 back into myQueue1.
es wakes up at about the same time and puts the element back.
Since both threads work at roughly the same speed, one likely result is that there is only one element or none left in myQueue1, with all the remaining ones in myQueue2.
jtahlborn is exactly right when he says that multithreading is non-deterministic. As such, I would suggest you think through what your expectations of this application are, because it isn't quite clear, and it is functioning exactly as I would expect based on how it's coded.
With that said, you may be looking for a BlockingQueue rather than a ConcurrentLinkedQueue. A blocking queue will suspend the thread if the queue is empty and wait for an item to arrive before continuing. Swap out ConcurrentLinkedQueue for LinkedBlockingQueue.
The difference between the two is that if a ConcurrentLinkedQueue has no item, poll() returns quickly with a null value, so the loop can burn through its 3000 iterations very, very quickly.
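For illustration, here is a sketch of the run() loop from the question, assuming entrance and exit were declared as LinkedBlockingQueue<String> instead; take() blocks while the entrance queue is empty, so neither thread can spin through its iterations:

public void run() {
    try {
        for (int it = 0; it < 3000; it++) {
            String value = entrance.take(); // blocks until an element is available
            exit.offer(value);
            System.err.println(this.name + " / entrance: " + entrance.size() + " / exit: " + exit.size());
        }
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
    }
}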