Queue implementation with blocked 'take()' but with eviction policy

Queue implementation with blocked 'take()' but with eviction policy - java

Is there an implementation with a blocking queue for take but bounded by a maximum size. When the size of the queue reaches a given max-size, instead of blocking 'put', it will remove the head element and insert it. So put is not blocked() but take() is.
One usage is that if I have a very slow consumer, the system will not crash ( runs out of memory ) rather these message will be removed but I do not want to block the producer.
An example of this would stock trading system. When you get a spike in stock trade/quote data, if you haven't consumed data, you want to automatically throw away old stock trade/quote.

There currently isnt in Java a thread-safe queue that will do what you are looking for. However, there is a BlockingDequeue (Double Ended Queue) that you can write a wrapper in which you can take from the head and and tail as you see freely.
This class, similar to a BlockingQueue, is thread safe.

Several strategies are provided in ThreadPoolExecutor. Search for "AbortPolicy" in this javadoc . You can also implement your own policy if you want. Perhaps Discard is similar to what you want. Personally I think CallerRuns is what you want in most cases.
I think using these is a better solution, but if you absolutely want to implement it at the queue, I'd probably do it by composition. Perhaps use a LinkedList or something and wrap it with synchronize keyword.
EDIT:(some clarifications..)
"Executor" is basically a thread pool combined with a blocking queue. It is the recommended way to implement a producer/consumer pattern in java. The authors of these libraries provides several strategies to cope with issues like you mentioned. If you are interested, here is another approach to specifically address the OOME issue (the source is framework specific and can't be used as is).

Related

Java concurrency - Should block or yield?

I have multiple threads each one with its own private concurrent queue and all they do is run an infinite loop retrieving messages from it. It could happen that one of the queues doesn't receive messages for a period of time (maybe a couple seconds), and also they could come in big bursts and fast processing is necessary.
I would like to know what would be the most appropriate to do in the first case: use a blocking queue and block the thread until I have more input or do a Thread.yield()?
I want to have as much CPU resources available as possible at a given time, as the number of concurrent threads may increase with time, but also I don't want the message processing to fall behind, as there is no guarantee of when the thread will be reescheduled for execution when doing a yield(). I know that hardware, operating system and other factors play an important role here, but setting that aside and looking at it from a Java (JVM?) point of view, what would be the most optimal?

Always just block on the queues. Java yields in the queues internally.
In other words: You cannot get any performance benefit in the other threads if you yield in one of them rather than just block.

You certainly want to use a blocking queue - they are designed for exactly this purpose (you want your threads to not use CPU time when there is no work to do).
Thread.yield() is an extremely temperamental beast - the scheduler plays a large role in exactly what it does; and one simple but valid implementation is to simply do nothing.

Alternatively, consider converting your implementation to use one of the managed ExecutorService implementations - probably ThreadPoolExecutor.
This may not be appropriate for your use case, but if it is, it removes the whole burden of worrying about thread management from your own code - and these questions about yielding or not simply vanish.
In addition, if better thread management algorithms emerge in future - for example, something akin to Apple's Grand Central Dispatch - you may be able to convert your application to use it with almost no effort.

Another thing that you could do is use the concurrent hash map for your queue. When you do a read it gives you a reference of the object you were looking for, so it is possible you my miss a message that was just put into the queue. But if all this is doing is listening for a message you will catch it the next iteration. It would be different if the messages could be updated by other threads. But there doesn't really seem to be a reason to block that I can see.

Multiple SingleThreadExecutors for a given application...a good idea?

This question is about the fallouts of using SingleThreadExecutor (JDK 1.6). Related questions have been asked and answered in this forum before, but I believe the situation I am facing, is a bit different.
Various components of the application (let's call the components C1, C2, C3 etc.) generate (outbound) messages, mostly in response to messages (inbound) that they receive from other components. These outbound messages are kept in queues which are usually ArrayBlockingQueue instances - fairly standard practice perhaps. However, the outbound messages must be processed in the order they are added. I guess use of a SingleThreadExector is the obvious answer here. We end up having a 1:1 situation - one SingleThreadExecutor for one queue (which is dedicated to messages emanating from one component).
Now, the number of components (C1,C2,C3...) is unknown at a given moment. They will come into existence depending on the need of the users (and will be eventually disposed of too). We are talking about 200-300 such components at the peak load. Following the 1:1 design principle stated above, we are going to arrange for 200 SingleThreadExecutors. This is the source of my query here.
I am uncomfortable with the thought of having to create so many SingleThreadExecutors. I would rather try and use a pool of SingleThreadExecutors, if that makes sense and is plausible (any ready-made, seen-before classes/patterns?). I have read many posts on recommended use of SingleThreadExecutor here, but what about a pool of the same?
What do learned women and men here think? I would like to be directed, corrected or simply, admonished :-).

If your requirement is that the messages be processed in the order that they're posted, then you want one and only one SingleThreadExecutor. If you have multiple executors, then messages will be processed out-of-order across the set of executors.
If messages need only be processed in the order that they're received for a single producer, then it makes sense to have one executor per producer. If you try pooling executors, then you're going to have to put a lot of work into ensuring affinity between producer and executor.
Since you indicate that your producers will have defined lifetimes, one thing that you have to ensure is that you properly shut down your executors when they're done.

Messaging and batch jobs is something that has been solved time and time again. I suggest not attempting to solve it again. Instead, look into Quartz, which maintains thread pools, persisting tasks in a database etc. Or, maybe even better look into JMS/ActiveMQ. But, at the very least look into Quartz, if you have not already. Oh, and Spring makes working with Quartz so much easier...

I don't see any problem there. Essentially you have independent queues and each has to be drained sequentially, one thread for each is a natural design. Anything else you can come up with are essentially the same. As an example, when Java NIO first came out, frameworks were written trying to take advantage of it and get away from the thread-per-request model. In the end some authors admitted that to provide a good programming model they are just reimplementing threading all over again.

It's impossible to say whether 300 or even 3000 threads will cause any issues without knowing more about your application. I strongly recommend that you should profile your application before adding more complexity
The first thing that you should check is that number of concurrently running threads should not be much higher than number of cores available to run those threads. The more active threads you have, the more time is wasted managing those threads (context switch is expensive) and the less work gets done.
The easiest way to limit number of running threads is to use semaphore. Acquire semaphore before starting work and release it after the work is done.
Unfortunately limiting number of running threads may not be enough. While it may help, overhead may still be to great, if time spent per context switch is major part of total cost of one unit of work. In this scenario, often the most efficient way is to have fixed number of queues. You get queue from global pool of queues when component initializes using algorithm such as round-robin for queue selection.
If you are in one of those unfortunate cases where most obvious solutions do not work, I would start with something relatively simple: one thread pool, one concurrent queue, lock, list of queues and temporary queue for each thread in pool.
Posting work to queue is simple: add payload and identity of producer.
Processing is relatively straightforward as well. First you get get next item from queue. Then you acquire the lock. While you have lock in place, you check if any of other threads is running task for same producer. If not, you register thread by adding a temporary queue to list of queues. Otherwise you add task to existing temporary queue. Finally you release the lock. Now you either run the task or poll for next and start over depending on whether current thread was registered to run tasks. After running the task, you get lock again and see, if there is more work to be done in temporary queue. If not, remove queue from list. Otherwise get next task. Finally you release the lock. Again, you choose whether to run the task or to start over.

Java: Large collection and concurrent threads

I am facing this issue:
I have lots of threads (1024) who access one large collection - Vector.
Question:
is it possible to do something about it which would allow me to do concurrent actions on it without having to synchronize everything (since that takes time)? What I mean, is something like Mysql database works, you don't have to worry about synchronizing and thread-safe issues. Is there some collection alike that in Java? Thanks

Vector is a very old Java class - predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all what thread does is adding elements to vector
(only if num of elements in vector = 0) &
removing elements from vector. (if vector size > 0)
it sounds very much like you should be using something much more like a queue than a list! A bounded queue with size 1 will give you these semantics - although I'd question why you can't add elements if there is already something there. When you've got thousands of threads this seems like a very inefficient design.

Well first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than an simple data structure, even if this means just using something like an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent, non-locking data structures. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List)

Looks like you are implementing the producer consumer pattern, you should google "producer consumer java" or have a look at the BlockingQueue interface

I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, the size() call on it returns only an approximation. So e.g. your app will occasionally be adding elements to it even if !(num of elements in vector = 0).
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could let your users thread post a callable on a queue and have only one thread dealing with the mutation. This will eliminate the need for synchronization on the collection. The user threads can wait on Future.get().
Just an idea.

If you do not want to change your data structure and have only infrequent writes, you might also use one or many ReentrantReadWriteLock to synchronize access. Then many threads can read at the same time, but when a thread wants to write all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.

What are the advantages of Blocking Queue in Java?

I am working on a project that uses a queue that keeps information about the messages that need to be sent to remote hosts. In that case one thread is responsible for putting information into the queue and another thread is responsible for getting information from the queue and sending it. The 2nd thread needs to check the queue for the information periodically.
But later I found this is reinvention of the wheel :) I could use a blocking queue for this purpose.
What are the other advantages of using a blocking queue for the above application? (Ex : Performance, Modifiable of the code, Any special tricks etc )

The main advantage is that a BlockingQueue provides a correct, thread-safe implementation. Developers have implemented this feature themselves for years, but it is tricky to get right. Now the runtime has an implementation developed, reviewed, and maintained by concurrency experts.
The "blocking" nature of the queue has a couple of advantages. First, on adding elements, if the queue capacity is limited, memory consumption is limited as well. Also, if the queue consumers get too far behind producers, the producers are naturally throttled since they have to wait to add elements. When taking elements from the queue, the main advantage is simplicity; waiting forever is trivial, and correctly waiting for a specified time-out is only a little more complicated.

They key thing you eliminate with the blocking queue is 'polling'. This is where you say
In that case the 2nd thread needs to check the queue for the information periodically.
This can be very inefficient - using much unnecessary CPU time. It can also introduce unneeded latencies.

Where should you use BlockingQueue Implementations instead of Simple Queue Implementations?

I think I shall reframe my question from
Where should you use BlockingQueue Implementations instead of Simple Queue Implementations ?
to
What are the advantages/disadvantages of BlockingQueue over Queue implementations taking into consideration aspects like speed,concurrency or other properties which vary e.g. time to access last element.
I have used both kind of Queues. I know that Blocking Queue is normally used in concurrent application. I was writing simple ByteBuffer pool where I needed some placeholder for ByteBuffer objects. I needed fastest , thread safe queue implementation. Even there are List implementations like ArrayList which has constant access time for elements.
Can anyone discuss about pros and cons of BlockingQueue vs Queue vs List implementations?
Currently I have used ArrayList to hold these ByteBuffer objects.
Which data structure shall I use to hold these objects?

A limited capacity BlockingQueue is also helpful if you want to throttle some sort of request. With an unbounded queue, a producers can get far ahead of the consumers. The tasks will eventually be performed (unless there are so many that they cause an OutOfMemoryError), but the producer may long since have given up, so the effort is wasted.
In situations like these, it may be better to signal a would-be producer that the queue is full, and to give up quickly with a failure. For example, the producer might be a web request, with a user that doesn't want to wait too long, and even though it won't consume many CPU cycles while waiting, it is using up limited resources like a socket and some memory. Giving up will give the tasks that have been queued already a better chance to finish in a timely manner.
Regarding the amended question, which I'm interpreting as, "What is a good collection for holding objects in a pool?"
An unbounded LinkedBlockingQueue is a good choice for many pools. However, depending on your pool management strategy, a ConcurrentLinkedQueue may work too.
In a pooling application, a blocking "put" is not appropriate. Controlling the maximum size of the queue is the job of the pool manager—it decides when to create or destroy resources for the pool. Clients of the pool borrow and return resources from the pool. Adding a new object, or returning a previously borrowed object to the pool should be fast, non-blocking operations. So, a bounded capacity queue is not a good choice for pools.
On the other hand, when retrieving an object from the pool, most applications want to wait until a resource is available. A "take" operation that blocks, at least temporarily, is much more efficient than a "busy wait"—repeatedly polling until a resource is available. The LinkedBlockingQueue is a good choice in this case. A borrower can block indefinitely with take, or limit the time it is willing to block with poll.
A less common case in when a client is not willing to block at all, but has the ability to create a resource for itself if the pool is empty. In that case, a ConcurrentLinkedQueue is a good choice. This is sort of a gray area where it would be nice to share a resource (e.g., memory) as much as possible, but speed is even more important. In the worse case, this degenerates to every thread having its own instance of the resource; then it would have been more efficient not to bother trying to share among threads.
Both of these collections give good performance and ease of use in a concurrent application. For non-concurrent applications, an ArrayList is hard to beat. Even for collections that grow dynamically, the per-element overhead of a LinkedList allows an ArrayList with some empty slots to stay competitive memory-wise.

You would see BlockingQueue in multi-threaded situations. For example you need pass in a BlockingQueue as a parameter to create ThreadPoolExecutor if you want to create one using constructor. Depending on the type of queue you pass in the executor could act differently.

It is a Queue implementation that additionally supports operations that
wait for the queue to become non-empty when retrieving an element,
and
wait for space to become available in the queue when storing an
element.
If you required above functionality will be followed by your Queue implementation then use Blocking Queue

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.