What does "less predictable performance of LinkedBlockingQueue in concurrent applications" mean?


For the logging feature I am working on, I need a processing thread that waits for jobs and executes them in batches once their count reaches or exceeds a certain number. Since this is a standard producer-consumer problem, I intend to use BlockingQueues. I have a number of producers adding entries to the queue with add(), while a single consumer thread uses take() to wait on the queue.
LinkedBlockingQueue seems to be a good option since it has no size restriction, but I am confused by this sentence in the documentation:
Linked queues typically have higher throughput than array-based queues but less predictable performance in most concurrent applications.
The documentation does not explain what this statement means. Can someone please shed light on it? Does it mean LinkedBlockingQueue is not thread safe? Have any of you encountered issues using LinkedBlockingQueue?
Since there are many more producers than consumers, I can always run into a scenario where the queue is flooded with a large number of entries. If I used ArrayBlockingQueue instead, which takes the queue's capacity as a constructor parameter, I could run into queue-full exceptions (add() throws an IllegalStateException when the queue is full). To avoid this I would have to choose a capacity for my ArrayBlockingQueue, and I am not sure how to determine it. Have you had to solve a similar problem using ArrayBlockingQueue?
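The consumer side described above (a single thread blocking in take() and processing entries once a threshold is reached) could be sketched as follows. This is a minimal sketch, not the asker's code: the class name, the awaitBatch helper and the threshold of 100 are hypothetical. It relies on the standard BlockingQueue.drainTo(), which moves already-queued entries into the batch without blocking again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingConsumer {
    // Hypothetical threshold; the question only says "a certain number".
    static final int BATCH_THRESHOLD = 100;

    /**
     * Blocks until at least `threshold` entries have been collected, then returns
     * them (possibly more, if extra entries were already queued).
     */
    public static <T> List<T> awaitBatch(BlockingQueue<T> queue, int threshold)
            throws InterruptedException {
        List<T> batch = new ArrayList<>();
        while (batch.size() < threshold) {
            batch.add(queue.take());   // wait for the next entry
            queue.drainTo(batch);      // grab everything already queued, without blocking
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // One producer here; in the real application many threads would call add().
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 250; i++) {
                queue.add("log entry " + i);
            }
        });
        producer.start();
        List<String> batch = awaitBatch(queue, BATCH_THRESHOLD);
        System.out.println("processing a batch of " + batch.size() + " entries");
        producer.join();
    }
}
```

If you also want an upper bound on the batch size, BlockingQueue has a drainTo(collection, maxElements) overload.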

Does it mean LinkedBlockingQueue is not thread safe?
It certainly does not mean that. The phrase "less predictable performance" is talking about just that -- performance -- and not some violation of the thread-safety or Java collections contract.
I suspect this is more about the fact that it is a linked list, so iterating and other operations on the collection will be slower, and the class will hold its locks longer. It also has to deal with more memory structures, since each element gets its own linked-list node instead of just an entry in an array. This means more cache lines have to be moved between processors when synchronizing, which again impacts performance.
What they are trying to say is that you should use ArrayBlockingQueue if you can, but otherwise I wouldn't worry about it.
Did any of you encounter any issues using LinkedBlockingQueue?
I've used it a lot and have not seen any problems. It is also used heavily by the ExecutorService implementations, which are used everywhere.
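For example, Executors.newFixedThreadPool() is documented as operating off a shared unbounded queue, and in the OpenJDK sources that queue is a LinkedBlockingQueue. The sketch below builds an equivalent pool by hand; fixedPool is a hypothetical helper name, while the ThreadPoolExecutor constructor is the standard one.

```java
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FixedPoolSketch {
    /** Builds a pool shaped like Executors.newFixedThreadPool(nThreads). */
    public static ThreadPoolExecutor fixedPool(int nThreads) {
        return new ThreadPoolExecutor(
                nThreads, nThreads,                    // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,             // keep-alive (irrelevant: no extra threads)
                new LinkedBlockingQueue<Runnable>());  // unbounded work queue
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = fixedPool(4);
        Future<Integer> answer = pool.submit(() -> 6 * 7);
        System.out.println(answer.get());                               // 42
        System.out.println(pool.getQueue().getClass().getSimpleName()); // LinkedBlockingQueue
        pool.shutdown();
    }
}
```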

Related

Increasing program speedup when using shared memory

I have a program that calculates Pi from the Chudnovsky formula. It's written in Java and it uses a shared Vector that is used to save intermediate calculations like factorials and powers that include the index of the element.
However, I believe that since it's a synchronized Vector (thread safe by default), only one thread can access it at a time. So with lots of threads, instead of the speedup increasing, the computation time stays constant.
Is there anything that I can do to circumvent that? What to do when there are too many threads reading/writing to the same shared memory?

When the access pattern is lots of reads and occasional writes, you can protect an unsynchronized data structure with a ReentrantReadWriteLock. It allows multiple concurrent readers, but only a single writer.
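A minimal sketch of that idea for the factorial table from the question, assuming a plain ArrayList where index k holds k!. FactorialCache and its method are hypothetical names; the locking calls are the standard ReentrantReadWriteLock API. Reads of already-computed entries proceed in parallel; only extending the table takes the exclusive write lock.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FactorialCache {
    private final List<BigInteger> factorials = new ArrayList<>(); // index k holds k!
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public FactorialCache() {
        factorials.add(BigInteger.ONE); // 0! = 1
    }

    public BigInteger factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        // Fast path: any number of threads may hold the read lock at once.
        lock.readLock().lock();
        try {
            if (n < factorials.size()) {
                return factorials.get(n);
            }
        } finally {
            lock.readLock().unlock();
        }
        // Slow path: extend the table under the exclusive write lock.
        lock.writeLock().lock();
        try {
            // Re-check via the loop bound: another thread may already have extended it.
            for (int k = factorials.size(); k <= n; k++) {
                factorials.add(factorials.get(k - 1).multiply(BigInteger.valueOf(k)));
            }
            return factorials.get(n);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        FactorialCache cache = new FactorialCache();
        System.out.println(cache.factorial(20)); // 2432902008176640000
    }
}
```

The re-check under the write lock matters because a ReentrantReadWriteLock cannot upgrade a read lock to a write lock, so the read lock must be released first and the table may change in between.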
Depending on your implementation, you might also benefit from using a ConcurrentHashMap.
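With a ConcurrentHashMap, a cache of powers could look like the sketch below. PowerCache is a hypothetical class; computeIfAbsent is the standard method, and for ConcurrentHashMap the whole call is performed atomically, so each power is computed at most once per exponent. The mapping function must not update the same map, so the value is computed directly with BigInteger.pow rather than recursively through the cache.

```java
import java.math.BigInteger;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PowerCache {
    private final BigInteger base;
    private final ConcurrentMap<Integer, BigInteger> powers = new ConcurrentHashMap<>();

    public PowerCache(long base) {
        this.base = BigInteger.valueOf(base);
    }

    /** Returns base^k, computing it at most once per k. */
    public BigInteger pow(int k) {
        // Threads asking for different exponents rarely block each other;
        // threads asking for the same exponent wait for the single computation.
        return powers.computeIfAbsent(k, e -> base.pow(e));
    }

    public static void main(String[] args) {
        PowerCache cache = new PowerCache(2);
        System.out.println(cache.pow(10)); // 1024
    }
}
```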
You might be able to cheat a bit and use either an AtomicIntegerArray or an AtomicReferenceArray of Futures/CompletionStages.
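One way to read that suggestion, sketched under assumptions: each index holds a CompletableFuture, the first thread to install a future with compareAndSet computes the value, and every other thread simply waits on that future. FutureTable and the IntFunction passed to it are hypothetical; AtomicReferenceArray.compareAndSet and CompletableFuture.complete/join are standard API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntFunction;

public class FutureTable<V> {
    private final AtomicReferenceArray<CompletableFuture<V>> slots;
    private final IntFunction<V> compute; // hypothetical per-index computation

    public FutureTable(int size, IntFunction<V> compute) {
        this.slots = new AtomicReferenceArray<>(size);
        this.compute = compute;
    }

    public V get(int i) {
        CompletableFuture<V> existing = slots.get(i);
        if (existing == null) {
            CompletableFuture<V> mine = new CompletableFuture<>();
            if (slots.compareAndSet(i, null, mine)) {
                // We won the race: compute once and publish the result to all waiters.
                try {
                    mine.complete(compute.apply(i));
                } catch (Throwable t) {
                    mine.completeExceptionally(t); // never leave waiters hanging
                }
                existing = mine;
            } else {
                existing = slots.get(i); // another thread installed its future first
            }
        }
        return existing.join(); // no lock is taken on the read path
    }

    public static void main(String[] args) {
        FutureTable<Long> squares = new FutureTable<>(10, i -> (long) i * i);
        System.out.println(squares.get(9)); // 81
    }
}
```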

Store the results of each thread in a stack. One thread collects the results from every thread and adds them together; of course, it has to handle the case where a stack is still empty.
If you want multiple threads to work on factorials, why not create a thread or two that produce a list of factorial results? Other threads can then just look up results as needed.

Instead of sharing the same memory, each thread can keep its own intermediate results on its own stack. Then have one thread add them all up at the end (or periodically).
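A minimal sketch of per-thread accumulation, assuming the work is a sum of independent terms (term() here is a placeholder, not the Chudnovsky series). Each task keeps its running total in a local variable on its own stack, so the workers share no mutable state, and a single thread combines the partial results at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartialSums {
    /** Sums term(1..n) using nThreads workers that share no mutable state. */
    public static long parallelSum(int n, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + nThreads - 1) / nThreads;
            for (int t = 0; t < nThreads; t++) {
                final int from = t * chunk + 1;
                final int to = Math.min(n, (t + 1) * chunk);
                Callable<Long> task = () -> {
                    long local = 0; // thread-private accumulator on this worker's stack
                    for (int k = from; k <= to; k++) {
                        local += term(k);
                    }
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0; // combined by a single thread
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder term; a real program would compute a series term here.
    static long term(int k) {
        return k;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000, 4)); // 500000500000
    }
}
```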

If you need high throughput, you can consider using Disruptor and RingBuffer.
At a crude level you can think of a Disruptor as a multicast graph of queues where producers put objects on it that are sent to all the consumers for parallel consumption through separate downstream queues. When you look inside you see that this network of queues is really a single data structure - a ring buffer.
Each producer and consumer has a sequence counter indicating which slot in the buffer it is currently working on. Each producer/consumer writes only its own sequence counter but can read the others' sequence counters.
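To make the sequence-counter idea concrete, here is a deliberately simplified single-producer/single-consumer ring buffer. It is not the Disruptor's API or implementation, only an illustration of the principle described above: each side writes only its own counter and reads the other's to know which slots are free or published. The class name and the power-of-two capacity requirement are choices of this sketch.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Single-producer, single-consumer ring buffer (illustrative, not the Disruptor API). */
public class SpscRing<E> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong producerSeq = new AtomicLong(); // next slot to write; written only by the producer
    private final AtomicLong consumerSeq = new AtomicLong(); // next slot to read; written only by the consumer

    public SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Object[capacity];
        mask = capacity - 1;
    }

    /** Producer side: returns false instead of blocking when the ring is full. */
    public boolean offer(E e) {
        if (e == null) throw new NullPointerException();
        long p = producerSeq.get();
        if (p - consumerSeq.get() == slots.length) {
            return false; // the consumer has not freed a slot yet
        }
        slots[(int) (p & mask)] = e;
        producerSeq.set(p + 1); // publish: volatile write after the slot write
        return true;
    }

    /** Consumer side: returns null when nothing has been published. */
    @SuppressWarnings("unchecked")
    public E poll() {
        long c = consumerSeq.get();
        if (c == producerSeq.get()) {
            return null;
        }
        int i = (int) (c & mask);
        E e = (E) slots[i];
        slots[i] = null;        // let the element be garbage collected
        consumerSeq.set(c + 1); // hand the slot back to the producer
        return e;
    }

    public static void main(String[] args) throws InterruptedException {
        SpscRing<Integer> ring = new SpscRing<>(1024);
        int n = 100_000;
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= n; i++) {
                while (!ring.offer(i)) Thread.yield(); // spin until a slot is free
            }
        });
        producer.start();
        long sum = 0;
        for (int received = 0; received < n; ) {
            Integer v = ring.poll();
            if (v == null) { Thread.yield(); continue; }
            sum += v;
            received++;
        }
        producer.join();
        System.out.println(sum); // 5000050000
    }
}
```

The real Disruptor adds multi-producer support, configurable wait strategies and padding against false sharing on top of this basic idea.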
A few useful links:
https://lmax-exchange.github.io/disruptor
http://martinfowler.com/articles/lmax.html
https://softwareengineering.stackexchange.com/questions/244826/can-someone-explain-in-simple-terms-what-is-the-disruptor-pattern

When to prefer LinkedBlockingQueue over ArrayBlockingQueue?

Which should I use, LinkedBlockingQueue or ArrayBlockingQueue, when:
I want efficient reads and writes
the queue should have a smaller memory footprint
There are similar questions, but they do not make clear which one should be preferred.
Links:
Java: ArrayBlockingQueue vs. LinkedBlockingQueue
What is the Difference between ArrayBlockingQueue and LinkedBlockingQueue

Boris the Spider has already outlined the most visible difference between ArrayBlockingQueue and LinkedBlockingQueue - the former is always bounded, while the latter can be unbounded.
So in case you need an unbounded blocking queue, LinkedBlockingQueue or a LinkedTransferQueue used as a BlockingQueue are your best bets from the java.util.concurrent toolbox.
But let's say you need a bounded blocking queue.
In the end, you should choose an implementation based on extensive experimenting with a simulation of your real-world workload.
Nevertheless, here are some notes that can help you with your choice or with interpreting the results from the experiment:
ArrayBlockingQueue can be created with a configurable (on/off) scheduling fairness policy. This is great if you need fairness or want to avoid producer/consumer starvation, but it will cost you in throughput.
ArrayBlockingQueue pre-allocates its backing array, so it doesn't allocate nodes during its usage, but it immediately takes what can be a considerable chunk of memory, which can be a problem if your memory is fragmented.
ArrayBlockingQueue should have less variability in performance, because it has less moving parts overall, it uses a simpler and less-sophisticated single-lock algorithm, it does not create nodes during usage, and its cache behavior should be fairly consistent.
LinkedBlockingQueue should have better throughput, because it uses separate locks for the head and the tail.
LinkedBlockingQueue does not pre-allocate nodes, which means that its memory footprint will roughly match its size, but it also means that it will incur some work for allocation and freeing of nodes.
LinkedBlockingQueue will probably have worse cache behavior, which may affect its own performance, but also the performance of other components due to false sharing.
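Since the notes above recommend experimenting, a very rough starting point for a single-producer/single-consumer comparison might look like this. It is a sketch, not a rigorous benchmark: there is no JIT warm-up or repetition, and a real evaluation should use a harness such as JMH with your own producer and consumer counts. The capacity and item count are arbitrary assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueThroughputSketch {
    /** Moves `items` integers from one producer thread to the calling thread; returns elapsed nanos. */
    public static long run(BlockingQueue<Integer> queue, int items) throws InterruptedException {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.put(i); // blocks while a bounded queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long start = System.nanoTime();
        producer.start();
        for (int i = 0; i < items; i++) {
            queue.take(); // blocks while the queue is empty
        }
        long elapsed = System.nanoTime() - start;
        producer.join();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        int capacity = 1024, items = 1_000_000;
        System.out.println("ArrayBlockingQueue (unfair): "
                + run(new ArrayBlockingQueue<>(capacity), items) / 1_000_000 + " ms");
        System.out.println("ArrayBlockingQueue (fair):   "
                + run(new ArrayBlockingQueue<>(capacity, true), items) / 1_000_000 + " ms");
        System.out.println("LinkedBlockingQueue:         "
                + run(new LinkedBlockingQueue<>(capacity), items) / 1_000_000 + " ms");
    }
}
```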
Depending on your use case and how much you care about performance, you may also want to look outside of java.util.concurrent and consider Disruptor (an exceptionally fast, but somewhat specialized bounded non-blocking ring buffer) or JCTools (a variety of bounded or unbounded queues with different guarantees depending on the number of producers and consumers).

From the JavaDoc for ArrayBlockingQueue:
A bounded blocking queue backed by an array.
(Emphasis on "bounded" mine.)
From the JavaDoc for LinkedBlockingQueue:
An optionally-bounded blocking queue based on linked nodes.
(Emphasis on "optionally-bounded" mine.)
So if you need a bounded queue you can use either; if you need an unbounded queue you must use LinkedBlockingQueue.
For a bounded queue, you would need to benchmark to work out which is better.
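For the bounded case (and the capacity concern in the original question), it helps to know what each insertion method does once the queue is full. The sketch below probes this for a bounded ArrayBlockingQueue and a bounded LinkedBlockingQueue; the class and method names are hypothetical, while add(), offer() and the timed offer() are the standard BlockingQueue methods.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueBehaviour {
    /** Fills a bounded queue, then reports what each insertion method does. */
    public static String probeFullQueue(BlockingQueue<String> q) throws InterruptedException {
        if (q.remainingCapacity() == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("queue appears to be unbounded");
        }
        while (q.remainingCapacity() > 0) {
            q.add("filler");
        }
        StringBuilder sb = new StringBuilder();
        try {
            q.add("x");
            sb.append("add accepted; ");
        } catch (IllegalStateException e) {
            sb.append("add threw IllegalStateException; ");
        }
        sb.append("offer returned ").append(q.offer("x")).append("; ");
        sb.append("timed offer returned ").append(q.offer("x", 10, TimeUnit.MILLISECONDS));
        // q.put("x") would block here until a consumer removes an element.
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(probeFullQueue(new ArrayBlockingQueue<>(2)));
        System.out.println(probeFullQueue(new LinkedBlockingQueue<>(2)));
    }
}
```

By contrast, put() waits until space becomes available, which pushes back on the producers instead of throwing or dropping entries.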

Java: Using ConcurrentHashMap as a lock manager

I'm writing a highly concurrent application, needing access to a large fine-grained set of shared resources. I'm currently writing a global lock manager to organize this. I'm wondering if I can piggyback off the standard ConcurrentHashMap and use that to handle the locking? I'm thinking of a system like the following:
A single global ConcurrentHashMap object maps the unique string id of each resource to the unique id of the thread currently using that resource (rather than to a lock object protecting the resource)
Tune the concurrency level to reflect the need for a high level of concurrency
Locks are acquired using the map's atomic conditional replace(K key, V oldValue, V newValue) method
To prevent deadlock when locking multiple resources, locks must be acquired in alphabetical order
Are there any major issues with the setup? How will the performance be?
I know this is probably going to be much slower and more memory-heavy than a properly written locking system, but I'd rather not spend days trying to write my own, especially given that I probably won't be able to match Java's professionally-written concurrency code implementing the map.
Also, I've never used ConcurrentHashMap in a high-load situation, so I'm interested in the following:
How well will this scale to large numbers of elements? (I'm looking at ~1,000,000 being a good cap. If I reach beyond that I'd be willing to rewrite this more efficiently)
The documentation states that re-sizing is "relatively" slow. Just how slow is it? I'll probably have to re-size the map once every minute or so. Is this going to be problematic with the size of map I'm looking at?
Edit: Thanks Holger for pointing out that HashMaps shouldn't have that big of an issue with scaling
Also, is there a better or more standard method out there? I can't find anywhere a system like this is used, so I'm guessing that either I'm missing a major flaw, or there's something else.
Edit:
The application I'm writing is a network service, handling a variable number of requests. I'm using the Grizzly project to balance the requests among multiple threads.
Each request uses a small number of the shared resources (~30), so in general I do not expect a great deal of contention. The requests usually finish working with the resources in under 500ms. Thus, I'd be fine with a bit of blocking/continuous polling, as the requests aren't extremely time-sensitive and contention should be minimal.
In general, seeing that a proper solution would be quite similar to how ConcurrentHashMap works behind the scenes, I'm wondering if I can safely use that as a shortcut instead of writing/debugging/testing my own version.

The resizing issue is not relevant, as you already gave an estimate of the number of elements in your question. So you can give the ConcurrentHashMap an initial capacity large enough to avoid any rehashing.
The performance will not depend on the number of elements (that's the main goal of hashing), but on the number of concurrent threads.
The main problem is that you don't have a plan for handling failed locks. Unless you want to poll until locking succeeds (which is not recommended), you need a way of putting a thread to sleep, which implies that the thread currently owning the lock has to wake up a sleeping thread on release, if one exists. So you end up requiring conventional Lock features that a ConcurrentHashMap does not offer.
Creating a Lock per element (as you said ~1,000,000) would not be a solution.
A solution would look a bit like how ConcurrentHashMap works internally. Given a certain concurrency level, i.e. the number of threads you might have (rounded up), you create that number of Locks (a far smaller number than 1,000,000).
Now you assign each element one of the Locks. A simple assignment would be based on the element’s hashCode, assuming it is stable. Then locking an element means locking the assigned Lock which gives you up to the configured concurrency level if all currently locked elements are mapped to different Locks.
This might imply that threads locking different elements block each other if the elements are mapped to the same Lock, but with a predictable likelihood. You can try fine-tuning the concurrency level (as said, use a number higher than the number of threads) to find the best trade-off.
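A minimal sketch of the striping scheme described above, assuming resource ids with a stable hashCode(). StripedLocks and its methods are hypothetical names. When a request needs several resources, it locks the corresponding stripes in ascending stripe order (rather than alphabetical key order, since two keys may share a stripe), which avoids deadlock.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

public class StripedLocks {
    private final ReentrantLock[] stripes;

    public StripedLocks(int stripeCount) {
        if (stripeCount <= 0) throw new IllegalArgumentException("stripeCount must be positive");
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Maps a resource id to a stripe; assumes the key's hashCode() is stable. */
    private int stripeIndex(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16); // mix high bits into the low bits
        return Math.floorMod(h, stripes.length);
    }

    public ReentrantLock lockFor(Object key) {
        return stripes[stripeIndex(key)];
    }

    /** Distinct stripe indices for the keys, in ascending order. */
    private int[] orderedStripes(Object... keys) {
        return Arrays.stream(keys).mapToInt(this::stripeIndex).distinct().sorted().toArray();
    }

    /** Locks every stripe the keys map to, always in ascending index order. */
    public void lockAll(Object... keys) {
        for (int i : orderedStripes(keys)) {
            stripes[i].lock();
        }
    }

    /** Releases the stripes taken by lockAll for the same keys, in reverse order. */
    public void unlockAll(Object... keys) {
        int[] order = orderedStripes(keys);
        for (int j = order.length - 1; j >= 0; j--) {
            stripes[order[j]].unlock();
        }
    }

    public static void main(String[] args) {
        StripedLocks locks = new StripedLocks(64); // hypothetical: comfortably above the thread count
        locks.lockAll("resource-17", "resource-42");
        try {
            System.out.println("working with both resources");
        } finally {
            locks.unlockAll("resource-17", "resource-42");
        }
    }
}
```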
A big advantage of this approach is that you do not need to maintain a data structure that depends on the number of elements. AFAIK, the new parallel ClassLoader uses a similar technique.

Java - PriorityQueue implementation

Java's PriorityQueue implementation uses a heap.
Does Java's implementation parallelize the "heapify" operation after a poll() (for example, by running it on another thread)?
Thanks in advance.

The heapify operation only considers one element at a time, sifting it down or up. I don't know of a way in which it can be parallelized.
Still, if you want to make sure, why don't you have a look at the code?
EDIT: I am now sure, at least for OpenJDK's implementation.
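To see why poll() is inherently sequential, here is the standard sift-up/sift-down logic on an array-backed binary min-heap, simplified to int values. OpenJDK's PriorityQueue follows the same idea with its own code for generic elements; this IntMinHeap class is only an illustration. Each step of the sift-down loop compares and swaps along a single root-to-leaf path, and the next step depends on the result of the previous one.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

public class IntMinHeap {
    private int[] a = new int[16];
    private int size;

    public void add(int x) {
        if (size == a.length) a = Arrays.copyOf(a, size * 2);
        int i = size++;
        a[i] = x;
        while (i > 0) { // sift up along one leaf-to-root path
            int parent = (i - 1) >>> 1;
            if (a[parent] <= a[i]) break;
            swap(i, parent);
            i = parent;
        }
    }

    public int poll() {
        if (size == 0) throw new NoSuchElementException();
        int min = a[0];
        a[0] = a[--size]; // move the last element to the root
        int i = 0;
        while (true) { // sift down: each step depends on the previous comparison
            int left = 2 * i + 1;
            if (left >= size) break;
            int child = (left + 1 < size && a[left + 1] < a[left]) ? left + 1 : left;
            if (a[i] <= a[child]) break;
            swap(i, child);
            i = child;
        }
        return min;
    }

    public int size() {
        return size;
    }

    private void swap(int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        IntMinHeap heap = new IntMinHeap();
        for (int x : new int[] {5, 3, 8, 1, 9, 2}) heap.add(x);
        while (heap.size() > 0) System.out.print(heap.poll() + " "); // 1 2 3 5 8 9
    }
}
```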

No, it doesn't parallelize it. The algorithm is just not designed that way.
Additionally, consider that, since you have to wait for the whole operation to finish, you'd only get an advantage out of multi-threading it if there were significant code blocks where the computer just has to wait (e.g. retrieving a web page). Since this is clearly not the case for a heap, there's no benefit from it.
One more thing: whenever multi-threading is included, there's also a price to pay: maintenance becomes more complicated, there's CPU time spent in thread instantiation and lock management, etc...
In this case, it wouldn't help. It would be a different matter if you wanted a data structure distributed across several computers; in that case a distributed variant would have to be developed, but only if the parallelization benefits outweigh the overhead of distributing the data.

Most efficient collection for this kind of LILO?

I am programming a list of recent network messages sent to or from a client. Basically, I just want a list that stores up to X of my message objects. Once the list reaches the desired size, the oldest (first) item in the list should be removed. The collection needs to maintain its order, and all I will need to do is
iterate through it,
add an item to the end, and
remove an item from the beginning, if #2 makes it too long.
What is the most efficient structure/array/collection/method for doing this? Thanks!

You want to use a Queue.

I don't think LILO is the real term, but you're looking for a FIFO Queue.

I second @rich-adams re: Queue. In particular, since you mentioned responding to network messages, I think you may want something that handles concurrency well. Check out ArrayBlockingQueue.

Based on your third requirement, I think you're going to have to extend or wrap an existing implementation, and I recommend you start with ConcurrentLinkedQueue.
Other recommendations to use any kind of blocking queue are leading you down the wrong path. A bounded blocking queue will not accept an element while it is full: put() blocks until another element is removed, add() throws an exception, and offer() returns false. By your own requirements, this isn't the behavior you want. You want to automatically remove the first element when a new one is added to a full queue.
It should be fairly simple to create a wrapper around ConcurrentLinkedQueue, overriding the offer method to check the size and capacity (your wrapper class will maintain the capacity). If they're equal, your offer method will need to poll the queue to remove the first element before adding the new one.
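A minimal sketch of such a wrapper. It uses composition rather than subclassing ConcurrentLinkedQueue, and keeps its own AtomicInteger count because ConcurrentLinkedQueue.size() traverses the whole queue. The class name RecentItems is hypothetical. While several offers are in flight, the queue may briefly hold more than capacity elements; once they complete, it holds at most capacity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Keeps at most `capacity` recent items, evicting the oldest on overflow. */
public class RecentItems<E> implements Iterable<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger count = new AtomicInteger(); // CLQ.size() is O(n), so track it here
    private final int capacity;

    public RecentItems(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    public void offer(E e) {
        queue.offer(e);          // add first (throws NullPointerException for null)...
        count.incrementAndGet(); // ...then count it
        // Evict while over capacity. A successful CAS reserves exactly one eviction,
        // so concurrent callers never remove more than the overflow. Only this method
        // removes elements, so the poll after a successful CAS always finds one.
        while (true) {
            int c = count.get();
            if (c <= capacity) return;
            if (count.compareAndSet(c, c - 1)) {
                queue.poll();
            }
        }
    }

    /** Weakly consistent, read-only iteration from oldest to newest. */
    @Override
    public Iterator<E> iterator() {
        return Collections.unmodifiableCollection(queue).iterator();
    }

    public List<E> snapshot() {
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        RecentItems<String> recent = new RecentItems<>(3);
        for (String m : new String[] {"m1", "m2", "m3", "m4"}) recent.offer(m);
        for (String m : recent) System.out.println(m); // prints m2, m3, m4 (one per line)
    }
}
```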

You can use an ArrayList for this. Today's computers copy data so fast that it doesn't matter unless your list can contain billions of elements.
Performance information: copying 10 million elements takes 13ms (thirteen milliseconds) on my dual core. So thinking even a second about the optimal data structure is a waste unless your use case is vastly different, i.e. you have more than 10 million elements and your application does nothing but insert and remove elements. If you operate on the inserted/removed elements in any way, chances are that the time spent on that exceeds the cost of the insert/remove.
A linked list seems to be better at first glance, but it needs more time for allocating memory, plus the code is more complex (with all the pointer updating). So the runtime is worse. The only advantage of using a LinkedList in Java is that the class already implements the Queue interface, so it is more natural to use in your code (using peek() and pop()).
[EDIT] So let's have a look at efficiency. What is efficiency? The fastest algorithm? The one with the fewest lines (and therefore the fewest bugs)? The algorithm which is easiest to use (= least code on the developer side + fewer bugs)? The algorithm which performs best (which is not always the fastest algorithm)?
Let's look at some details: LinkedList implements Queue, so the code using the list is a bit simpler (list.pop() instead of list.remove(0)). But LinkedList allocates memory on every add(), while ArrayList only allocates memory once per N elements. To reduce this even further, ArrayList grows to N*3/2 elements when it runs out of space, so as your list grows, the number of allocations shrinks. If you know the size of your list in advance, ArrayList will only allocate memory once. This also means that the GC has less clutter to clean up. So from a performance point of view, ArrayList wins by an order of magnitude in the average case.
The synchronized versions are only necessary when several threads access the data structure. With Java 5, many of those have seen dramatic speed improvements. If you have several threads putting and popping, use ArrayBlockingQueue. In this case, though, LinkedBlockingQueue might be an option despite its allocation overhead, since its implementation may allow pushing and popping from two different threads at the same time as long as the queue size is >= 2 (in this special case, the two threads won't have to access the same pointers). To decide, the only option is to run a profiler and measure which version is faster.
That said: any advice on performance is wrong 90% of the time unless it is backed by a measurement. Today's systems have become so complex, and so much is going on in the background, that it is impossible for a mere human to understand or even enumerate all the factors that play a role.

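You can get by with a plain old ArrayList.
When adding, just do this (supposing the ArrayList is called al and the new item is message):
if (al.size() >= YOUR_MAX_ARRAY_SIZE) {
    al.remove(0); // drop the oldest message first
}
al.add(message); // then append the new one at the end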

I think you want to implement a Queue<E> where the peek, poll and remove methods act as if there is nothing at the head until the count exceeds the threshold you want. You probably want to wrap one of the existing implementations.

LinkedList should be what you're looking for.

