Java bounded non-blocking buffer for highly concurrent situations

Basically, I need a data structure to store temporary chat messages on the server side. It should be:
bounded: I don't need to store many messages, because each client sends a request for new messages every second. I think the bound should be the maximum number of concurrent requests in one second. When the buffer is full, the oldest messages should be removed.
suitable for highly concurrent access: I don't want to use a data structure like Collections.synchronizedXXXX, because if another thread changes the structure during iteration (e.g. adds a message), an exception is thrown, so I would have to lock the whole data structure. I don't really care whether a client request sees the most recently inserted message, because the client will send a new request a second later; write operations, on the other hand, should never be delayed. The classes in the package java.util.concurrent seem to be the solution, but...
non-blocking: LinkedBlockingQueue and ArrayBlockingQueue can be bounded and won't throw an exception during iteration, but they are both blocking queues. When the queue is full, I want to add the new element at the tail and remove the oldest element from the head, instead of blocking and waiting for someone to remove the head.
So my question is: is there a good implementation in a third-party library, for example Google Guava?
Or do you have a better idea for storing temporary chat messages on the server?
Thank you very much!

You can use LinkedBlockingQueue with the non-blocking methods offer (or add) and poll to access it.
You can create it with a fixed capacity to make it bounded.
LinkedBlockingQueue<String> myStrings = new LinkedBlockingQueue<String>(100);
myStrings.offer("Hi!"); // returns false if the queue is full
myStrings.add("Hi, again!"); // throws IllegalStateException if the queue is full
String s = myStrings.poll(); // returns null if the queue is empty

You could use the Apache Commons CircularFifoBuffer (from Commons Collections 3.x). It meets your first and last criteria. To support concurrency, you can wrap the base buffer in its synchronized version like so:
Buffer fifo = BufferUtils.synchronizedBuffer(new CircularFifoBuffer());
Good luck on the project.
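If you would rather not pull in a library, the same idea can be sketched with the JDK alone. This is only an illustration of what a synchronized circular FIFO does, not the Commons class: the name BoundedMessageBuffer and its methods are made up for this example, and because it uses a lock it is thread-safe but not lock-free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a synchronized circular FIFO buffer (not the Commons class).
final class BoundedMessageBuffer<E> {
    private final ArrayDeque<E> items;
    private final int capacity;

    BoundedMessageBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.items = new ArrayDeque<>(capacity);
    }

    // Never waits for free space: when the buffer is full, the oldest element is dropped.
    synchronized void add(E element) {
        if (items.size() == capacity) {
            items.pollFirst();
        }
        items.addLast(element);
    }

    // Readers receive a copy, so they can iterate without holding the lock
    // and without seeing a ConcurrentModificationException.
    synchronized List<E> snapshot() {
        return new ArrayList<>(items);
    }
}
```

Writers only hold the lock for a constant-time operation; readers pay for a copy of at most capacity elements, which seems acceptable when clients poll about once per second.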

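Did you take a look at ConcurrentLinkedQueue?
Its Javadoc says:
This implementation employs an efficient "wait-free" algorithm...
Wait-freedom is one of the strongest guarantees you can obtain. Note, however, that ConcurrentLinkedQueue is unbounded, so you would have to cap its size yourself.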

You can add non-blocking behaviour to an ArrayBlockingQueue by wrapping insertion in a conditional offer(): if the queue refuses the offer because it is full, drop the head and make the offer again:
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class LearnToQueue {
    public static void main(String[] args) {
        Queue<Integer> fifo = new ArrayBlockingQueue<Integer>(4);
        for (int i = 0; i < 10; i++) {
            if (!fifo.offer(i)) {
                // The queue is full: drop the head (you can pipe it anywhere you want to)
                fifo.remove();
                // Single-threaded this always succeeds; with concurrent producers, check the result and retry
                fifo.offer(i);
            }
            System.out.println(fifo);
        }
    }
}
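With several producers, another thread may refill the freed slot between the eviction and the second offer. A minimal sketch of a retrying variant follows; the class and method names are hypothetical, and it assumes a queue with capacity of at least 1.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class EvictingOffer {
    // Inserts the element, evicting from the head until the offer succeeds.
    // Returns how many elements this call evicted.
    static <E> int offerEvictingOldest(BlockingQueue<E> queue, E element) {
        int evicted = 0;
        while (!queue.offer(element)) {
            // poll() returns null instead of throwing if a consumer emptied the queue meanwhile
            if (queue.poll() != null) {
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> fifo = new ArrayBlockingQueue<>(4);
        for (int i = 0; i < 10; i++) {
            offerEvictingOldest(fifo, i);
        }
        System.out.println(fifo); // prints [6, 7, 8, 9]
    }
}
```

Neither offer() nor poll() waits for space or elements, so the producer never parks; under heavy contention it may loop a few times before its element is accepted.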

LinkedTransferQueue is an unbounded blocking queue that orders elements FIFO only with respect to any given producer, not globally. It blocks only when taking from an empty queue, never when adding to one. You could add a soft cap that evicts elements by tracking either a size counter or read and write counters.
Depending on your requirements, you may be able to write a custom lock-free ring buffer.
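The soft-cap idea can be sketched roughly as follows, here on top of ConcurrentLinkedQueue with an AtomicInteger as the size counter. SoftCappedQueue is a hypothetical name, and the cap is deliberately soft: under concurrent writes the queue can briefly hold a few more elements than softCap.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: an unbounded lock-free queue plus an approximate size counter.
final class SoftCappedQueue<E> {
    private final Queue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();
    private final int softCap;

    SoftCappedQueue(int softCap) {
        this.softCap = softCap;
    }

    // Never blocks: the element is always accepted, then the oldest one is
    // evicted if the counter says the cap was exceeded.
    void add(E element) {
        queue.offer(element);
        if (size.incrementAndGet() > softCap) {
            poll();
        }
    }

    // Returns the oldest element, or null if the queue is empty.
    E poll() {
        E head = queue.poll();
        if (head != null) {
            size.decrementAndGet();
        }
        return head;
    }

    // Approximate because the counter is updated separately from the queue.
    int approximateSize() {
        return size.get();
    }
}
```

A separate counter is used because ConcurrentLinkedQueue.size() traverses the whole queue, which would be expensive to call on every insert.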

Related

Java persist a list with multi thread

I need to create a list to do the following operation:
I receive an object from an external queue/topic every microsecond.
After doing some operations on the object, I need to persist these objects into database.
I am doing the persist in batches of 100 or 1000. The only problem is, persist rate is lower than the incoming message rate. Now I don't want to keep this in a single thread since the persist will slow down the message consumption.
My idea is to keep accepting the message objects and adding them to a collection (like a linked list)
And keep removing from the other end of the collection in batches of 100 or 1000 and persist into database.
What is the right collection to use? How to synchronize this and avoid concurrent modification exceptions?
Below is the code I'm trying to implement with an ArrayList that clears out the list every few seconds while persisting.
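import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class MyClass {
    // Shared between the consuming thread and the persist task
    List<Object> persistList = new ArrayList<>();
    ScheduledExecutorService persistExecutor = Executors.newSingleThreadScheduledExecutor();
    ScheduledFuture<?> scheduledFuture;
    PersistOperation persistOperation;
    // Initialize delay, interval (in seconds)
    long delay;
    long interval;

    void init() {
        scheduledFuture = persistExecutor.scheduleAtFixedRate(
                new PersistOperation(persistList), delay, interval, TimeUnit.SECONDS);
    }

    void execute(Object msg) {
        // process the message and add to the persist list
    }

    class PersistOperation implements Runnable {
        List<Object> persistList;

        PersistOperation(List<Object> persistList) {
            // Parameterized constructor
            this.persistList = persistList;
        }

        @Override
        public void run() {
            // Copy persistList to new ArrayList and clear persistList
            // entity manager persist/update/merge
        }
    }
}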
"And keep removing from the other end of the collection in batches of 100 or 1000 and persist into database."
This is reasonable so long as multiple threads poll from the collection.
"Below is the code I'm trying to implement with an ArrayList"
An ArrayList is a bad choice here: it is not thread-safe, and removing the element at index 0 requires shifting every element to its right (an O(n) operation).
The collection that you're looking for is called a Deque, otherwise known as a double-ended queue. However, because you need the collection to be thread-safe, I recommend using a ConcurrentLinkedDeque.
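A minimal sketch of batch removal from a ConcurrentLinkedDeque might look like this. BatchDrainer and pollBatch are hypothetical names; producers are assumed to append with offerLast() while one or more persister threads call pollBatch().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

final class BatchDrainer {
    // Removes up to maxBatch elements from the head of the deque, in FIFO order.
    // Returns fewer elements (possibly none) if the deque runs empty first.
    static <E> List<E> pollBatch(ConcurrentLinkedDeque<E> deque, int maxBatch) {
        List<E> batch = new ArrayList<>(maxBatch);
        E next;
        while (batch.size() < maxBatch && (next = deque.pollFirst()) != null) {
            batch.add(next);
        }
        return batch;
    }
}
```

Because each pollFirst() is atomic, several persister threads can drain the same deque without handing out an element twice and without a ConcurrentModificationException. The deque is unbounded, though, so it will keep growing if persistence stays slower than consumption.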
I think that you will want to use the LMAX Disruptor framework here. I envision two RingBuffers. You would use the first to accept incoming messages. Your worker(s) would read from the RingBuffer. You would set the size of the RingBuffer to equal your persistence chunk size (eg 100 or 1000). After a worker takes an event from the RingBuffer and processes it, it places a reference to the persisted object into a Queue Collection. Each time the first RingBuffer has been circled once, you allocate a new Queue and place the old Queue into the second RingBuffer. The worker(s) for the second RingBuffer take a Queue object from the RingBuffer, persist all the objects in the Queue, and then move to the next queue. You can tune the size of the second RingBuffer and the worker threads to accommodate the speed at which the database can persist your chunks.
You risk losing messages with that approach: if you have 100 messages received but not yet saved and your application dies, can you afford to lose those messages?
The kind of topic/queue is important here. Topics have the advantage of managing this backpressure for you, whereas queues are usually used because ordered processing is required.
If your queue/topic is Kafka and you pull messages, Kafka can deliver them in batches; you can probably save those batches to the database as well, and only acknowledge the messages to Kafka once they are saved.
If your processing needs to be ordered, you can probably use some kind of reactive approach and tune the database. A queue system can usually control the flow.

ConcurrentHashMap and batch processing

First, some abstraction. My problem can be modeled as follows:
I have a room with N doors, and anyone can drop a package in the room. Once there is a given number of packages in the room, I want them to be shipped away while keeping the doors open.
Using Java 1.8, I'm working on a multi-threaded application where any thread can add items to my ConcurrentHashMap object.
I want to regularly dump my ConcurrentHashMap when it reaches a certain size, without blocking the threads adding items to the map. Dumping includes several costly operations.
I thought of the following solutions:
Check the size of the map each time I add something; if the map has reached the maximum size, copy it to another map, reset it and continue. I am not sure this is thread-safe.
Create a synchronized wrapper function for the put() method of ConcurrentHashMap. I believe I'd lose any advantage of using a ConcurrentHashMap.
Use an ArrayBlockingQueue with my batch size as its capacity. It will block when it is full, but I'll need something to process it later.
Something else I didn't think of.
I am basically self-taught regarding Java threads, and I'm looking for suggestions and ways to tackle my problem.
I still don't really understand the doors analogy. It sounds to me like you simply need a BlockingQueue:
"Somebody dropping a package in the room" is just a call to queue.offer(obj)
"Shipping the items away" is just a consumer thread taking S items from the queue, and then doing something with those objects:
while (true) {
    Object[] objs = new Object[S];
    for (int i = 0; i < S; ++i) {
        objs[i] = queue.take(); // Perhaps with a timeout?
    }
    doSomethingWithObjects(objs);
}
In this way, you can keep on offering items to the queue ("keeping the doors open") while the consumer thread is processing them (provided you create the queue with sufficient capacity).
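The "perhaps with a timeout" variant could be sketched like this, so that a partially filled batch is still shipped when producers go quiet. TimedBatcher and nextBatch are hypothetical names, and the overall wait budget per batch is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class TimedBatcher {
    // Collects up to batchSize items, waiting at most maxWaitMillis in total.
    // Returns a partial (possibly empty) batch if the deadline passes first.
    static <E> List<E> nextBatch(BlockingQueue<E> queue, int batchSize, long maxWaitMillis) {
        List<E> batch = new ArrayList<>(batchSize);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        try {
            while (batch.size() < batchSize) {
                long remainingNanos = deadline - System.nanoTime();
                // With a non-positive timeout, poll() still returns an already queued item
                E item = queue.poll(remainingNanos, TimeUnit.NANOSECONDS);
                if (item == null) {
                    break; // timed out with nothing available
                }
                batch.add(item);
            }
        } catch (InterruptedException e) {
            // Keep the interrupt flag set and hand back what was collected so far
            Thread.currentThread().interrupt();
        }
        return batch;
    }
}
```

The consumer loop then becomes a repeated call to nextBatch(queue, S, someTimeout), skipping doSomethingWithObjects when the returned list is empty.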

Lucene: Multithread document duplication

I have multiple threads that search a Lucene index. Before each search, there is a check whether the content is already indexed; if not, it is added to the index. If two parallel searches on unindexed content occur at the same time, there will be duplicate documents, and I guess the search results will be messed up.
I have found the following method: IndexWriter.updateDocument
but I think this does not solve the multithread problem I am facing.
Any suggestions how to resolve this are appreciated.
First, make sure there is only one call to IndexWriter#updateDocument() at a time. You can achieve this with a lock object shared by your threads, like this:
class Search implements Runnable {
    // Shared by every thread that runs this same Search instance
    private final Object lock = new Object();
    private volatile boolean found = false;

    public void run() {
        // business
        if (<<found something!>> && !found) {
            synchronized (lock) {
                // Re-check under the lock so that only one thread performs the update
                if (!found) {
                    /* call the related method */
                    found = true;
                }
            }
        }
        // business
    }
}
Second, you need to track every key found during the search to avoid duplication, for example by checking the key or by using a simple boolean flag.
Also, avoid wasted work by signalling the other threads to abort their search if you only need the first key found; whether that applies depends on your business logic.
If you're not able to modify the source of your updates/additions to be smarter about avoiding duplicates, then you'll have to create a choke point somewhere. The goal is simply to do it with the least amount of contention possible.
One way to do it would be to have a request queue, a work queue and a ConcurrentHashMap for lookups. All new requests are added to the request queue which is processed by a single "gatekeeper" thread. The gatekeeper can take one request at a time or drain the queue and process all pending requests in a loop to reduce contention on that end.
In order to process a request, the gatekeeper does putIfAbsent on the ConcurrentHashMap. If the return value is null, the update/insert request can be added to the actual work queue. If the value was already in the map, then.... see #2 below. Realistically you could use more than 1 gatekeeper since putIfAbsent is atomic, but it'd just increase contention on the HashMap. The gatekeeper's actual processing time is so low that you don't really gain anything by throwing more of them at the request queue.
The work queue threads will be able to process multiple updates/insertions concurrently as long as they don't modify the same record. When the work queue threads finish processing a request, they remove the value from the ConcurrentHashMap so that the gatekeeper knows it's safe to modify that record again.
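The claim/release step around the ConcurrentHashMap could look roughly like this. InFlightRegistry and its method names are made up for illustration, and the choice of a String document key is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry of the document keys that are currently being indexed.
final class InFlightRegistry {
    private final ConcurrentMap<String, Boolean> inFlight = new ConcurrentHashMap<>();

    // Gatekeeper side: returns true if the caller claimed the key and may enqueue the work.
    // putIfAbsent is atomic, so only one of several concurrent callers gets true.
    boolean tryClaim(String documentKey) {
        return inFlight.putIfAbsent(documentKey, Boolean.TRUE) == null;
    }

    // Worker side: call once the update has finished, so the key can be modified again.
    void release(String documentKey) {
        inFlight.remove(documentKey);
    }
}
```

A worker would typically call release() in a finally block, so that a failed update does not leave the key claimed forever.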
--
Some things to think about:
1) How do you want to define what can be done simultaneously? It probably shouldn't be hashing the full request because you wouldn't want two different requests to modify the same document at the same time, would you?
2) What do you do with requests that cannot currently be processed because they have duplicates in the queue already (or requests that modify the same doc, as in point #1)? Throw them out? Put them in a secondary updating queue that tries again periodically? How do you respond to the original requester if its request is in an indefinite holding pattern?
3) Does the order in which requests are processed matter?

Using a LinkedBlockingQueue and flush to mysql

Would a LinkedBlockingQueue be suitable for the following?
1. Insert strings (maximum 1024 bytes) into the queue at a very high rate.
2. Every x inserts, or on a timed interval, flush the items into MySQL.
For the flush, I was looking at the API: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/LinkedBlockingQueue.html
I was wondering if drainTo would be a good choice, since I have to aggregate before flushing.
So I would drainTo the items in the queue, then iterate and aggregate, and then write to MySQL.
Will this be suitable for up to 10K writes per second?
Do I need to consider any locking/synchronization issues, or is that taken care of already?
I will store this LinkedBlockingQueue as the value in a ConcurrentHashMap.
Items will never be removed from the map, only inserted if not present; if present, I will append to the queue.
It depends a bit on whether there is one inserter per queue or one for all queues. If I understand your spec, I think something like the following would work.
A writer adds an item to one of the LinkedBlockingQueue collections in your map. If the size of the queue is more than X (if you want it per queue), it signals the MySQL inserter thread. Something like this should work:
queue.add(newItem);
// race conditions here may cause multiple signals, but that's ok
if (queue.size() > 1000) {
    // this will work if there is 1 inserter per queue
    synchronized (queue) {
        queue.notify();
    }
}
...
The inserter then waits on the queue in something like the following loop:
List<String> insertList = new ArrayList<String>();
while (!done) {
    synchronized (queue) {
        // typically this would be a while loop, but we insert whether we were notified or timed out
        if (queue.size() < 1000) {
            queue.wait(MILLIS_TIME_INTERVAL);
        }
    }
    queue.drainTo(insertList);
    // insert them into the db
    insertList.clear();
}
It gets a bit more complicated if there is one thread doing the inserts across all queues. I guess the question is then why you have the ConcurrentHashMap at all. If you do have one inserter which, for example, is inserting into multiple tables, then you will need a mechanism to inform the inserter which queue(s) need to be drained. It could just run through all of the queues in the map, but that might be expensive. You would synchronize on some global lock object or maybe the map object instead of the queue.
Oh, and as @Peter Lawrey mentioned, you will quickly run out of memory if your database is slower than the writers, so make sure the queues have a proper capacity set so that they limit the writers and keep the working memory down.
Hope this helps.
For every queue you need a thread and a connection, so I wouldn't create too many queues. You can perform over 10K writes per second provided your MySQL server can handle it (you will only know when you test it). LinkedBlockingQueue is thread-safe, and provided all your queues are created before you start, you don't need any additional locking/synchronization.
If you are inserting long Strings of up to 1024 characters at 10K per second, you are likely to run out of memory pretty fast (up to 36 GB per hour). Instead, I would have the database insert only new strings.
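If queues do have to be created on the fly, the map-of-queues layout from the question can be sketched as follows, with bounded queues to keep memory in check. It assumes Java 8 or later for computeIfAbsent; KeyedBuffers, append and drain are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

final class KeyedBuffers {
    private final ConcurrentMap<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();
    private final int capacityPerQueue;

    KeyedBuffers(int capacityPerQueue) {
        this.capacityPerQueue = capacityPerQueue;
    }

    // Creates the queue atomically on first use, then appends without blocking.
    // Returns false when that key's queue is full, so the caller can decide what to do.
    boolean append(String key, String value) {
        return queues
                .computeIfAbsent(key, k -> new LinkedBlockingQueue<>(capacityPerQueue))
                .offer(value);
    }

    // Moves up to maxItems pending values for the key into a fresh list, ready to be
    // aggregated and written to the database.
    List<String> drain(String key, int maxItems) {
        List<String> drained = new ArrayList<>();
        BlockingQueue<String> queue = queues.get(key);
        if (queue != null) {
            queue.drainTo(drained, maxItems);
        }
        return drained;
    }
}
```

drainTo removes the drained elements from the queue as it copies them, so writers can keep appending while the flusher works on its own list.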

Queue implementation with blocked 'take()' but with eviction policy

Is there a queue implementation that blocks on take but is bounded by a maximum size? When the queue reaches the given maximum size, instead of blocking, put should remove the head element and insert the new one. So put() never blocks, but take() does.
One use case is a very slow consumer: the system will not crash (run out of memory); instead the oldest messages are dropped, and the producer is never blocked.
An example of this would be a stock trading system. When you get a spike in stock trade/quote data that you haven't consumed yet, you want to automatically throw away old trades/quotes.
There currently isn't a thread-safe queue in Java that does what you are looking for. However, there is BlockingDeque (a double-ended queue), around which you can write a wrapper that takes from the head and the tail as you see fit.
Like BlockingQueue, it is thread-safe.
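Such a wrapper might look roughly like the sketch below, built on LinkedBlockingDeque. EvictingBlockingQueue is a hypothetical name, and only the handful of methods needed for the use case are exposed.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical wrapper: put() evicts the oldest element instead of blocking, take() blocks.
final class EvictingBlockingQueue<E> {
    private final LinkedBlockingDeque<E> deque;

    EvictingBlockingQueue(int capacity) {
        this.deque = new LinkedBlockingDeque<>(capacity);
    }

    // Never waits for space: drops elements from the head until the new one fits.
    void put(E element) {
        while (!deque.offerLast(element)) {
            deque.pollFirst();
        }
    }

    // Blocks until an element is available.
    E take() throws InterruptedException {
        return deque.takeFirst();
    }

    // Non-blocking variant: returns null if the queue is empty.
    E poll() {
        return deque.pollFirst();
    }
}
```

The retry loop covers the case where another producer fills the freed slot first; consumers are unaffected and simply see the newest elements that survived.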
Several strategies are provided in ThreadPoolExecutor. Search for "AbortPolicy" in its Javadoc. You can also implement your own policy if you want. Perhaps DiscardPolicy or DiscardOldestPolicy is similar to what you want. Personally, I think CallerRunsPolicy is what you want in most cases.
I think using these is a better solution, but if you absolutely want to implement it at the queue level, I'd probably do it by composition. Perhaps use a LinkedList or something similar and wrap it with the synchronized keyword.
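As a rough illustration of the executor route, the sketch below uses ThreadPoolExecutor.DiscardOldestPolicy, which drops the oldest queued task when the bounded queue is full. DiscardOldestDemo is a hypothetical name; the single worker, the 2-slot queue and the latch that simulates a slow consumer are all assumptions of this example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class DiscardOldestDemo {
    // Submits tasks 1..5 to one worker with a 2-slot queue and returns the ids that actually ran.
    static List<Integer> run() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.DiscardOldestPolicy());
        List<Integer> executed = new CopyOnWriteArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int id = 1; id <= 5; id++) {
                final int taskId = id;
                // execute() never blocks the producer; overflow evicts the oldest queued task
                executor.execute(() -> {
                    try {
                        release.await(); // simulate a slow consumer
                        executed.add(taskId);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            release.countDown();
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executed;
    }
}
```

Task 1 occupies the worker, tasks 2 and 3 are queued and then evicted by tasks 4 and 5, so run() returns [1, 4, 5].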
EDIT (some clarifications):
An "Executor" is basically a thread pool combined with a blocking queue. It is the recommended way to implement a producer/consumer pattern in Java. The authors of these libraries provide several strategies to cope with issues like the one you mentioned. If you are interested, here is another approach that specifically addresses the OOME issue (the source is framework-specific and can't be used as is).
