Would a LinkedBlockingQueue be suitable for the following:
1. insert strings (maximum 1024 bytes) into the queue at a very high rate
2. every x inserts, or on a timed interval, flush the items into MySQL
During the flush, I was looking at the API: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/LinkedBlockingQueue.html
I was wondering if drainTo would be a good choice, since I have to aggregate before flushing.
So I would drainTo the items in the queue, then iterate and aggregate, and then write to MySQL.
Will this be suitable for up to 10K writes per second?
Do I need to consider any locking/synchronization issues or is that taken care of already?
I will store this LinkedBlockingQueue as the value in a ConcurrentHashMap.
Entries will never be removed from the map. A queue is inserted only if the key is not present; if it is present, I will append to the existing queue.
It depends a bit if the inserter is per queue or for all queues. If I am understanding your spec, I would think something like the following would work.
The writer adds an item to one of the LinkedBlockingQueue collections in your map. If the size of the queue is more than X (if you want it per queue), it signals the MySQL inserter thread. Something like this should work:
queue.add(newItem);
// race conditions here that may cause multiple signals but that's ok
if (queue.size() > 1000) {
    // this will work if there is 1 inserter per queue
    synchronized (queue) {
        queue.notify();
    }
}
...
Then the inserter waits on the queue in something like the following loop:
List<String> insertList = new ArrayList<>();
while (!done) {
    synchronized (queue) {
        // typically this would be a while loop, but we insert whether we are notified or time out
        if (queue.size() < 1000) {
            queue.wait(MILLIS_TIME_INTERVAL);
        }
    }
    queue.drainTo(insertList);
    // insert them into the db
    insertList.clear();
}
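As an alternative to the wait/notify handshake above, the inserter can lean on the queue's own blocking methods. This is only a sketch: `BatchDrainer` and `nextBatch` are names I made up, and the capacity, batch size and timeout are assumed values. Note that it flushes whatever is available once the first item arrives (up to `maxBatch`), rather than waiting for exactly X items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchDrainer {
    // Waits up to timeoutMillis for a first item, then drains whatever else is
    // already queued, up to maxBatch items in total. Returns an empty list on timeout.
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int maxBatch, long timeoutMillis)
            throws InterruptedException {
        List<T> batch = new ArrayList<>(maxBatch);
        T first = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            batch.add(first);
            queue.drainTo(batch, maxBatch - 1);
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded queue (assumed capacity) so writers cannot outrun memory.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);
        for (int i = 0; i < 2500; i++) {
            queue.put("row-" + i);
        }
        List<String> batch;
        while (!(batch = nextBatch(queue, 1000, 100)).isEmpty()) {
            // aggregate and write to the database here
            System.out.println("would insert " + batch.size() + " rows");
        }
    }
}
```

Because both `poll` and `drainTo` are thread-safe on a LinkedBlockingQueue, no extra `synchronized` block is needed on the inserter side in this variant.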
It gets a bit more complicated if there is one thread doing the inserts across all queues. I guess the question then is why you have the ConcurrentHashMap at all. If you do have one inserter which, for example, is inserting into multiple tables, then you will need a mechanism to tell the inserter which queue(s) need to be drained. It could just run through all of the queues in the map, but that might be expensive. You would synchronize on some global lock object, or maybe the map object, instead of the queue.
Oh, and as @Peter Lawrey mentioned, you will quickly run out of memory if your database is slower than the writers, so make sure the queues have a proper capacity set so that they limit the writers and keep the working memory down.
Hope this helps.
For every queue you need a thread and a connection, so I wouldn't create too many queues. You can perform over 10K writes per second provided your MySQL server can handle this (you will only know when you test it). LinkedBlockingQueue is thread-safe, and provided you have all your queues created before you start, you don't need any locking/synchronization.
If you are inserting long Strings of up to 1024 characters at 10K per second, you are likely to run out of memory pretty fast (up to 36 GB per hour). Instead, I would have the database only insert new strings.
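If the queues cannot all be created before you start, one way to add them lazily without extra locking is `ConcurrentHashMap.computeIfAbsent` (Java 8+), which creates the queue at most once per key. A minimal sketch; `QueueRegistry`, `append` and the capacity of 10,000 are my own assumptions, not something from the question:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

class QueueRegistry {
    private static final int CAPACITY = 10_000; // assumed bound per queue

    private final ConcurrentMap<String, BlockingQueue<String>> queues =
            new ConcurrentHashMap<>();

    // Creates the queue for this key on first use, then appends to it.
    // Returns false when that queue is full, so the caller can drop, retry or block.
    boolean append(String key, String value) {
        return queues
                .computeIfAbsent(key, k -> new LinkedBlockingQueue<>(CAPACITY))
                .offer(value);
    }

    // Returns the queue for the key, or null if nothing was appended for it yet.
    BlockingQueue<String> queueFor(String key) {
        return queues.get(key);
    }
}
```

On Java 5 to 7 the same idea works with `putIfAbsent`, at the cost of occasionally constructing a queue that is thrown away.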
Related
There's a BlockingQueue (an ArrayBlockingQueue in particular) that is accessed by equal numbers of offering and polling threads (5 threads offer to it and 5 other threads poll from it).
BlockingQueue<Object> myQueue = new ArrayBlockingQueue<>(1000);
The problem is that if, say, an offer happens every 1 ms, a poll happens only about every 1500 ms (because of the processing time consumed by each polled object in each loop iteration). In this situation the queue fills up, new offer calls fail, and objects get dropped.
while (true) {
    Object obj = myQueue.poll();
    if (obj != null) {
        // Process the object, takes ~ 1500 ms
    }
}
What's the best approach (performance-oriented) to get this to work?
Should I increase the number of polling threads by a factor of 1500?
Would increasing the size of the ArrayBlockingQueue, or even replacing it with a ConcurrentLinkedQueue, be the solution?
Or would just increasing the timeout in the offer(TimeOut, TimeUnit) method in the offering threads relieve the issue?
Or is there a better way?
It's important to note that the incoming offered objects are unstoppable, so objects shouldn't be allowed to accumulate in the queue. In conclusion, speed and performance are vital to prevent this data stream from collapsing.
Thanks
I need to create a list to do the following operation:
I receive an object from an external queue/topic every microsecond.
After doing some operations on the object, I need to persist these objects into database.
I am doing the persist in batches of 100 or 1000. The only problem is, persist rate is lower than the incoming message rate. Now I don't want to keep this in a single thread since the persist will slow down the message consumption.
My idea is to keep accepting the message objects and adding them to a collection (like a linked list)
And keep removing from the other end of the collection in batches of 100 or 1000 and persist into database.
What is the right collection to use? How to synchronize this and avoid concurrent modification exceptions?
Below is the code I'm trying to implement with an ArrayList that clears out the list every few seconds while persisting.
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
class myclass {
    List<Object> persistList;
    ScheduledExecutorService persistExecutor;
    ScheduledFuture<?> scheduledFuture;
    PersistOperation persistOperation;
    long delay, interval; // initialize delay and interval
    void init() {
        scheduledFuture = persistExecutor.scheduleAtFixedRate(
                new PersistOperation(persistList), delay, interval, TimeUnit.SECONDS);
    }
    void execute(Object msg) {
        // process the message and add it to the persist list
    }
    class PersistOperation implements Runnable {
        List<Object> persistList;
        PersistOperation(List<Object> persistList) {
            this.persistList = persistList;
        }
        public void run() {
            // copy persistList to a new ArrayList and clear persistList
            // entity manager persist/update/merge
        }
    }
}
And keep removing from the other end of the collection in batches of 100 or 1000 and persist into database.
This is reasonable so long as multiple threads poll from the collection.
Below is the code I'm trying to implement with an ArrayList
An ArrayList is a bad choice here, as it is not thread-safe and, when removing an element at index 0, every element to the right of it must be shifted over (an O(n) operation).
The collection that you're looking for is called a Deque, otherwise known as a double-ended queue. However, because you need the collection to be thread-safe, I recommend using a ConcurrentLinkedDeque.
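To make the ConcurrentLinkedDeque suggestion concrete, here is a rough sketch of pulling a batch off the head while producers keep appending to the tail with `offerLast`. `DequeBatcher` and `pollBatch` are hypothetical names, and the batch size is whatever your persist step expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

class DequeBatcher {
    // Removes up to batchSize elements from the head of the deque.
    // Stops early if the deque runs empty, so the batch may be smaller.
    static <T> List<T> pollBatch(ConcurrentLinkedDeque<T> deque, int batchSize) {
        List<T> batch = new ArrayList<>(batchSize);
        T item;
        while (batch.size() < batchSize && (item = deque.pollFirst()) != null) {
            batch.add(item);
        }
        return batch;
    }

    public static void main(String[] args) {
        ConcurrentLinkedDeque<Integer> deque = new ConcurrentLinkedDeque<>();
        for (int i = 0; i < 250; i++) {
            deque.offerLast(i); // what the message-receiving thread would do
        }
        List<Integer> batch = pollBatch(deque, 100);
        System.out.println("persisting " + batch.size() + " items");
    }
}
```

Keep in mind that ConcurrentLinkedDeque is unbounded and never blocks, so if persistence stays slower than the incoming rate the deque will simply keep growing.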
I think that you will want to use the LMAX Disruptor framework here. I envision two RingBuffers. You would use the first to accept incoming messages. Your worker(s) would read from the RingBuffer. You would set the size of the RingBuffer to equal your persistence chunk size (eg 100 or 1000). After a worker takes an event from the RingBuffer and processes it, it places a reference to the persisted object into a Queue Collection. Each time the first RingBuffer has been circled once, you allocate a new Queue and place the old Queue into the second RingBuffer. The worker(s) for the second RingBuffer take a Queue object from the RingBuffer, persist all the objects in the Queue, and then move to the next queue. You can tune the size of the second RingBuffer and the worker threads to accommodate the speed at which the database can persist your chunks.
You risk losing messages with that approach. If you have 100 messages received but not saved and your application dies, can you afford to lose those messages?
The kind of topic/queue is important here. Topics have the advantage of managing this backpressure control, while queues are usually there because ordered processing is required.
If your queue/topic is Kafka and you pull messages, Kafka can pull batches, and you can probably save batches to the database as well, and only ack the messages to Kafka once they are saved.
If your processing needs to be ordered, you can probably handle it with some kind of reactive approach and tune the DB. A queue system can usually control the flow.
First, some abstraction. My problem can be modeled as follows:
I have a room with N doors, and anyone can drop a package in the room. Once there is a given number of packages in the room, I want them to be shipped away while keeping the doors open.
Using Java 1.8, I'm working on a multi-threaded application where any thread can add items to my ConcurrentHashMap object.
I want to regularly dump my ConcurrentHashMap when it reaches a certain size, without blocking the threads adding items to the map. Dumping includes several costly operations.
I thought of the following solutions:
Check the size of the map each time I add something, and if the map has reached the max size, copy it to another map, reset it and continue. I am not sure this would be thread-safe.
Create a synchronized wrapper function for the put() method of ConcurrentHashMap. I believe I'd lose any advantage of using a ConcurrentHashMap.
Use an ArrayBlockingQueue with my batch size as its capacity. It will block when it is full, but I'll need something to process it later.
Something else I didn't think of.
I am basically self taught regarding Java Threads and I'm looking for suggestions and ways to tackle my problem.
I still don't really understand the doors analogy. It sounds to me like you simply need a BlockingQueue:
"Somebody dropping a package in the room" is just a call to queue.offer(obj)
"Shipping the items away" is just a consumer thread taking S items from the queue, and then doing something with those objects:
while (true) {
    Object[] objs = new Object[S];
    for (int i = 0; i < S; ++i) {
        objs[i] = queue.take(); // Perhaps with a timeout?
    }
    doSomethingWithObjects(objs);
}
In this way, you can keep on offering items to the queue ("keeping the doors open") while the consumer thread is processing them (provided you create the queue with sufficient capacity).
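The "perhaps with a timeout?" variant of the loop above could look roughly like this, so that a partially filled batch still gets shipped when producers go quiet. `TimedBatch` and `collect` are made-up names, and the wait limit is an assumption you would tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class TimedBatch {
    // Collects up to size items. Waits for missing items only until
    // maxWaitMillis has elapsed in total; after that it takes what is
    // immediately available and returns a possibly smaller batch.
    static <T> List<T> collect(BlockingQueue<T> queue, int size, long maxWaitMillis)
            throws InterruptedException {
        List<T> batch = new ArrayList<>(size);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (batch.size() < size) {
            long remaining = Math.max(deadline - System.nanoTime(), 0);
            T item = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (item == null) {
                break; // nothing arrived before the deadline
            }
            batch.add(item);
        }
        return batch;
    }
}
```

The consumer thread would then call something like `collect(queue, S, 1000)` in its loop and skip the shipping step when the returned list is empty.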
Basically I need a data structure to store the temporary chatting messages on the server side. It should be:
bounded: because I don't need to store too many messages; the client will send a request to get the new messages every second. I think the bound should be the max. amount of concurrent requests in one second. When the buffer is full, the old messages will be removed.
suitable for highly concurrent access: I don't want to use a data structure like Collections.synchronizedXXXX, because if another thread changes it during iteration (e.g. adds a message), it will throw an exception, so I would have to lock the whole data structure. I don't really care whether a client request gets the most recently inserted message, because the client will send a new request after one second; on the other hand, the write operation should never be delayed. The classes in the java.util.concurrent package seem to be the solution, but...
non-blocking: LinkedBlockingQueue and ArrayBlockingQueue can be bounded and won't throw an exception during iteration, but they are both blocking queues. When the queue is full, I want to add the new element to the tail and remove the old element from the head, instead of blocking there and waiting for someone to remove the head.
So my question is: is there a good implementation in a third-party library, for example Google Guava?
Or maybe you have a better idea for storing the temporary chat messages on the server?
Thank you very much!
You can use LinkedBlockingQueue with the non-blocking methods offer (or add) and poll to access it.
You can create it with a fixed capacity to make it bounded.
LinkedBlockingQueue<String> myStrings = new LinkedBlockingQueue<String>(100);
myStrings.offer("Hi!"); // returns false if limit is reached
myStrings.add("Hi, again!"); // throws exception if limit is reached
String s = myStrings.poll(); // returns null if queue is empty
You could use the Apache Commons CircularFifoBuffer. It meets your first and last criteria. To support concurrency, you can wrap the base buffer in its synchronized version like so:
Buffer fifo = BufferUtils.synchronizedBuffer(new CircularFifoBuffer());
Good luck on the project.
Did you take a look at ConcurrentLinkedQueue?
The page says
This implementation employs an efficient "wait-free" algorithm...
Wait-freedom is one of the strongest guarantees you can obtain....
You can add non-blocking behaviour to an ArrayBlockingQueue by surrounding it with a conditional offer() statement, where failure of the queue to accept the offer results in the head being dropped and the offer being re-made:
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
public class LearnToQueue {
    public static void main(String[] args) {
        Queue<Integer> FIFO = new ArrayBlockingQueue<Integer>(4);
        int i = 0;
        while (i < 10) {
            if (!FIFO.offer(i)) {
                // You can pipe the head of the queue anywhere you want to
                FIFO.remove();
                FIFO.offer(i);
            }
            System.out.println(FIFO.toString());
            i++;
        }
    }
}
LinkedTransferQueue is a blocking, unbounded queue that doesn't enforce strict FIFO ordering. It will only block when taking from an empty queue, but never on adding to one. You could add a soft cap to evict elements by adding either a size or read & write counters.
Depending on your requirements, you may be able to write a custom lock-free ring buffer.
I have the following situation:
Read data from database
do work "calculation"
write result to database
I have a thread that reads from the database and puts the generated objects into a BlockingQueue. These objects are extremely heavy weight hence the queue to limit amount of objects in memory.
Multiple threads take objects from the queue, perform work and put the results in a second queue.
The final thread takes results from the second queue and saves them to the database.
The problem is how to prevent deadlocks, e.g. the "calculation threads" need to know when no more objects will be put into the queue.
Currently I achieve this by passing references to the threads (Callables) to each other and checking thread.isDone() before a poll or offer, and then checking whether the element is null. I also check the size of the queue: as long as there are elements in it, they must be consumed. Using take or put leads to deadlocks.
Is there a simpler way to achieve this?
One way to accomplish this would be to put a "dummy" or "poison" message on the queue as the last message, once you are sure that no more tasks are going to arrive, for example after putting the message related to the last row of the DB query. The producer puts a dummy message on the queue, and the consumer, on receiving this dummy message, knows that no more meaningful work is expected in this batch.
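A minimal sketch of the poison-message idea, assuming String payloads; `PoisonPillDemo`, `POISON` and `consume` are illustrative names only. The pill is compared by identity, and each consumer puts it back so that sibling consumers on the same queue also stop.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PoisonPillDemo {
    // Distinct sentinel object; compared with ==, so it cannot collide with real data.
    static final String POISON = new String("POISON");

    // Consumes until the pill arrives; returns the number of real items processed.
    static int consume(BlockingQueue<String> queue) throws InterruptedException {
        int processed = 0;
        while (true) {
            String item = queue.take(); // blocking take is safe: the pill always ends the wait
            if (item == POISON) {
                queue.put(POISON); // re-queue so other consumers of this queue stop too
                return processed;
            }
            processed++; // the real calculation would go here
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        queue.put("row-1");
        queue.put("row-2");
        queue.put(POISON); // the producer adds this after the last database row
        System.out.println("processed " + consume(queue) + " items");
    }
}
```

In the three-stage setup from the question, the second queue needs its own pill: it should be added only after all calculation threads have finished, for example by whichever code joins them.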
Maybe you should take a look at CompletionService
It is designed to combine executor and a queue functionality in one.
Tasks which completed execution will be available from the completions service via
completionServiceInstance.take()
You can then use another executor for step 3, i.e. filling the DB with the results, which you feed with the results taken from the completionServiceInstance.
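Roughly, the CompletionService approach could look like the sketch below. `CompletionDemo` and `squareAll` are invented for illustration, and squaring an integer stands in for the heavy calculation. Since you know how many tasks were submitted, you call `take()` exactly that many times and never need to check isDone().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionDemo {
    // Submits one task per input and collects the results in completion order.
    static List<Integer> squareAll(List<Integer> inputs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
            for (Integer n : inputs) {
                cs.submit(() -> n * n); // stands in for the heavy calculation
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < inputs.size(); i++) {
                results.add(cs.take().get()); // blocks until the next task finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

One caveat for the heavyweight objects in the question: submitting everything up front keeps all inputs in memory at once, so you would probably still want to throttle how many tasks are in flight.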