ConcurrentHashMap and Batch Processing - java

First some abstraction. My problem can be modeled as follows:
I have a room with N doors, and anyone can drop a package in the room. Once there is a given number of packages in the room, I want them to be shipped away while keeping the doors open.
Using Java 1.8, I'm working on a multi-threaded application where any thread can add items to my ConcurrentHashMap object.
I want to regularly dump my ConcurrentHashMap when it reaches a certain size without blocking the threads adding items to the Map. Dumping involves several costly operations.
I thought of the following solutions:
Check the size of the HashMap each time I add something; if the map has reached the max size, copy it to another map, reset it and continue. I am not sure this would be thread safe (see the sketch just after this question).
Create a wrapper function for the put() method of ConcurrentHashMap which is synchronized. I believe I'd lose any advantage of using a ConcurrentHashMap.
Use an ArrayBlockingQueue with my batch size as its capacity. It will block when full, but I'll need something to process it later.
Something else I didn't think of.
I am basically self-taught regarding Java threads and I'm looking for suggestions and ways to tackle my problem.
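For reference, a rough sketch of the first option: keep the map behind an AtomicReference and swap in a fresh one when the threshold is hit, so only the thread that wins the swap pays for the dump (the threshold, key/value types and dump() method are illustrative):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

class SwappingStore {
    private static final int MAX_SIZE = 10_000; // illustrative threshold
    private final AtomicReference<ConcurrentHashMap<String, String>> mapRef =
            new AtomicReference<>(new ConcurrentHashMap<>());

    void put(String key, String value) {
        ConcurrentHashMap<String, String> current = mapRef.get();
        current.put(key, value);
        if (current.size() >= MAX_SIZE
                && mapRef.compareAndSet(current, new ConcurrentHashMap<>())) {
            dump(current); // only the thread that won the CAS dumps the detached map
        }
    }

    private void dump(Map<String, String> snapshot) {
        // costly processing goes here; note that a put which read the old reference
        // just before the swap can still land in the detached map, so the dump
        // should tolerate a few late arrivals
    }
}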

I still don't really understand the doors analogy. It sounds to me like you simply need a BlockingQueue:
"Somebody dropping a package in the room" is just a call to queue.offer(obj)
"Shipping the items away" is just a consumer thread taking S items from the queue, and then doing something with those objects:
while (true) {
    Object[] objs = new Object[S];
    for (int i = 0; i < S; ++i) {
        objs[i] = queue.take(); // Perhaps with a timeout?
    }
    doSomethingWithObjects(objs);
}
In this way, you can keep on offering items to the queue ("keeping the doors open") while the consumer thread is processing them (provided you create the queue with sufficient capacity).
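The producer side ("keeping the doors open") is then just creating the queue with enough capacity and offering to it; a minimal sketch, where the capacity and the blocking fallback are assumptions:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class Producers {
    // capacity should comfortably exceed the batch size S
    static final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(10_000);

    // called by any producer thread ("dropping a package in the room")
    static void drop(Object obj) throws InterruptedException {
        if (!queue.offer(obj)) { // offer() returns false when the queue is full
            queue.put(obj);      // fall back to blocking until space frees up
        }
    }
}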

Java Splitting Up Work Between Frames

First I'll explain what I want to do and afterwards I'll provide a proposed solution.
Problem
I'm running a game where I want to do a certain amount of work every frame. For example, I have N objects that are in a queue waiting to be initialized (imagine initialization is a fairly expensive operation and N is large) and after adding them all, I want to create their collision boxes, and after that, I want to merge them together to limit render calls. I can't do these operations on a different thread because all this stuff is heavily coupled with the game world. But I want to split up all these operations into bite-size chunks to run each frame so that there is minimal lag (framerate dips). How would I go about doing this?
Proposed Solution
It would be nice to have a function that can stop after one call and continue where it left off after calling it again:
For example,
// Pseudocode: stop() is imagined to mean "pause here and resume from this point on the next call".
boolean loadEverything() {
    for (int i = 0; i < objectsToAdd.length; i++) {
        world.add(objectsToAdd[i]);
        if (i % 10 == 0) {
            return stop(); // yield roughly every 10 objects
        }
    }
    makeCollision();
    return stop();     // yield again after building collision boxes
    mergeObjects();    // unreachable in real Java; this is the desired resume point
    return true;
}
Calling loadEverything() the first objectsToAdd.length / 10 times adds 10 objects to the game world at a time. Calling it after that should run makeCollision() and then stop. Calling it again runs mergeObjects() and then the function returns true. In the caller function I would run loadEverything() until it returns true.
I'm aware that yield-return / yield-break implementations like those described here exist, but I'm wondering if there's a more general implementation of them, or whether a better solution exists that doesn't require any extra dependencies.
Have you looked at coroutines yet? There's a native implementation in Kotlin, but in Java there are options here and here.
In any case, we need to make sure that OpenGL or Box2D operations which are required to run on the main thread stay on the main thread, as I believe a coroutine will be created on a new thread. So there might not be any gain in splitting up work for those kinds of operations.
Another option
You say you need to split up the work of creating objects at run time. Can you predict or estimate the number of objects you will want beforehand? If you don't really need to create objects dynamically like that, I suggest looking at the Object Pool in libgdx (see more here). That link has a working example of using Pool in your game.
Such a Pool already has initialized objects ready to be grabbed and used on demand, and it can also grow at run time if needed, so if you can provide a good initial estimate of the number of objects you intend to use, it's all good.
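For illustration, a minimal sketch of libgdx Pool usage, where Bullet is a made-up pooled type:
import com.badlogic.gdx.utils.Pool;

// Bullet is a made-up pooled type for illustration
class Bullet implements Pool.Poolable {
    float x, y;
    @Override
    public void reset() { x = y = 0; } // the pool calls this when the object is freed
}

class Bullets {
    static final Pool<Bullet> bulletPool = new Pool<Bullet>() {
        @Override
        protected Bullet newObject() { // only called when the pool has no free instance
            return new Bullet();
        }
    };

    static Bullet spawn() {
        return bulletPool.obtain(); // reuses a freed instance when one is available
    }

    static void despawn(Bullet b) {
        bulletPool.free(b); // resets it and stores it for later reuse
    }
}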
Why don't you add one static variable which keeps its value between function calls? Then you can loop from the current value to the current value + 10, increase the current value (that static variable) by 10, and exit.
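A sketch of that idea, using an instance field instead of a static variable plus a phase counter so each call does one bite-size chunk; the poster's world, objectsToAdd, makeCollision() and mergeObjects() are assumed to exist:
class IncrementalLoader {
    private int nextIndex = 0; // persists between calls
    private int phase = 0;     // 0 = add objects, 1 = build collision, 2 = merge

    // returns true once all phases are finished; the caller invokes this once per frame
    boolean loadEverything() {
        switch (phase) {
            case 0:
                int end = Math.min(nextIndex + 10, objectsToAdd.length);
                for (int i = nextIndex; i < end; i++) {
                    world.add(objectsToAdd[i]); // 10 objects per frame
                }
                nextIndex = end;
                if (nextIndex == objectsToAdd.length) phase = 1;
                return false;
            case 1:
                makeCollision();
                phase = 2;
                return false;
            default:
                mergeObjects();
                return true;
        }
    }
}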

How to implement atomic request counter

I'm confronted with the following problem:
I've implemented a crawler, and I would like to know how many requests have been done during the last second, and what amount of data has been downloaded during the last second.
Currently, I've implemented it using locks. My version uses a queue, and two counters (count and sum).
When a task is done, I just increase my counters, and I add an event (with the current date) to the queue
When wanting to get the value of my counters, I check whether some entries in the queue are more than 1 second old. If so, I dequeue them and decrease my counters accordingly. Then I return the wanted result.
This version works well but I would like, for a training purpose, to reimplement it using atomic operations instead of locks. Nevertheless, I have to admit that I'm stuck on the "cleaning" operation (dequeuing of old values).
So, is this a good approach to implement this?
Which other approach could I use?
Thanks!
This version works well but I would like, for a training purpose, to reimplement it using atomic operations instead of locks.
If you need to make multiple changes to the data when the roll period happens, you will need to lock otherwise you will have problems. Any time you have multiple "atomic operations" you need to have a lock to protect against race conditions. For example, in your case, what if something else was added to the queue while you were doing your roll?
Which other approach could I use?
I'm not 100% sure why you need to queue up the information. If you are only counting the number of requests and the total size of the data downloaded, then you should be able to use a single AtomicReference<CountSum>. The CountSum class would store your two values. Then when someone needs to increment it they would do something like:
CountSum old;
CountSum newVal = new CountSum();
do {
    old = countSumRef.get();
    newVal.setCount(old.getCount() + 1);
    newVal.setSum(old.getSum() + requestDataSize);
    // we need to loop here if someone changed the value behind our back
} while (!countSumRef.compareAndSet(old, newVal));
This ensures that your count and your sum are always in sync. If you used two AtomicLong variables, you'd have to make two atomic requests and would need the lock again.
When you want to reset the values, you'd do the same thing.
CountSum newVal = new CountSum(0, 0);
CountSum old;
do {
    old = countSumRef.get();
    // we need to loop here if someone changed the value behind our back
} while (!countSumRef.compareAndSet(old, newVal));
// now you can display the old value and be sure you got everything
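For completeness, the CountSum holder assumed above could be as simple as this sketch (long fields chosen arbitrarily):
class CountSum {
    private long count;
    private long sum;

    CountSum() { this(0, 0); }
    CountSum(long count, long sum) { this.count = count; this.sum = sum; }

    long getCount() { return count; }
    long getSum() { return sum; }
    void setCount(long count) { this.count = count; }
    void setSum(long sum) { this.sum = sum; }
}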

Determining queue size

I have to implement a queue to which objects will be added and removed by two different threads at different times, based on some factor. My problem is that the requirement says the queue (the whole queue and the data it holds) should not take more than 200 KB of data. If the size reaches 200 KB, the thread should wait for space to be available before pushing more data. Pushed objects may vary in size. I can create a Java queue, but the size of the queue will return the total number of objects pushed instead of the total memory used. How do I determine the total size of the data my queue is referring to?
Consider the object pushed as
class A {
    int x;
    byte[] buf; // array size varies per object
}
There is no out of the box functionality for this in Java. (In part, because there is no easy way to know if the objects added to the collection are referenced elsewhere and therefore if adding them takes up additional memory.)
For your use case, you would probably be best off just subclassing a queue. Override the add method to add the size of the object to a counter (obviously you will have to make this calculation thread safe) and to throw an IllegalStateException if there isn't room. Similarly, decrement your counter in an overridden remove method.
The method of determining how much space to add to the counter could vary. Farlan suggested using this, and that looks like it would work. But since you say you are dealing with a byte array, the size of the data you are adding might already be known to you. You will also have to consider whether you want to account for any of the overhead: the object itself takes some space, as does the reference inside the queue, plus the queue object. You could figure out exact values for that, but since it seems like your requirement is just to prevent an OutOfMemoryError, you could probably just use rough estimates as long as you are consistent.
The details of what queue class you want to subclass may depend on how much contention you think there will be between the threads. But it sounds like you have a handle on the sync issues.
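As an alternative to subclassing, here is a sketch of the same counting idea using a Semaphore whose permits are bytes, so producers block while the 200 KB budget is used up; the Sized interface and the per-element size estimate are assumptions:
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

// Assumption: queued elements can report (an estimate of) their own payload size.
interface Sized {
    int sizeInBytes();
}

// Bounds the queue by approximate payload bytes rather than element count (sketch).
class ByteBoundedQueue<T extends Sized> {
    private final LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<T>();
    private final Semaphore freeBytes;

    ByteBoundedQueue(int maxBytes) {
        this.freeBytes = new Semaphore(maxBytes); // e.g. 200 * 1024 for the 200 KB budget
    }

    void put(T item) throws InterruptedException {
        freeBytes.acquire(item.sizeInBytes()); // producer waits here while the byte budget is exhausted
        queue.put(item);
    }

    T take() throws InterruptedException {
        T item = queue.take();
        freeBytes.release(item.sizeInBytes()); // hand the space back to waiting producers
        return item;
    }
}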

Java bounded non-blocking buffer for high concurrent situation

Basically I need a data structure to store the temporary chatting messages on the server side. It should be:
bounded: because I don't need to store too many messages; the client will send a request to get the new messages every second. I think the bound size should be the max amount of concurrent requests in one second. When the buffer is full, the old messages will be removed.
suitable for highly concurrent access: I don't want to use a data structure like Collections.synchronizedXXXX, because during iteration, if another thread changes the data structure (e.g. adds a message), it will throw an exception, so I would have to lock the whole data structure. Actually, I don't really care whether a client request gets the very latest inserted message, because the client will send a new request after one second; on the other hand, the write operation should never be delayed. The classes under the java.util.concurrent package seem to be the solution, but...
non-blocking: LinkedBlockingQueue and ArrayBlockingQueue can be bounded and won't throw exceptions during iteration, but they are blocking queues. When the queue is full, I want to add the new element to the tail and remove the old element from the head instead of blocking there and waiting for someone to remove the head.
So my question is: is there any good implementation in a third-party library? For example, Google Guava?
Or maybe you have a better idea about storing the temporary chat messages on the server?
thank you very much!
You can use LinkedBlockingQueue with the non-blocking methods offer (or add) and poll to access it.
You can create it with a fixed capacity to make it bounded.
LinkedBlockingQueue<String> myStrings = new LinkedBlockingQueue<String>(100);
myStrings.offer("Hi!"); // returns false if limit is reached
myStrings.add("Hi, again!"); // throws exception if limit is reached
String s = myStrings.poll(); // returns null if queue is empty
You could utilize the Apache Commons CircularFifoBuffer. It meets your first and last criteria. To support concurrency, you can wrap the base buffer in its synchronized version like so:
Buffer fifo = BufferUtils.synchronizedBuffer(new CircularFifoBuffer());
Good luck on the project.
Did you take a look at ConcurrentLinkedQueue?
The page says
This implementation employs an efficient "wait-free" algorithm...
Wait-freedom is one of the strongest guarantees you can obtain....
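A sketch of using it as a roughly bounded buffer: keep an approximate size counter and evict from the head once the limit is passed (some counter drift under contention is acceptable here, since the bound only needs to be approximate):
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Roughly bounded, non-blocking message buffer (sketch).
class RecentMessages {
    private final int maxSize;
    private final ConcurrentLinkedQueue<String> messages = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();

    RecentMessages(int maxSize) { this.maxSize = maxSize; }

    void add(String message) {
        messages.offer(message); // never blocks, never fails
        if (size.incrementAndGet() > maxSize) {
            if (messages.poll() != null) { // drop the oldest message instead of blocking
                size.decrementAndGet();
            }
        }
    }

    Iterable<String> snapshot() {
        return messages; // weakly consistent iteration, no ConcurrentModificationException
    }
}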
You can add non-blocking behaviour to an ArrayBlockingQueue by surrounding it with a conditional offer() statement, where failure of the queue to accept the offer results in the head being dropped and the offer being re-made:
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class LearnToQueue {
    public static void main(String[] args) {
        Queue<Integer> FIFO = new ArrayBlockingQueue<Integer>(4);
        int i = 0;
        while (i < 10) {
            if (!FIFO.offer(i)) {
                // You can pipe the head of the queue anywhere you want to
                FIFO.remove();
                FIFO.offer(i);
            }
            System.out.println(FIFO.toString());
            i++;
        }
    }
}
LinkedTransferQueue is a blocking, unbounded queue that doesn't enforce strict FIFO ordering. It will only block when taking from an empty queue, but never on adding to one. You could add a soft cap to evict elements by adding either a size or read & write counters.
Depending on your requirements, you may be able to write a custom lock-free ring buffer.

Using a LinkedBlockingQueue and flush to mysql

Would a LinkedBlockingQueue be suitable for the following:
1. insert strings (maximum 1024 bytes) into the queue at a very high rate
2. every x inserts or based on a timed interval, flush items into mysql
During the flush, I was looking at the API: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/LinkedBlockingQueue.html
I was wondering if drainTo would be a good choice, since I have to aggregate before flushing.
So I would drainTo the items in the queue, then iterate and aggregate, and then write to mysql.
Will this be suitable for up to 10K writes per second?
Do I need to consider any locking/synchronization issues or is that taken care of already?
I will store this LinkedBlockingQueue as the value in a ConcurrentHashMap.
Items will never be removed from the hashmap, only inserted if not present, and if present, I will append to the queue.
It depends a bit on whether the inserter is per queue or for all queues. If I am understanding your spec, I would think something like the following would work.
A writer adds an item to one of the LinkedBlockingQueue collections in your map. If the size of the queue is more than X (if you want it per queue), it signals the MySQL inserter thread. Something like this should work:
queue.add(newItem);
// race conditions here may cause multiple signals, but that's ok
if (queue.size() > 1000) {
    // this will work if there is 1 inserter per queue
    synchronized (queue) {
        queue.notify();
    }
}
...
Then the inserter waits on the queue in something like the following loop:
List<String> insertList = new ArrayList<String>();
while (!done) {
    synchronized (queue) {
        // typically this would be a while loop, but whether we were notified or timed out, we insert
        if (queue.size() < 1000) {
            queue.wait(MILLIS_TIME_INTERVAL);
        }
    }
    queue.drainTo(insertList);
    // insert them into the db
    insertList.clear();
}
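The "insert them into the db" step could, for illustration, be a single JDBC batch per flush; the table and column names here are made up:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch: flush one drained batch with a single JDBC batch insert.
void insertIntoDb(Connection conn, List<String> insertList) throws SQLException {
    PreparedStatement ps =
            conn.prepareStatement("INSERT INTO messages (body) VALUES (?)");
    try {
        for (String s : insertList) {
            ps.setString(1, s);
            ps.addBatch();
        }
        ps.executeBatch(); // one round trip for the whole batch
    } finally {
        ps.close();
    }
}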
It gets a bit more complicated if there is one thread doing the inserts across all queues. I guess the question then is why you have the ConcurrentHashMap at all. If you do have one inserter which, for example, is inserting into multiple tables or something, then you will need a mechanism to inform the inserter which queue(s) need to be drained. It could just run through all of the queues in the map, but that might be expensive. You would synchronize on some global lock object, or maybe the map object, instead of the queue.
Oh, and as @Peter Lawrey mentioned, you will quickly run out of memory if your database is slower than the writers, so make sure the queues have a proper capacity set so they limit the writers and keep the working memory down.
Hope this helps.
For every queue you need a thread and a connection, so I wouldn't create too many queues. You can perform over 10K writes per second provided your MySQL server can handle it (you will only know once you test it). LinkedBlockingQueue is thread safe, and provided you have all your queues created before you start, you don't need any locking/synchronization.
If you are inserting long Strings of up to 1024 characters at 10K per second, you are likely to run out of memory pretty fast (up to 36 GB per hour). Instead, I would have the database only insert new strings.
