I'm confronted with the following problem:
I've implemented a crawler, and I would like to know how many requests have been made during the last second, and how much data has been downloaded during the last second.
Currently, I've implemented it using locks. My version uses a queue, and two counters (count and sum).
When a task is done, I just increase my counters and add an event (with the current date) to the queue.
When I want to get the value of my counters, I check whether anything in the queue is more than 1 second old. If so, I dequeue it and decrease my counters accordingly. Then I return the wanted result.
This version works well, but I would like, as a training exercise, to reimplement it using atomic operations instead of locks. Nevertheless, I have to admit that I'm stuck on the "cleaning operation" (dequeuing of old values).
So, is this a good approach to implement this?
Which other approach could I use?
Thanks!
This version works well, but I would like, as a training exercise, to reimplement it using atomic operations instead of locks.
If you need to make multiple changes to the data when the roll period happens, you will need to lock; otherwise you will have problems. Any time you have multiple "atomic operations" you need a lock to protect against race conditions. For example, in your case, what if something else were added to the queue while you were doing your roll?
Which other approach could I use?
I'm not 100% sure why you need to queue up the information. If you are only counting the number of requests and the total size of the data downloaded, then you should be able to use a single AtomicReference<CountSum>. The CountSum class would store your two values. Then, when someone needs to increment it, they would do something like:
CountSum newVal = new CountSum();
CountSum old;
do {
    old = countSumRef.get();
    newVal.setCount(old.getCount() + 1);
    newVal.setSum(old.getSum() + requestDataSize);
    // we need to loop here if someone changed the value behind our back
} while (!countSumRef.compareAndSet(old, newVal));
This ensures that your count and your sum are always in sync. If you used two AtomicLong variables, you'd have to make two atomic requests and would need the lock again.
When you want to reset the values, you'd do the same thing.
CountSum newVal = new CountSum(0, 0);
CountSum old;
do {
    old = countSumRef.get();
    // we need to loop here if someone changed the value behind our back
} while (!countSumRef.compareAndSet(old, newVal));
// now you can display the old value and be sure you got everything
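For reference, here is a minimal sketch of what the CountSum holder and the shared reference could look like; the class itself isn't shown above, so its exact shape is an assumption (an immutable version with final fields would be even safer for CAS loops, but setters are kept to match the increment snippet):

import java.util.concurrent.atomic.AtomicReference;

// Assumed shape of the CountSum holder used in the snippets above.
class CountSum {
    private long count;
    private long sum;

    CountSum() { this(0, 0); }
    CountSum(long count, long sum) { this.count = count; this.sum = sum; }

    long getCount() { return count; }
    long getSum()   { return sum; }
    void setCount(long count) { this.count = count; }
    void setSum(long sum)     { this.sum = sum; }
}

class CrawlerStats {
    // The shared reference that both the crawler threads and the reader use.
    private final AtomicReference<CountSum> countSumRef =
            new AtomicReference<>(new CountSum());
}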
First I'll explain what I want to do, and afterwards I'll provide a proposed solution.
Problem
I'm running a game where I want to do a certain amount of work every frame. For example, I have N objects that are in a queue waiting to be initialized (imagine initialization is a fairly expensive operation and N is large) and after adding them all, I want to create their collision boxes, and after that, I want to merge them together to limit render calls. I can't do these operations on a different thread because all this stuff is heavily coupled with the game world. But I want to split up all these operations into bite-size chunks to run each frame so that there is minimal lag (framerate dips). How would I go about doing this?
Proposed Solution
It would be nice to have a function that can stop after one call and continue where it left off after calling it again:
For example,
boolean loadEverything() {
    for (int i = 0; i < objectsToAdd.length; i++) {
        world.add(objectsToAdd[i]);
        if (i % 10 == 0) {
            return stop(); // wish: suspend here and resume from this point on the next call
        }
    }
    makeCollision();
    return stop();         // wish: suspend again
    mergeObjects();        // (not valid Java as written; this illustrates the desired behavior)
    return true;
}
Calling loadEverything() the first objectsToAdd.length/10 times adds 10 objects to the game world at a time. Calling it after that should run makeCollision() and then stop. Calling it again runs mergeObjects(), and then the function returns true. In the caller I would run loadEverything() until it returns true.
I'm aware that yield-return / yield-break implementations like those described here exist, but I'm wondering whether there's a more general implementation of them, or maybe a better solution that doesn't require any extra dependencies.
Have you looked at coroutines yet? There's a native implementation in Kotlin, but in Java there are options here and here.
In any case, we need to make sure that OpenGL or Box2D operations which are required to run on the main thread stay on the main thread, since I believe a coroutine will be created on a new thread. So there might be no gain in splitting up work for those kinds of operations.
Another option
You say you need to split up the work of creating objects at run time. Can you predict or estimate the number of objects you will want beforehand? If you don't really need to create objects dynamically like that, I suggest looking at the Object Pool in libgdx (see more here). That link has a working example of using Pool in your game.
Such a Pool already has initialized objects ready to be grabbed and used on demand, and it can also grow at run time if needed, so if you can provide a good estimate of the number of objects you intend to use up front, it's all good.
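A rough sketch of the libgdx Pool idea; the Pool and Pool.Poolable types come from com.badlogic.gdx.utils, while Bullet and BulletWorld are made-up example classes:

import com.badlogic.gdx.utils.Pool;

// Hypothetical pooled object; only the Poolable contract comes from libgdx.
class Bullet implements Pool.Poolable {
    float x, y;
    boolean alive;

    @Override
    public void reset() {      // called by pool.free(bullet)
        x = y = 0;
        alive = false;
    }
}

class BulletWorld {
    // obtain() reuses freed instances, so the expensive construction cost
    // is paid at most once per live object.
    private final Pool<Bullet> bulletPool = new Pool<Bullet>() {
        @Override
        protected Bullet newObject() {
            return new Bullet();
        }
    };

    Bullet spawn(float x, float y) {
        Bullet b = bulletPool.obtain();  // reuses a freed Bullet if one is available
        b.x = x;
        b.y = y;
        b.alive = true;
        return b;
    }

    void despawn(Bullet b) {
        bulletPool.free(b);              // returns it to the pool and calls reset()
    }
}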
Why don't you add a static variable which keeps its value between function calls? Then you can loop from the current value to the current value + 10, increase the current value (that static variable) by 10, and exit.
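A minimal sketch of that idea, keeping an index between calls and processing a fixed-size chunk per frame; the World and GameObject types and the method names are placeholders standing in for the question's game classes:

// Placeholder types standing in for the question's game classes.
interface GameObject {}
interface World { void add(GameObject o); }

// Hypothetical chunked loader: each call does at most CHUNK_SIZE units of work
// and remembers where it stopped, as suggested above.
class ChunkedLoader {
    private static final int CHUNK_SIZE = 10;
    private int nextIndex = 0;          // persists between calls (could be static as suggested)
    private boolean collisionDone = false;

    /** Returns true once everything has been loaded. */
    boolean loadEverything(World world, GameObject[] objectsToAdd) {
        int end = Math.min(nextIndex + CHUNK_SIZE, objectsToAdd.length);
        while (nextIndex < end) {
            world.add(objectsToAdd[nextIndex++]);
        }
        if (nextIndex < objectsToAdd.length) {
            return false;               // more objects to add next frame
        }
        if (!collisionDone) {
            makeCollision();            // one call just for the collision boxes
            collisionDone = true;
            return false;
        }
        mergeObjects();                 // final call merges render calls
        return true;
    }

    private void makeCollision() { /* ... */ }
    private void mergeObjects()  { /* ... */ }
}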
First, some abstraction. My problem can be modeled as follows:
I have a room with N doors, and anyone can drop a package in the room. Once there is a given number of packages in the room, I want them to be shipped away while keeping the doors open.
Using Java 1.8, I'm working on a multi-threaded application where any thread can add items to my ConcurrentHashMap object.
I want to regularly dump my ConcurrentHashMap when it reaches a certain size, without blocking the threads adding items to the map. Dumping involves several costly operations.
I thought of the following solutions:
Check the size of the map each time I add something; if the map has reached the maximum size, copy it to another map, reset it, and continue. I am not sure this would be thread safe.
Create a synchronized wrapper around the put() method of ConcurrentHashMap. I believe I'd lose any advantage of using a ConcurrentHashMap.
Use an ArrayBlockingQueue with my batch size as its capacity. It'll block when it is full, but I'll need something to process it later.
Something else I didn't think of.
I am basically self-taught regarding Java threads, and I'm looking for suggestions and ways to tackle my problem.
I still don't really understand the doors analogy. It sounds to me like you simply need a BlockingQueue:
"Somebody dropping a package in the room" is just a call to queue.offer(obj)
"Shipping the items away" is just a consumer thread taking S items from the queue, and then doing something with those objects:
// note: take() throws InterruptedException, so the enclosing method must handle it
while (true) {
    Object[] objs = new Object[S];
    for (int i = 0; i < S; ++i) {
        objs[i] = queue.take(); // Perhaps with a timeout, i.e. poll(timeout, unit)?
    }
    doSomethingWithObjects(objs);
}
In this way, you can keep on offering items to the queue ("keeping the doors open") while the consumer thread is processing them (provided you create the queue with sufficient capacity).
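Put together, a hedged sketch of that pattern; the class and method names (PackageRoom, drop, ship) are invented for illustration, and the capacity and batch size are arbitrary:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PackageRoom {
    private static final int BATCH_SIZE = 100;      // "S" in the snippet above
    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(10_000);

    // Any thread can "drop a package in the room"; offer() never blocks,
    // it just returns false if the room (queue) is full.
    boolean drop(Object pkg) {
        return queue.offer(pkg);
    }

    // A single consumer thread "ships" batches away.
    void startShippingThread() {
        Thread shipper = new Thread(() -> {
            try {
                while (true) {
                    List<Object> batch = new ArrayList<>(BATCH_SIZE);
                    for (int i = 0; i < BATCH_SIZE; i++) {
                        batch.add(queue.take());    // blocks until an item is available
                    }
                    ship(batch);                    // the costly "dump" happens here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // allow clean shutdown
            }
        });
        shipper.setDaemon(true);
        shipper.start();
    }

    private void ship(List<Object> batch) { /* costly processing */ }
}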
I'm making a series of connections to MySQL asynchronously, and I have a class which contains a bunch of easily accessible static methods to update/remove/clear/get/etc. data.
The issue I'm confronted with is that the getter methods practically never return the proper value, because they return before the async connection gets a chance to fetch the value to be returned.
Example:
public static int getSomething(final UUID user)
{
    Connection c = StatsMain.getInstance().getSQL().getConnection();
    try
    {
        PreparedStatement ps = c.prepareStatement("select something from stats where uuid=?");
        ps.setString(1, user.toString());
        ResultSet result = ps.executeQuery();
        // move the cursor to the first row before reading from it
        return result.next() ? result.getInt("something") : -1;
    }
    catch (SQLException e)
    {
        return -1; // the original returned false here, which doesn't compile for an int method
    }
}
(Not copy & pasted, but pretty close)
I realize I can get a 'callback' effect by passing an interface to the method, but that becomes very tedious when the database stores 10 values for a key.
Sounds like you're looking for Future, available since Java 5, or CompletableFuture, which is new in Java 8.
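For instance, a rough sketch of wrapping the blocking lookup in a CompletableFuture (Java 8); StatsLookup is an invented class name and getSomething stands in for the blocking JDBC method from the question:

import java.util.UUID;
import java.util.concurrent.CompletableFuture;

class StatsLookup {
    // Runs the blocking JDBC call on a worker thread and delivers the result
    // via a callback instead of a synchronous return.
    static CompletableFuture<Integer> getSomethingAsync(UUID user) {
        return CompletableFuture.supplyAsync(() -> getSomething(user));
    }

    // The blocking method from the question, unchanged in spirit.
    static int getSomething(UUID user) {
        // ... the JDBC code shown in the question ...
        return -1;
    }
}

The caller then writes something like StatsLookup.getSomethingAsync(user).thenAccept(value -> { /* use the value */ }); instead of blocking on a return value.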
Solution 1:
The best method I've come up with is to have a thread with a loop in it that waits for MySQL to return values and responds to each value. This is rather like the callback in the get routine, but you only have the one loop. Of course, the loop has to know what to do with each possible returned piece of data.
This means rethinking a bit how your program works. Instead of: ask a question, get an answer, use the answer, you have two completely independent operations. The first is: ask a question, then forget about it. The second is: get an answer, then, knowing nothing about the question, use the answer. It's a completely different approach, and you need to get your head around it before using it.
(One possible further advantage of this approach is that the MySQL end can now send data without being prompted. You have the option of feeding changes made by another user to your user in real time.)
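As a sketch of that reply-loop idea (DbReply and ReplyDispatcher are invented names; how replies actually arrive from the driver is left out):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical reply object: the async DB layer pushes these onto the queue
// as answers (or unsolicited updates) arrive from MySQL.
class DbReply {
    final String field;   // e.g. "something"
    final Object value;
    DbReply(String field, Object value) { this.field = field; this.value = value; }
}

class ReplyDispatcher implements Runnable {
    private final BlockingQueue<DbReply> replies = new LinkedBlockingQueue<>();

    void post(DbReply reply) {        // called by the async DB layer
        replies.add(reply);
    }

    @Override
    public void run() {
        try {
            while (true) {
                DbReply reply = replies.take();
                // The loop knows nothing about the original question; it only
                // knows what to do with each possible kind of returned data.
                switch (reply.field) {
                    case "something": /* update the relevant model/field */ break;
                    default:          /* ignore or log unknown replies */   break;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}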
Solution 2:
My other solution is simpler in some ways, but it can have you firing off lots of threads. Just have your getSomething method block until it has the answer and returns. To keep your program from hanging, just put the whole block of code that calls the method in its own thread.
Hybrid:
You can use both solutions together. The first one makes for cleaner code, but the second lets you answer a specific question when you get the reply. (If you get a "Customer Name" from the DB, and you have a dozen fields it could go in, it might help to know that you did ask for this field specifically, and that you asked because the user pushed a button to put the value in a specific text box on the screen.)
Lastly:
You can avoid a lot of multithreading headaches by using invokeLater to put all changes to your data structures on the EventQueue. This can nicely limit the synchronization problems. (On the other hand, having 20 or 30 threads going at once can make good use of all your computer's cores, if you like to live dangerously.)
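For example, a small sketch of handing a reply back to the event dispatch thread; ReplyHandler and nameField are assumptions, only EventQueue.invokeLater is the real API:

import java.awt.EventQueue;
import javax.swing.JTextField;

class ReplyHandler {
    private final JTextField nameField = new JTextField();

    // Called from the background thread that receives replies from MySQL.
    void onCustomerNameReply(String customerName) {
        EventQueue.invokeLater(() -> {
            // Runs on the event dispatch thread, so touching Swing components
            // (and any model they display) is safe without extra locking.
            nameField.setText(customerName);
        });
    }
}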
You may want to stick with synchronized calls, but if you do want to go asynchronous, this is how I'd do it. It's not too bad once you get some basic tools written and get your brain to stop thinking synchronously.
I have multiple threads which perform searches in the Lucene index. Before each search, there is a check whether the content is already indexed, and if not, it is then added to the index. If two parallel searches on unindexed content occur at the same time, there will be duplicate documents, and I guess the results of the search will be messed up.
I have found the following method: IndexWriter.updateDocument
but I think this does not solve the multithreading problem I am facing.
Any suggestions how to resolve this are appreciated.
First, make sure there is only one call to the method (IndexWriter#updateDocument()) at a time; you could achieve that with a shared object belonging to your threads, like this:
class Search implements Runnable {
    // shared across the threads (e.g. give every thread the same Search instance)
    private final Object lock = new Object();
    private volatile boolean found = false;

    public void run() {
        // business
        if (<<found something!>> && !found) {
            synchronized (lock) {
                if (!found) {          // re-check inside the lock to avoid a race
                    /* call the related method */
                    found = true;
                }
            }
        }
        // business
    }
}
Second, you need to track every key found during the search to avoid duplication, maybe by checking the key or using a simple boolean check.
Also, please beware of useless work: consider signalling the other threads to abort their searches IF you just need the very first keys found; it depends on your business logic.
If you're not able to modify the source of your updates/additions to be smarter about avoiding duplicates, then you'll have to create a choke point somewhere. The goal is simply to do it with the least amount of contention possible.
One way to do it would be to have a request queue, a work queue and a ConcurrentHashMap for lookups. All new requests are added to the request queue which is processed by a single "gatekeeper" thread. The gatekeeper can take one request at a time or drain the queue and process all pending requests in a loop to reduce contention on that end.
In order to process a request, the gatekeeper does putIfAbsent on the ConcurrentHashMap. If the return value is null, the update/insert request can be added to the actual work queue. If the value was already in the map, see #2 below. Realistically you could use more than one gatekeeper since putIfAbsent is atomic, but it'd just increase contention on the HashMap. The gatekeeper's actual processing time is so low that you don't really gain anything by throwing more of them at the request queue.
The work queue threads will be able to process multiple updates/insertions concurrently as long as they don't modify the same record. When the work queue threads finish processing a request, they remove the value from the ConcurrentHashMap so that the gatekeeper knows it's safe to modify that record again.
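A condensed sketch of that gatekeeper idea; all the names (Gatekeeper, Request, documentKey) are invented, and the worker threads that drain the work queue are left out:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class Gatekeeper implements Runnable {
    // What defines "the same record" is a design decision (see the points below);
    // a plain String document key is assumed here.
    interface Request {
        String documentKey();
    }

    private final BlockingQueue<Request> requestQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<Request> workQueue = new LinkedBlockingQueue<>();
    private final ConcurrentHashMap<String, Boolean> inFlight = new ConcurrentHashMap<>();

    void submit(Request request) {                 // called by any producer thread
        requestQueue.add(request);
    }

    BlockingQueue<Request> workQueue() {           // consumed by the worker threads
        return workQueue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Request request = requestQueue.take();
                // Hand the request to the workers only if no other request
                // for the same document is currently in flight.
                if (inFlight.putIfAbsent(request.documentKey(), Boolean.TRUE) == null) {
                    workQueue.add(request);
                } else {
                    handleDuplicate(request);      // drop, requeue later, etc. (see the points below)
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Workers call this when they finish, so the key becomes processable again.
    void done(Request request) {
        inFlight.remove(request.documentKey());
    }

    private void handleDuplicate(Request request) { /* policy decision */ }
}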
--
Some things to think about:
1) How do you want to define what can be done simultaneously? It probably shouldn't be hashing the full request because you wouldn't want two different requests to modify the same document at the same time, would you?
2) What do you do with requests that cannot currently be processed because they have duplicates in the queue already (or requests that modify the same doc, as in point #1)? Throw them out? Put them in a secondary updating queue that tries again periodically? How do you respond to the original requester if its request is in an indefinite holding pattern?
3) Does the order in which requests are processed matter?
I know that if two threads are writing to the same place I need to make sure they do it in a safe way so they cause no problems, but what if just one thread does all the writing while another only reads?
In my case I'm using a thread in a small game for the first time, to keep the updating apart from the rendering. The class that does all the rendering will never write to anything it reads, so I am not sure anymore whether I need to handle every read and write for everything they both share.
I will take the right steps to make sure the renderer does not try to read anything that is not there anymore, but when calling getters on things like the player and other entities, should I be treating them in the same way? Or would making values like the x, y coordinates and booleans like "alive" volatile do the trick?
My understanding has become very murky on this and could do with some enlightening.
Edit: The shared data will be anything that needs to be drawn and moved and stored in lists of objects.
For example, the player and the other entities.
With the given information it is not possible to specify an exact solution, but it is clear that you need some way to synchronize between the threads. The issue is that as long as the write operations are not atomic, you could be reading data at the moment it is being updated. This means that you could, for instance, get an old y-coordinate with a new x-coordinate.
Basically, you only do not need to worry about synchronization if both threads are only reading the information or, even better, if all the data structures are immutable (so neither thread can modify the objects). The best way to proceed is to think first about which operations need to be atomic, and then create a solution that makes those operations atomic.
Don't forget: get it working, get it right, get it optimized (in that order).
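As an illustration of the immutable approach (EntityState and Entity are invented names): the update thread builds a new snapshot object and publishes it through a volatile reference, so the renderer always sees a consistent x/y pair.

// Immutable snapshot of the state the renderer needs.
final class EntityState {
    final float x;
    final float y;
    final boolean alive;

    EntityState(float x, float y, boolean alive) {
        this.x = x;
        this.y = y;
        this.alive = alive;
    }
}

class Entity {
    // volatile guarantees the renderer sees the latest fully constructed snapshot;
    // it can never see a half-updated mix of old and new coordinates.
    private volatile EntityState state = new EntityState(0, 0, true);

    // Called only by the update thread.
    void moveTo(float x, float y, boolean alive) {
        state = new EntityState(x, y, alive);
    }

    // Called by the render thread.
    EntityState snapshot() {
        return state;
    }
}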
You could have problems in this case if the lists' sizes are variable and you don't synchronize access to them. Consider this:
The read-only thread reads mySharedList's size and sees that it is 15; at that moment its CPU time slice ends and the read-write thread is given the CPU.
The read-write thread deletes an element from the list; now its size is 14.
The read-only thread is again granted CPU time; it tries to read the last element using the (now obsolete) size it read before being interrupted, and you'll get an exception.
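One way to avoid that (a sketch, not the only option; SharedEntities and draw are invented names): make both threads lock the list while they touch it, so the size check and the element access happen atomically. A CopyOnWriteArrayList is another common choice when reads vastly outnumber writes.

import java.util.ArrayList;
import java.util.List;

class SharedEntities {
    private final List<Object> mySharedList = new ArrayList<>();

    // Read-write (update) thread
    void remove(Object entity) {
        synchronized (mySharedList) {
            mySharedList.remove(entity);
        }
    }

    // Read-only (render) thread: the size check and the element access
    // happen under the same lock, so the index can never be stale.
    void renderAll() {
        synchronized (mySharedList) {
            for (int i = 0; i < mySharedList.size(); i++) {
                draw(mySharedList.get(i));
            }
        }
    }

    private void draw(Object entity) { /* ... */ }
}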