I want to run two XPath expressions concurrently on two revisions of a database, where each query returns its results through an Iterator/Iterable, and then match the resulting nodes against nodes in a List.
I think the best approach is to run both queries in two threads from an ExecutorService and put the results from both threads into a BlockingQueue, while another thread sorts the results from the BlockingQueue, i.e. inserts each incoming node or nodeKey at the right position.
Then it's trivial to compute the intersection of the resulting sorted List and another sorted List.
Any other suggestions? I'm free to use whatever technology I like (preferably Java). Guava is on the classpath, and I have also thought about using Actors from Akka.
Edit: A related follow-up question: is it faster to use insertion sort in a pipelined manner (processing the generated XPath results as they arrive), or to wait until the whole result has been generated and then use quicksort or mergesort? I suspect insertion sort is preferable regardless of the number of resulting elements.
In general I hope that sorting and then intersecting the two lists is faster than the O(n^2) lookup of each item of one XPath result list in the other, even if that lookup is divided across the available CPU cores.
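For what it's worth, once both result sequences are sorted, the intersection is a single O(n + m) two-pointer pass. A minimal sketch, assuming ascending order and boxed Long nodeKeys:

static List<Long> intersect(final List<Long> first, final List<Long> second) {
    final List<Long> result = new ArrayList<>();
    int i = 0, j = 0;
    while (i < first.size() && j < second.size()) {
        final int cmp = first.get(i).compareTo(second.get(j));
        if (cmp == 0) {
            result.add(first.get(i)); // key present in both revisions
            i++;
            j++;
        } else if (cmp < 0) {
            i++; // advance the list with the smaller current key
        } else {
            j++;
        }
    }
    return result;
}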
Edit:
I've currently implemented the first part:
final ExecutorService executor = Executors.newFixedThreadPool(2);
final AbsTemporalAxis axis =
    new NextRevisionAxis.Builder(mSession).setRevision(mRevision)
        .setIncludeSelf(EIncludeSelf.YES).build();
for (final IReadTransaction rtx : axis) {
    final ListenableFuture<Void> future =
        Futures.makeListenable(executor.submit(new XPathEvaluation(rtx, mQuery)));
    future.addListener(new Runnable() {
        @Override
        public void run() {
            try {
                mSemaphore.acquire();
            } catch (final InterruptedException e) {
                LOGWRAPPER.error(e.getMessage(), e);
            }
        }
    }, executor);
}
executor.shutdown();
final ExecutorService sameThreadExecutor = MoreExecutors.sameThreadExecutor();
sameThreadExecutor.submit(new XPathResult());
sameThreadExecutor.shutdown();
return null;
The semaphore is initialized with 2 permits, and in XPathEvaluation the resulting nodeKeys are added to a LinkedBlockingQueue.
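For context, a sketch of what XPathEvaluation could look like. The XPathAxis class and its Long-iterator contract are my assumptions, not necessarily the actual Treetank API; the completion signal itself comes from the future listener acquiring the semaphore:

private final class XPathEvaluation implements Callable<Void> {
    private final IReadTransaction mRtx;
    private final String mQuery;

    XPathEvaluation(final IReadTransaction rtx, final String query) {
        mRtx = rtx;
        mQuery = query;
    }

    @Override
    public Void call() throws Exception {
        final XPathAxis axis = new XPathAxis(mRtx, mQuery); // assumed axis API
        while (axis.hasNext()) {
            mQueue.put(axis.next()); // hand each resulting nodeKey to the sorting thread
        }
        return null;
    }
}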
Then I'm going to sort the XPath results at the place marked by the comment, which isn't implemented yet:
private final class XPathResult implements Callable<Void> {
    @Override
    public Void call() throws AbsTTException, InterruptedException {
        while (true) {
            final long key = mQueue.take();
            if (key == -1L) {
                break;
            }
            if (mSemaphore.availablePermits() == 0) {
                mQueue.put(-1L);
            }
            // Do InsertionSort.
        }
        return null;
    }
}
Without any JavaDoc, but I think it should at least work. What do you think? Do you have any better suggestions, or have I made any mistakes so far?
kind regards,
Johannes
Are you sure you need to do this concurrently? Couldn't you just build the two lists consecutively and perform your sorting/intersecting after that? That would take a lot of complexity out of the problem.
I assume that intersecting cannot be done until both lists are filled completely, am I correct? Then no queue or synchronization would be needed: just fill two lists/sets and, once done, process both full lists.
But maybe I'm not quite getting your point...
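To illustrate the comment's suggestion in code: a sketch of the sequential structure, assuming a variant of XPathEvaluation that collects and returns its nodeKeys as a list (the intersect helper is the sketch from above):

final ExecutorService executor = Executors.newFixedThreadPool(2);
final Future<List<Long>> first = executor.submit(new XPathEvaluation(rtx1, query));
final Future<List<Long>> second = executor.submit(new XPathEvaluation(rtx2, query));
final List<Long> firstKeys = first.get();   // blocks until the first query is done
final List<Long> secondKeys = second.get(); // blocks until the second query is done
executor.shutdown();
Collections.sort(firstKeys);
Collections.sort(secondKeys);
final List<Long> common = intersect(firstKeys, secondKeys);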
I have a scenario where I have to maintain a Map that can be populated by multiple threads, each modifying its own List (the thread name being the unique key). When the list size for a thread exceeds a fixed batch size, we have to persist those records to the database.
Aggregator class
private volatile ConcurrentHashMap<String, List<T>> instrumentMap =
        new ConcurrentHashMap<String, List<T>>();
private ReentrantLock lock;

public void addAll(List<T> entityList, String threadName) {
    lock.lock();
    try {
        List<T> instrumentList = instrumentMap.get(threadName);
        if (instrumentList == null) {
            instrumentList = new ArrayList<T>(batchSize);
            instrumentMap.put(threadName, instrumentList);
        }
        if (instrumentList.size() >= batchSize - 1) {
            instrumentList.addAll(entityList);
            recordSaver.persist(instrumentList);
            instrumentList.clear();
        } else {
            instrumentList.addAll(entityList);
        }
    } finally {
        lock.unlock();
    }
}
There is one more separate thread that runs every 2 minutes (using the same lock) to persist all the records in the Map, to make sure we persist something every 2 minutes and the map size does not get too big.
if (/* some condition */) {
    Thread.sleep(2 * 60 * 1000); // 2 minutes
    aggregator.getLock().lock();
    try {
        List<T> instrumentList = instrumentMap.values().stream()
                .flatMap(x -> x.stream()).collect(Collectors.toList());
        if (instrumentList.size() > 0) {
            saver.persist(instrumentList);
            instrumentMap.values().parallelStream().forEach(x -> x.clear());
        }
    } finally {
        // note: the original unlock() sat inside the if-block, so the lock was
        // never released when the map was empty; a finally block fixes that
        aggregator.getLock().unlock();
    }
}
This solution is working fine in almost every scenario we tested, except that sometimes some records go missing, i.e. they are not persisted at all, although they were added to the Map just fine.
My questions are:
What is the problem with this code?
Is ConcurrentHashMap not the best solution here?
Does the List that is used with the ConcurrentHashMap have an issue?
Should I use the compute method of ConcurrentHashMap here (no need, I think, as the ReentrantLock is already doing the same job)?
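For reference on that last question, a hypothetical sketch of the compute-based variant. The compute call makes the check-then-act for a single key atomic without the global lock; the ready batch is drained inside the lambda but persisted outside it, so no I/O happens while the map holds its internal lock:

public void addAll(List<T> entityList, String threadName) {
    final List<T> ready = new ArrayList<>();
    instrumentMap.compute(threadName, (name, list) -> {
        if (list == null) {
            list = new ArrayList<>(batchSize);
        }
        list.addAll(entityList);
        if (list.size() >= batchSize) {
            ready.addAll(list); // drain atomically
            list.clear();
        }
        return list;
    });
    if (!ready.isEmpty()) {
        recordSaver.persist(ready); // persist outside the map's internal lock
    }
}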
The answer provided by @Slaw in the comments did the trick. We were letting the instrumentList instance escape in a non-synchronized way, i.e. the list was accessed and modified without any synchronization. We fixed it by passing a copy to the downstream method.
The following lines are the ones where the issue was happening:
recordSaver.persist(instrumentList);
instrumentList.clear();
Here we allow the instrumentList instance to escape in a non-synchronized way: it is passed to another class (recordSaver.persist) to be acted upon, but we also clear the list on the very next line (in the Aggregator class), and all of this happens without synchronization. The list's state inside recordSaver can't be predicted... a really stupid mistake.
We fixed the issue by passing a cloned copy of instrumentList to the recordSaver.persist(...) method. That way, instrumentList.clear() has no effect on the list recordSaver is still working with.
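In code, the fix amounts to handing the saver a defensive copy:

recordSaver.persist(new ArrayList<>(instrumentList)); // the copy escapes, not the live list
instrumentList.clear(); // can no longer race with whatever persist does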
I see that you are using ConcurrentHashMap's parallelStream within a lock. I am not deeply familiar with Java 8+ stream support, but a quick search shows that:
- ConcurrentHashMap is a complex data structure that has had concurrency bugs in the past
- Parallel streams must abide by complex and poorly documented usage restrictions
- You are modifying your data within a parallel stream
Based on that information (and my gut-driven concurrency bug detector™), I'd wager that removing the call to parallelStream might improve the robustness of your code. In addition, as mentioned by @Slaw, you should use an ordinary HashMap in place of ConcurrentHashMap if all instrumentMap usage is already guarded by the lock.
Of course, since you don't post the code of recordSaver, it is possible that it too has bugs (and not necessarily concurrency-related ones). In particular, you should make sure that the code reading records back from persistent storage, i.e. the code you use to detect the loss of records, is safe, correct, and properly synchronized with the rest of your system (preferably by using a robust, industry-standard SQL database).
It looks like this was an attempt at optimization where it was not needed. In that case, less is more and simpler is better. In the code below, only two concurrency concepts are used: synchronized, to ensure that the shared list is properly updated, and final, to ensure all threads see the same references.
import java.util.ArrayList;
import java.util.List;

public class Aggregator<T> implements Runnable {

    private final List<T> instruments = new ArrayList<>();
    private final RecordSaver recordSaver;
    private final int batchSize;

    public Aggregator(RecordSaver recordSaver, int batchSize) {
        super();
        this.recordSaver = recordSaver;
        this.batchSize = batchSize;
    }

    public synchronized void addAll(List<T> moreInstruments) {
        instruments.addAll(moreInstruments);
        if (instruments.size() >= batchSize) {
            storeInstruments();
        }
    }

    public synchronized void storeInstruments() {
        if (instruments.size() > 0) {
            // in case recordSaver works async:
            // recordSaver.persist(new ArrayList<T>(instruments));
            // else just:
            recordSaver.persist(instruments);
            instruments.clear();
        }
    }

    @Override
    public void run() {
        while (true) {
            try {
                Thread.sleep(1L);
            } catch (Exception ignored) {
                break;
            }
            storeInstruments();
        }
    }

    // static so it can be instantiated before any Aggregator exists
    static class RecordSaver {
        void persist(List<?> l) {}
    }
}
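A sketch of how this could be wired up; the thread name and the sample data are just for illustration:

Aggregator<String> aggregator = new Aggregator<>(new Aggregator.RecordSaver(), 1000);
Thread flusher = new Thread(aggregator, "aggregator-flusher");
flusher.setDaemon(true); // don't keep the JVM alive just to flush batches
flusher.start();
// from any producer thread:
aggregator.addAll(Arrays.asList("a", "b", "c"));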
I'm currently trying to implement a system that would run in a few different threads:
1) First thread is listening to incoming requests and adds them to the list.
2) A new thread is created for each request to perform certain operations.
3) Another thread iterates through the list, checks the status of each request, and removes them from the list when they're complete.
Now, a very simplified pseudocode version of what I have can be seen below:
private List<Job> runningJobs = new ArrayList<>(); // our list of requests
private Thread monitorThread;

// this runnable is later run in a new thread to monitor the list
// and remove completed requests
private Runnable monitor = new Runnable() {
    @Override
    public void run() {
        boolean monitorRun = true;
        while (monitorRun) {
            try {
                Thread.sleep(1000);
                if (runningJobs.size() > 0) {
                    Iterator<Job> i = runningJobs.iterator();
                    while (i.hasNext()) {
                        try {
                            Job job = i.next();
                            if (job.jobStatus() == 1) { // if job is complete
                                i.remove();
                            }
                        } catch (java.util.ConcurrentModificationException e) {
                            e.printStackTrace();
                        }
                    }
                }
                if (Thread.currentThread().isInterrupted()) {
                    monitorRun = false;
                }
            } catch (InterruptedException e) {
                monitorRun = false;
            }
        }
    }
};

private void addRequest(Job job) {
    this.runningJobs.add(job); // fixed: was "newJob", which doesn't exist here
    // etc
}
In short, the Runnable monitor is what runs continuously in the third thread; the first thread is calling addRequest() occasionally.
While my current implementation somewhat works, I'm concerned about the ordering of operations here and the possible java.util.ConcurrentModificationException (the system is anything but robust). I'm certain there is a much better way to organize this mess.
What's the proper or a better way to do this?
Your requirements would be met nicely with an ExecutorService. For each request, create a Job and submit it to the service. Internally, the service uses a BlockingQueue, which would address your question directly, but with an ExecutorService you don't have to worry about that level of detail.
Specifically, something like this:
/* At startup... */
ExecutorService workers = Executors.newCachedThreadPool();
/* For each request... */
Job job = ... ;
workers.submit(job); /* Assuming Job implements Runnable */
// workers.submit(job::jobEntryPoint); /* If Job has some other API */
/* At shutdown... */
workers.shutdown();
There are a few different ways.
You can synchronize the list (e.g. wrap it with Collections.synchronizedList). This is probably the most brute-force option, and it still wouldn't prevent an insert while you are iterating unless you also synchronize the entire iteration.
There are a few concurrent collections. These tend to be better but come with trade-offs. For instance, CopyOnWriteArrayList will work, but it copies the underlying array on every write, so it is best for collections that are read often and updated only occasionally (see the sketch after this list).
There is also ConcurrentLinkedQueue. Since it's "Linked", you can't efficiently reference an item in the middle.
Look through the implementations of the List interface and pick the one that best suits your problem.
If your problem is really a queue rather than a list, there are a few implementations of that as well, and they tend to be better suited to this type of problem.
In general, my answer is that you should scan through the Javadocs every time Java does a major release and examine (at least) the new collections. You might be surprised at what's in there.
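For instance, a sketch of the CopyOnWriteArrayList option applied to the job monitor from the question. Iteration works on a snapshot, so a concurrent add() can never throw ConcurrentModificationException; note that the snapshot iterator does not support remove(), so completed jobs are removed with removeIf instead:

private final List<Job> runningJobs = new CopyOnWriteArrayList<>();

// listener thread:
runningJobs.add(job);

// monitor thread, once per second:
runningJobs.removeIf(j -> j.jobStatus() == 1); // bulk removal of completed jobs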
I have the following method:
void store(SomeObject o) {
}
The idea of this method is to store o in permanent storage, but the method must not block, i.e. I cannot/must not do the actual storing in the same thread that called store.
I also cannot simply start a new thread and store the object from there, because store might be called a huge number of times and I don't want to keep spawning threads.
So I have two options, neither of which I can see working well:
1) Use a thread pool (Executor family)
2) In store, put the object in an array list and return. When the array list reaches e.g. 1000 entries (an arbitrary number), start another thread to "flush" the array list to storage. But then I might still have the problem of too many threads (thread pool?).
In both cases, the one requirement I have is that the objects are persisted in exactly the same order in which they were passed to store, and using multiple threads mixes things up.
How can this be solved?
How can I ensure:
1) A non-blocking store
2) Accurate insertion order
3) I don't need any durability guarantees: if something crashes, I don't care about losing data that was still cached in the array list before being stored.
I would use a SingleThreadExecutor and a BlockingQueue.
A SingleThreadExecutor, as the name says, has one single thread. Use it to poll from the queue and persist the objects, blocking while the queue is empty.
Your store method can then add to the queue without blocking.
EDIT
Actually, you do not even need that extra queue; the JavaDoc of newSingleThreadExecutor says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
So I think it's exactly what you need.
private final ExecutorService persistor = Executors.newSingleThreadExecutor();

public void store(final SomeObject o) {
    persistor.submit(new Runnable() {
        @Override
        public void run() {
            // your persist code here
        }
    });
}
The advantage of using a Runnable with a quasi-endless loop and an extra queue would be the possibility of coding some "burst" functionality. For example, you could make it persist only once 10 elements are queued, or once the oldest element has been waiting for at least 1 minute...
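A sketch of that burst idea, assuming a dedicated consumer thread, a LinkedBlockingQueue<SomeObject> named queue, and a hypothetical persist method:

void burstLoop(BlockingQueue<SomeObject> queue) throws InterruptedException {
    final List<SomeObject> batch = new ArrayList<>();
    while (!Thread.currentThread().isInterrupted()) {
        final SomeObject first = queue.poll(1, TimeUnit.MINUTES); // wait for the oldest element
        if (first != null) {
            batch.add(first);
            queue.drainTo(batch, 9); // plus up to 9 more that arrived in the meantime
            persist(batch);          // hypothetical; writes the batch in insertion order
            batch.clear();
        }
    }
}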
I suggest using Chronicle Queue, a library I designed.
It allows you to write from the current thread without blocking. It was originally designed for low-latency trading systems. For small messages it takes around 300 ns to write a message.
You don't need a background thread or an on-heap queue, and by default it doesn't wait for the data to be written to disk. It also ensures a consistent order for all readers. If the program dies at any point after you call finish(), the message is not lost (unless the OS crashes or loses power). It also supports replication to avoid data loss.
Have one separate thread that takes items from the head of a queue (blocking while the queue is empty) and writes them to disk. Your main thread's store() function just adds items to the tail of the queue.
Here's a rough idea (though I assume there will be cleaner or faster ways of doing this in production code, depending on how fast you need things to be):
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ObjectWriter implements Runnable {
    private final Object END = new Object();
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    public ObjectWriter() {
        new Thread(this).start();
    }

    public void store(Object o) throws InterruptedException {
        queue.put(o);
    }

    public void close() throws InterruptedException {
        queue.put(END); // sentinel telling the writer thread to stop
    }

    @Override
    public void run() {
        while (true) {
            try {
                Object o = queue.take();
                if (o == END) {
                    // close output file.
                    return;
                }
                System.out.println(o.toString()); // serialize as appropriate
            } catch (InterruptedException e) {
                // keep taking until END arrives
            }
        }
    }
}

public class Test {
    public static void main(String[] args) throws Exception {
        ObjectWriter w = new ObjectWriter();
        w.store("hello");
        w.store("world");
        w.close();
    }
}
The comments in your question make it sound like you are unfamiliar with multi-threading, but it's really not that difficult.
You simply need another thread responsible for writing to storage, which picks items off a queue; your store function just adds the object to the in-memory queue and continues on its way.
Some pseudo-ish code:
final List<SomeObject> queue = new LinkedList<>();

void store(SomeObject o) {
    // add it to the queue - note that modifying o after this will also alter
    // the instance in the queue
    synchronized (queue) {
        queue.add(o);
        queue.notify(); // tell the storage thread there's something in the queue
    }
}

void storageThread() {
    SomeObject item;
    while (notFinished) {
        synchronized (queue) {
            if (queue.size() > 0) {
                item = queue.remove(0); // take from the head to preserve order
            } else {
                // wait for something to arrive
                try {
                    queue.wait();
                } catch (InterruptedException e) {
                    return;
                }
                continue;
            }
        }
        writeToStorage(item);
    }
}
What's a good way of allowing searches from multiple threads on a list (or other data structure), but preventing searches on the list and edits to the list on different threads from interleaving? I tried using synchronized blocks in the searching and editing methods, but that can cause unnecessary blocking when trying to run searches in multiple threads.
EDIT: The ReadWriteLock is exactly what I was looking for! Thanks.
Usually, yes, a ReadWriteLock is good enough.
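For completeness, the ReadWriteLock version is a minimal sketch like this (doSearch and doEdit stand in for your actual logic): any number of searches may hold the read lock at the same time, while an edit takes the exclusive write lock.

private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

public Object search() {
    rwLock.readLock().lock();
    try {
        return doSearch();
    } finally {
        rwLock.readLock().unlock();
    }
}

public void edit() {
    rwLock.writeLock().lock();
    try {
        doEdit();
    } finally {
        rwLock.writeLock().unlock();
    }
}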
But if you're using Java 8, you can get a performance boost with the new StampedLock, which lets you avoid the read lock on the common path. This applies when reads (searches) are much more frequent than writes (edits).
private final StampedLock sl = new StampedLock();

public void edit() { // write method
    long stamp = sl.writeLock();
    try {
        doEdit();
    } finally {
        sl.unlockWrite(stamp);
    }
}

public Object search() { // read method
    long stamp = sl.tryOptimisticRead();
    Object result = doSearch(); // first try without a lock; the search should ideally be fast
    if (!sl.validate(stamp)) {  // if something was modified in the meantime
        stamp = sl.readLock();  // acquire a real read lock and search again
        try {
            result = doSearch();
        } finally {
            sl.unlockRead(stamp);
        }
    }
    return result;
}
The following class acts as a simple cache that is updated very infrequently (say, twice a day) and read quite a lot (up to several times a second). There are two different structures, a List and a Map. My question is about the reference assignments after the data has been reloaded in the update method: what is the best (safest) way for the new data to be published?
I should add that it isn't necessary for readers to see the absolute latest value. The requirement is just that readers get either the old or the new value, consistently, at any given time.
public class Foo {
    private ThreadPoolExecutor _executor;
    private List<Object> _listObjects = new ArrayList<Object>(0);
    private Map<Integer, Object> _mapObjects = new HashMap<Integer, Object>();
    private Object _mutex = new Object();
    private boolean _updateInProgress;

    public void update() {
        synchronized (_mutex) {
            if (_updateInProgress) {
                return;
            } else {
                _updateInProgress = true;
            }
        }
        _executor.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    List<Object> newObjects = loadListObjectsFromDatabase();
                    Map<Integer, Object> newMapObjects = loadMapObjectsFromDatabase();
                    /*
                     * this is the interesting part
                     */
                    _listObjects = newObjects;
                    _mapObjects = newMapObjects;
                } catch (final Exception ex) {
                    // error handling
                } finally {
                    synchronized (_mutex) {
                        _updateInProgress = false;
                    }
                }
            }
        });
    }

    public Object getObjectById(Integer id) {
        return _mapObjects.get(id);
    }

    public List<Object> getListObjects() {
        return new ArrayList<Object>(_listObjects);
    }
}
As you can see, currently no ConcurrentHashMap or CopyOnWriteArrayList is used; the only synchronisation is done in the update method.
Although not necessary for my current problem, it would also be great to know the best solution for cases where it is essential for readers to always see the absolute latest value.
You could use plain synchronization unless you are reading over 10,000 times per second.
If you want concurrent access, I would use one of the concurrent collections like ConcurrentHashMap or CopyOnWriteArrayList. These are simpler to use than synchronizing the collection yourself (i.e. you don't need them for performance reasons; use them for simplicity).
BTW: A modern CPU can perform billions of operations in 0.1 seconds so several times a second is an eternity to a computer.
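Regarding the follow-up about readers always needing the latest value: a common minimal fix, sketched against the fields from the question, is to declare the swapped references volatile. The volatile write happens after the new collections are fully built, so a reader sees either the complete old state or the complete new state, never a half-published one.

private volatile List<Object> _listObjects = new ArrayList<Object>(0);
private volatile Map<Integer, Object> _mapObjects = new HashMap<Integer, Object>();

// in the update Runnable: build the new collections completely, then swap
_listObjects = newObjects;
_mapObjects = newMapObjects;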
I am also seeing this issue and can think of multiple solutions:
Use a synchronized block around both pieces of code, the one that reads and the one that writes.
Keep a separate remove list: add all removable items to that list, and remove them in the reading thread right after reading is done. That way reading and deleting happen in sequence and no error occurs (see the sketch below).
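A sketch of that second idea, reusing the job names from the earlier question for illustration:

final List<Job> toRemove = new ArrayList<>();
for (final Job job : runningJobs) {
    if (job.jobStatus() == 1) {
        toRemove.add(job); // defer: never modify the list mid-iteration
    }
}
runningJobs.removeAll(toRemove); // single bulk removal after reading is done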