Thread safe graph libraries


I am looking for a good Java graph library that is thread safe for concurrent access.
JGraphT, JUNG, and JTS are very good, but for concurrent access I would have to synchronize them externally, which is becoming a pain.
It is a pain because, say, thread A has to access 50 vertices and thread B another 50, with 20 vertices in common. While writing the code I need to know those 20 in advance so that I can synchronize accordingly.
Please suggest a library.

Have you considered Neo4j?
Here is a snippet describing their product:
Neo4j is a high-performance, NOSQL graph database with all the features of a mature and robust database.

I'm afraid what you're looking for is impossible, because thread-safety is a property of algorithms, not a property of data structures. Here's an example:
Let's say your graph library has a main Graph class with a number of methods, all of which are synchronized. For example, addVertex(), removeVertex(), addEdge(), removeEdge(), etc. Let's also say that the Vertex class has some useful methods like getAdjacentEdges(), for example, also synchronized on the containing Graph instance.
Now clearly because everything is synchronized, it's impossible to corrupt the data structure. For example, you'll never have a situation where v.getAdjacentEdges() gives you an edge that's not actually in the graph containing vertex v. The graph structure is always internally consistent thanks to its internal synchronization.
However, your algorithms operating on the graph can still easily break. For example, let's say you write:
for (Edge e : v.getAdjacentEdges()) {
    g.removeEdge(e);
}
The call to getAdjacentEdges() is atomic, as is each call to removeEdge() in the loop, but the algorithm itself is not. Another thread may add a new edge adjacent to v while this loop is running, or remove an edge, or whatever. To be safe, you still need a way of ensuring that the loop as a whole is atomic, and the graph itself cannot provide that.
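One way to make the loop as a whole atomic is to move it behind a single lock as one compound operation. Here is a minimal sketch, assuming a hypothetical toy class LockedGraph (it is not part of JGraphT or any other library) that stores string vertices in an adjacency map:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy graph, for illustration only.
class LockedGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    public synchronized void addEdge(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    public synchronized int degree(String v) {
        Set<String> neighbours = adjacency.get(v);
        return neighbours == null ? 0 : neighbours.size();
    }

    // Compound operation: reading v's neighbours and removing every edge
    // happen under one lock, so no other thread can add an edge in between.
    public synchronized void removeAllEdgesOf(String v) {
        Set<String> neighbours = adjacency.remove(v);
        if (neighbours == null) {
            return;
        }
        for (String n : neighbours) {
            Set<String> back = adjacency.get(n);
            if (back != null) {
                back.remove(v);
            }
        }
    }
}
```

This only works because the algorithm was moved into the class that owns the lock. An algorithm written outside the graph still needs some way to hold the lock for its whole duration, which is the point being made above.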
My best advice, I think, is to use JGraphT in combination with Akka's STM implementation (or similar), so that you can write your algorithms without needing to determine ahead of time which objects will require locking. If you're not familiar with STM and its performance characteristics, Wikipedia's article on the topic does a decent job of explaining.

JGraphT now provides a concurrent AsSynchronizedGraph graph implementation which is thread safe. In addition, JGraphT has a number of algorithmic implementations that take advantage of multiple threads. See for instance the DeltaSteppingShortestPath algorithm.

How about letting several threads do whatever they can do and then submit their solutions to one master controller that collects the results and comes up with the best solution?

The simplest solution is to create one big monitor.
public Object theBigGraphMonitor = new Object();
Before doing ANY operation on the graph, synchronize on that single monitor.
Fiddling with individual vertices seems hard to get right, to say the least.

If you only want to change nodes locally, you can maintain an individual lock for each node. The simplest way to do this would be to implement a custom node class with synchronized methods (you could use ReentrantLock as well) i.e. something like this:
public class SynchronizedNode extends Node {
    public synchronized void localOp1() { ... }
    public synchronized void localOp2() { ... }
}
or
public class SynchronizedNode extends Node {
    ReentrantLock lock ....;
    // the explicit lock replaces the synchronized keyword; using both would lock twice
    public void localOp1() { lock.lock(); try { ... } finally { lock.unlock(); } }
    public void localOp2() { lock.lock(); try { ... } finally { lock.unlock(); } }
}

Have a look at the charts4j API. We are using it in our application with a reasonable number of concurrent users and there have been no problems yet. I am not sure whether the API is thread safe or not.
One problem we have noticed is that the URL of the generated graph points to http://www.google.com/... which can be a problem if you are working inside a VPN and the internet is not available. (Maybe there is a way around it.)

Related

Why is external synchronization faster than internal synchronization?

In the Collections Framework, why is external synchronization faster than internal synchronization (Vector, Hashtable, etc.), even though they both use the same mechanism?
What exactly do internal and external synchronization mean, and how do they differ from each other?
It would be really helpful if someone could explain with examples.

What exactly do internal and external synchronization mean, and how do they differ from each other?
External synchronization is when the caller (you) uses the synchronized keyword or other locks to protect another class from being accessed by multiple threads at once. It is usually used if the class in question is not synchronized itself -- SimpleDateFormat is a prime example. It can also be used if you need signaling between threads -- even when dealing with a concurrent collection.
Why is external synchronization faster than internal synchronization (Vector, Hashtable, etc.), even though they both use the same mechanism?
External synchronization is not necessarily faster. Typically a class can determine precisely when it needs to synchronize around a critical section of code instead of the caller wrapping all method calls in a synchronized block.
If you are talking about the general recommendation to not use Vector and Hashtable and instead use the Collections.synchronizedList(...) or synchronizedMap(...) methods, then this is because Vector and Hashtable are seen as old/out-of-date classes. A wrapped ArrayList or HashMap is seen as a better solution.
Sometimes, as @Chris pointed out, external synchronization can be faster when you need to make a number of changes to a class one after another. Locking externally once and then performing multiple changes works better than having each change locked internally: acquiring a single lock is faster than making multiple lock calls in a row.
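To make the "lock once, change many times" case concrete, here is a small sketch. The class and method names (BatchInsert, addOneByOne, addAsBatch) are made up for illustration. It relies on the documented behaviour that a list returned by Collections.synchronizedList uses the list itself as its monitor:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchInsert {
    // Each add() acquires and releases the list's lock: 'count' acquisitions,
    // and other threads may interleave their own changes between two adds.
    static void addOneByOne(List<Integer> list, int count) {
        for (int i = 0; i < count; i++) {
            list.add(i);
        }
    }

    // The caller takes the lock once. The nested add() calls re-enter the
    // same monitor, and no other thread can get in until the batch is done.
    static void addAsBatch(List<Integer> list, int count) {
        synchronized (list) {
            for (int i = 0; i < count; i++) {
                list.add(i);
            }
        }
    }
}
```

Both methods expect a list created with Collections.synchronizedList(new ArrayList<Integer>()). Whether the batch version is measurably faster depends on contention and on the JVM, so treat it as something to benchmark rather than a guarantee.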
It would be really helpful if someone could explain with examples.
Instead of Vector, people typically recommend a wrapped ArrayList as having better performance. This wraps the non-synchronized ArrayList class in a wrapper class which externally synchronizes it.
List<Foo> list = Collections.synchronizedList(new ArrayList<Foo>());
In terms of internal versus external in general, consider the following class that you want to allow multiple threads to use it concurrently:
public class Foo {
    private int count;
    public void addToCount() {
        count++;
        log.info("count increased to " + count);
    }
}
You could use external synchronization and wrap every call to addToCount() in a synchronized block:
synchronized (foo) {
    foo.addToCount();
}
Or the class itself can use internal synchronization and do the locking for you. This performs better because the logger class does not have to be a part of the lock:
public void addToCount() {
    int val;
    synchronized (this) {
        val = ++count;
    }
    // this log call should not be synchronized since it does IO
    log.info("count increased to " + val);
}
Of course, the Foo class really should use an AtomicInteger in this case and take care of its own thread safety internally:
private final AtomicInteger count = new AtomicInteger(0);
public void addToCount() {
    int val = count.incrementAndGet();
    log.info("count increased to " + val);
}

Let's say you work in a bank. Every time you need to use the safe, it needs to be unlocked, and then re-locked when you're done using it.
Now let's say that you need to carry 50 boxes into the safe. You have two options:
Carry each box over individually, opening and closing the (extremely heavy) door each time
Lock the front door to the bank and leave the vault open, make 50 trips without touching the internal vault door
Which one is faster? (The first option is internal synchronization, the second option is external synchronization.)

How to handle synchronization of frequent concurrent read/writes on a Java ArrayList

I have a Java class that contains an ArrayList of transaction info objects that get queried and modified by different threads on a frequent basis. At a basic level, the structure of the class looks something like this (currently no synchronization is present):
class Statistics
{
    private List<TranInfo> tranInfoList = new ArrayList<TranInfo>();

    // This method runs frequently - every time a transaction comes in.
    void add(TranInfo tranInfo)
    {
        tranInfoList.add(tranInfo);
    }

    // This method acts like a cleaner and runs occasionally.
    void removeBasedOnSomeCondition()
    {
        // Some code to determine which items to remove
        tranInfoList.removeAll(listOfUnwantedTranInfos);
    }

    // Methods to query stats on the tran info.
    // These methods are called frequently.
    Stats getStatsBasedOnSomeCondition()
    {
        // Iterate over the list of tran info
        // objects and return some stats
    }

    Stats getStatsBasedOnSomeOtherCondition()
    {
        // Iterate over the list of tran info
        // objects and return some stats
    }
}
I need to ensure that read/write operations on the list are synchronized correctly. However, performance is very important, so I don't want to end up locking in every method call (especially for concurrent read operations). I've looked at the following solutions:
CopyOnWriteArrayList
I've looked at the use of a CopyOnWriteArrayList to prevent ConcurrentModificationExceptions being thrown when the list is modified while iterating over it; the problem here is the copy required each time the list is modified... it seems too expensive given how often the list will be modified and the potential size of the list.
ReadWriteLock
A ReadWriteLock could be used to synchronize read/write operations while allowing concurrent read operations to take place. While this approach will work, it ends up resulting in a lot of synchronization code in the class (this isn't the end of the world though).
Are there any other clever ways of achieving this kind of synchronization without a big performance penalty, or is one of the above methods the recommended way? Any advice on this would be greatly appreciated.

I'd use Collections.synchronizedList() until you know for sure that it is indeed the crucial performance bottleneck of your application (needless to say, I doubt it is ;-)). You can only know for sure through thorough testing. I assume you know about "premature optimization"...
If then you strive to optimize access to that list I'd say ReadWriteLock is a good approach.
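For reference, the ReadWriteLock variant could look roughly like the sketch below. It is a simplified stand-in for the Statistics class from the question: the class name RwStatistics is made up, TranInfo is reduced to a plain long amount, and the single total() query stands in for the getStats... methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Predicate;

// Simplified, hypothetical stand-in for the Statistics class.
class RwStatistics {
    private final List<Long> amounts = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers take the exclusive write lock.
    void add(long amount) {
        lock.writeLock().lock();
        try {
            amounts.add(amount);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void removeIf(Predicate<Long> unwanted) {
        lock.writeLock().lock();
        try {
            amounts.removeIf(unwanted);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so several queries can iterate at once.
    long total() {
        lock.readLock().lock();
        try {
            long sum = 0;
            for (long a : amounts) {
                sum += a;
            }
            return sum;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The lock/try/finally boilerplate is the "lot of synchronization code" the question mentions. Since add() runs on every transaction, writers will still block readers often, so it is worth measuring against the plain synchronized list before committing to this.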

Another solution that may make sense (especially under heavy read/write) is ConcurrentLinkedQueue (http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html). It is a pretty scalable implementation under contention, based on CAS operations.
The one change that's required to your code is that ConcurrentLinkedQueue does not implement the List interface, and you need to abide by either the Iterable or the Queue type. The only operation you lose really is random access via index, but I don't see that being an issue in your access pattern.
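A rough sketch of what the class could look like on top of ConcurrentLinkedQueue, again with TranInfo reduced to a long amount and with made-up names (QueueStatistics, removeAbove, total). Note that removeIf needs Java 8 or later, which is newer than the Java 7 Javadoc linked above:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified stand-in for the Statistics class.
class QueueStatistics {
    private final Queue<Long> amounts = new ConcurrentLinkedQueue<>();

    // Lock-free append.
    void add(long amount) {
        amounts.add(amount);
    }

    // removeIf comes from the Collection interface (Java 8+).
    void removeAbove(long limit) {
        amounts.removeIf(a -> a > limit);
    }

    // Iteration is weakly consistent: it never throws
    // ConcurrentModificationException, but it may or may not see
    // elements added or removed while the loop is running.
    long total() {
        long sum = 0;
        for (long a : amounts) {
            sum += a;
        }
        return sum;
    }
}
```

The price for having no locks is that a query is not a snapshot: a total computed while another thread is cleaning up may include some of the removed items and not others. Whether that is acceptable depends on how exact the stats need to be.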

Two threads accessing same LinkedList

I am new to Java, and have come across a problem when trying to implement a simple game.
The premise of the game currently is that a timer is used to add a car, and also, more frequently, to update the movement of the car. A car can be selected by touch, and directed by drawing its path. The update function will move the car along the path.
Now, the game crashes with an IndexOutOfBoundsException, and I am almost certain this is because occasionally, when a car is reselected, the current path is wiped to allow a new path to be drawn. The path is stored as a LinkedList, and cleared when the car is touched.
I imagine the error occurs when the path is cleared via a touch event while the timer thread is updating the car's movement along the path. (There are also other, similar issues that could arise with two threads accessing this one list.)
My question: in Java, what would be the best way of dealing with this? Are there specific types of lists I should be using rather than LinkedList, or are there objects such as a mutex in C++ with which I can protect this list while working with it?

In Java, this is usually accomplished using synchronization.
A small example might look something like this:
LinkedList list = ...; // Get/build your list
public void doStuffToList()
{
    synchronized(list)
    {
        // Do things to the list
    }
}
public void clearList()
{
    synchronized(list)
    {
        list.clear();
    }
}
This code won't let the clear operation run while another thread is operating on the list. Note that this causes blocking, so watch out for deadlocks.
Alternatively, if your List is a class that you've built yourself, it probably makes sense to make the data structure thread safe itself:
public class SynchroLinkedList
{
    // Implementation details
    public synchronized void doThingsToList()
    {
        // Implementation
    }
    public synchronized void clearList()
    {
        // Implementation
    }
}
These two approaches would effectively work the same way, but with the second one your thread safety is abstracted into the datatype, which is nice because you don't have to worry about thread safety all over the place when you're using the list.

Instead of recreating your own thread safe list implementation, you have several built-in options, essentially:
use a synchronized list:
List list = Collections.synchronizedList(new LinkedList());
Note that you need to synchronize on the list (synchronized(list) { }) for iterations and other combined operations that need to be atomic.
use a thread-safe collection, for example a CopyOnWriteArrayList or a ConcurrentLinkedQueue, which could be a good alternative if you don't need to access items in the middle of the list but only need to add and iterate.
Note that a CopyOnWriteArrayList might have a performance penalty depending on your use case, especially if you regularly add items (e.g. every few microseconds) and the list can become big.
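Applied to the car game, the synchronized-list option might look like the sketch below. CarPath, addPoint and nextPoint are hypothetical names, and path points are assumed to be int[] {x, y} pairs. The important part is that the "is there a point left? then take it" step is a combined operation and therefore holds the list's lock explicitly:

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical path holder shared by the touch handler and the timer thread.
class CarPath {
    private final List<int[]> points =
            Collections.synchronizedList(new LinkedList<int[]>());

    // Single calls on the wrapper are already atomic.
    void addPoint(int x, int y) {
        points.add(new int[] {x, y});
    }

    void clear() {
        points.clear();
    }

    // Check-then-remove is a compound action: without the synchronized block
    // the list could be cleared between isEmpty() and remove(0).
    int[] nextPoint() {
        synchronized (points) {
            return points.isEmpty() ? null : points.remove(0);
        }
    }
}
```

The timer thread would call nextPoint() and skip the move when it gets null, while the touch handler calls clear(); neither can see a half-updated list.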

Determining synchronization scope?

In trying to improve my understanding of concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying):
public class Example {
    private final Object lock = new Object();
    private final Runtime runtime = Runtime.getRuntime();
    public void add(Object o) {
        synchronized (lock) { runtime.exec(program + " -add " + o); }
    }
    public Object[] getAll() {
        synchronized (lock) { return runtime.exec(program + " -list "); }
    }
    public void remove(Object o) {
        synchronized (lock) { runtime.exec(program + " -remove " + o); }
    }
}
As it stands, each method is thread safe when used standalone. Now, what I'm trying to figure out is how to handle the case where the calling class wishes to call:
for (Object o : example.getAll()) {
    // problems if multiple threads perform this operation concurrently
    example.remove(o);
}
But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:
A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:
synchronized (example) {
    for (Object o : example.getAll()) {
        example.remove(o);
    }
}
B: Place the combined code into the Example class - and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing of unrelated logic:
public class Example {
    ...
    public void removeAll() {
        synchronized (lock) { runtime.exec(program + " -clear"); }
    }
}
C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:
public interface ExampleClosure {
    public void execute(Example example);
}
public class Example {
    ...
    public void execute(ExampleClosure closure) {
        synchronized (this) { closure.execute(this); }
    }
}
example.execute(new ExampleClosure() {
    public void execute(Example example) {
        for (Object o : example.getAll()) {
            example.remove(o);
        }
    }
});
Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?

Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.
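A minimal sketch of that idea, assuming an in-memory list instead of the external program from the question. LockExposingStore, StoreClient and removeEverything are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical store that hands its lock out to callers.
class LockExposingStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Object> items = new ArrayList<>();

    // Callers use this to group several calls into one atomic unit.
    public ReentrantReadWriteLock lock() {
        return lock;
    }

    public void add(Object o) {
        lock.writeLock().lock();
        try { items.add(o); } finally { lock.writeLock().unlock(); }
    }

    public void remove(Object o) {
        lock.writeLock().lock();
        try { items.remove(o); } finally { lock.writeLock().unlock(); }
    }

    public List<Object> getAll() {
        lock.readLock().lock();
        try { return new ArrayList<>(items); } finally { lock.readLock().unlock(); }
    }
}

class StoreClient {
    // Holds the write lock across getAll() and every remove(); the nested
    // acquisitions inside those methods succeed because the lock is reentrant.
    static void removeEverything(LockExposingStore store) {
        store.lock().writeLock().lock();
        try {
            for (Object o : store.getAll()) {
                store.remove(o);
            }
        } finally {
            store.lock().writeLock().unlock();
        }
    }
}
```

One thing to watch: a thread holding the write lock may also take the read lock, but not the other way round. A caller that wraps the loop in the read lock would block forever on the first remove(), so the outer lock has to be the write lock whenever any call inside modifies the store.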

In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.
I would recommend that locks not be done so close to the data structure. But it's a popular option.
A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.
// pointer to function:
// - takes Object by reference and can be safely altered
// - if returns true, Object will be removed from list
typedef bool (*callback_function)(Object *o);
public void editAll(callback_function func) {
    synchronized (lock) {
        for each element o { if (func(o)) { remove o } }
    }
}
So then your loop becomes:
bool my_function(Object *o) {
    ...
    if (some condition) return true;
    return false;
}
...
editAll(my_function);
...
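The pseudocode above mixes C and Java syntax. In plain Java the same editing callback could be sketched with a Predicate, as below. EditableStore is a hypothetical name, and an in-memory list again replaces the external program from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical store offering an "edit every element" callback.
class EditableStore<T> {
    private final Object lock = new Object();
    private final List<T> items = new ArrayList<>();

    public void add(T item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    // The callback may inspect or alter each element; returning true removes it.
    // The whole pass runs under one lock, so it is atomic with respect to add().
    public void editAll(Predicate<? super T> shouldRemove) {
        synchronized (lock) {
            items.removeIf(shouldRemove);
        }
    }

    public int size() {
        synchronized (lock) {
            return items.size();
        }
    }
}
```

A caller would write something like store.editAll(o -> someCondition(o)), where someCondition is whatever test the loop needs. The callback runs while the lock is held, so it should be short and should not acquire other locks, otherwise option C's deadlock concern applies here too.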
The company I work for (Corensic) has test cases extracted from real bugs to verify that Jinx is finding concurrency errors properly. This type of low-level data structure locking without higher-level synchronization is a pretty common pattern. The tree-editing callback seems to be a popular fix for this race condition.

I think everyone is missing his real problem. Iterating over the new array of Objects and removing them one at a time is still technically unsafe (an ArrayList implementation would not explode, but it wouldn't give the expected results).
Even with CopyOnWriteArrayList, the snapshot you read may be out of date by the time you try to remove.
The two suggestions you offered are fine (A and B). My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.
Now you can at the same time say, 'well, I want to keep the API the way it is and let the client worry about additional thread-safety'. If that's the case, document the thread-safety. Document the fact that a 'lookup and modify' action is both non-atomic and non-thread-safe.
Today's concurrent list implementations are all thread safe for the single functions that are offered (get, remove, and add are all thread safe). Compound functions are not, though, and the best that can be done is to document how to make them thread safe.

I think j.u.c.CopyOnWriteArrayList is a good example of a problem similar to the one you're trying to solve.
The JDK had a similar problem with Lists: there were various ways to synchronize individual methods, but no synchronization across multiple invocations (and that's understandable).
So CopyOnWriteArrayList implements the same interface but has a very special contract, and whoever calls it is aware of it.
The same goes for your solution: you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing and new methods. For example, getAll's consistency is not guaranteed, and calls to remove do not fail if o is null or isn't in the list, etc. If users want operations that are both combined and safe/consistent, your class would provide a special method that does exactly that (e.g. safeDeleteAll), leaving the other methods as close to the original contract as possible.
So to answer your question: I would pick option B, but would also implement the interface your original object implements.

From the Javadoc for List.toArray():
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array even if this list is backed by an array). The caller is thus free to modify the returned array.
Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?

You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.
Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.
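As a small illustration of both points, here is a sketch in which a high-level method takes the monitor once and then calls the low-level synchronized methods. CoarseStore and removeAllStartingWith are hypothetical names chosen for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store with low-level and high-level synchronized methods.
class CoarseStore {
    private final List<String> items = new ArrayList<>();

    public synchronized void add(String s) {
        items.add(s);
    }

    public synchronized void remove(String s) {
        items.remove(s);
    }

    public synchronized List<String> getAll() {
        return new ArrayList<>(items);
    }

    // High-level operation: takes the monitor once, then re-enters it in
    // getAll() and remove() without blocking on itself.
    public synchronized void removeAllStartingWith(String prefix) {
        for (String s : getAll()) {
            if (s.startsWith(prefix)) {
                remove(s);
            }
        }
    }
}
```

Because the outer method already owns the monitor, no other thread can change the list between the getAll() call and the last remove().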

Thread safety in java

All,
I started learning Java threads in the past few days and have only read about scenarios where, even after using synchronized methods/blocks, the code/class remains vulnerable to concurrency issues. Can anyone please provide a scenario where synchronized blocks/methods fail? And what should be the alternative in these cases to ensure thread safety?

Proper behaviour under concurrent access is a complex topic, and it's not as simple as just slapping synchronized on everything, as now you have to think about how operations might interleave.
For instance, imagine you have a class like a list, and you want to make it threadsafe. So you make all the methods synchronized and continue. Chances are, clients might be using your list in the following way:
int index = ...; // this gets set somewhere, maybe passed in as an argument
// Check that the list has enough elements for this call to make sense
if (list.size() > index)
{
    return list.get(index);
}
else
{
    return DEFAULT_VALUE;
}
In a single-threaded environment this code is perfectly safe. However, if the list is being accessed (and possibly modified) concurrently, it's possible for the list's size to change after the call to size(), but before the call to get(). So the list could "impossibly" throw an IndexOutOfBoundsException (or similar) in this case, even though the size was checked beforehand.
There's no shortcut for fixing this - you simply need to think carefully about the use-cases for your class/interface, and ensure that you can actually guarantee them when interleaved with any other valid operations. Often this might require some additional complexity, or simply more specifics in the documentation. If the hypothetical list class specified that it always synchronized on its own monitor, then that specific situation could be fixed as
synchronized(list)
{
    if (list.size() > index)
    {
        return list.get(index);
    }
}
but under other synchronization schemes, this would not work. Or it might be too much of a bottleneck. Or forcing the clients to make the multiple calls within the same lexical scope may be an unacceptable constraint. It all depends on what you're trying to achieve, as to how you can make your interface safe, performant and elegant.
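Another way out, when you control the list class, is to offer the combined check-and-read as a single method so that clients never have to make two calls. A minimal sketch, with SafeList and getOrDefault as made-up names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list wrapper that exposes the compound operation directly.
class SafeList<T> {
    private final List<T> items = new ArrayList<>();

    public synchronized void add(T item) {
        items.add(item);
    }

    public synchronized void clear() {
        items.clear();
    }

    // Size check and read happen under the same lock, so the index
    // cannot become invalid between the two steps.
    public synchronized T getOrDefault(int index, T defaultValue) {
        return (index >= 0 && index < items.size()) ? items.get(index) : defaultValue;
    }
}
```

This does not depend on clients knowing which monitor the class uses, at the cost of one more method on the interface for every compound operation you decide to support.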

Scenario 1 Classic deadlock:
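Object Mutex1 = new Object();
Object Mutex2 = new Object();
public void method1(){
    synchronized(Mutex1){
        synchronized(Mutex2){
        }
    }
}
public void method2(){
    // takes the same two locks in the opposite order: this is what can deadlock
    synchronized(Mutex2){
        synchronized(Mutex1){
        }
    }
}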
Other scenarios include anything with a shared resource, even a variable, because one thread could change the variable's contents, or even make it point to null, without the other thread knowing. Writing to IO has similar issues: try writing to a file, or out to a socket, using two threads.
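For the deadlock in Scenario 1, the usual remedy is to make every code path take the locks in the same order. A sketch of the fixed version follows; the class name OrderedLocks and the counter field are only there so the methods do something observable:

```java
// Hypothetical fixed version of Scenario 1.
class OrderedLocks {
    private final Object mutex1 = new Object();
    private final Object mutex2 = new Object();
    private int counter = 0;

    // Both methods take mutex1 first and mutex2 second, so no thread can
    // hold one lock while waiting for the other in the opposite order.
    public void method1() {
        synchronized (mutex1) {
            synchronized (mutex2) {
                counter++;
            }
        }
    }

    public void method2() {
        synchronized (mutex1) {
            synchronized (mutex2) {
                counter += 2;
            }
        }
    }

    public int counter() {
        synchronized (mutex1) {
            synchronized (mutex2) {
                return counter;
            }
        }
    }
}
```

Two threads calling method1() and method2() in a loop will now always finish, whereas the original ordering can leave each thread holding one mutex and waiting for the other.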

Very good articles about concurrency and the Java Memory Model can be found on Angelika Langer's website.

"vulnerable to concurrency issues" is very vague. It would help to know what you have actually read and where. Two things that come to mind:
Just slapping on "synchronized" somewhere does not mean the code is synchronized correctly - it can be very hard to do correctly, and developers frequently miss some problematic scenarios even when they think they're doing it right.
Even if the synchronization correctly prevents non-deterministic changes to the data, you can still run into deadlocks.

Synchronized methods prevent other methods/blocks that require the same monitor from executing while they run.
But if you have two methods, let's say int get() and set(int val), and somewhere else a method which does
obj.set(1+obj.get());
and this method runs in two threads, you can end up with the value increased by one or by two, depending on unpredictable factors.
Therefore you must somehow protect callers of such methods too (but only if it's needed).
By the way, use each monitor for as few functions/blocks as possible, so that only those that can wrongly influence each other are synchronized.
And try to expose as few methods requiring further protection as possible.
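One way to follow that last piece of advice for the get()/set() case is to offer the read-modify-write as a single synchronized method, so callers never need to combine the two themselves. A sketch, with Counter and increment() as illustrative names:

```java
// Hypothetical holder for the int value from the example above.
class Counter {
    private int value;

    public synchronized int get() {
        return value;
    }

    public synchronized void set(int val) {
        value = val;
    }

    // Read-modify-write as one synchronized step, replacing
    // obj.set(1 + obj.get()) in calling code.
    public synchronized int increment() {
        return ++value;
    }
}
```

With two threads each calling increment() 1000 times the result is always 2000, which obj.set(1+obj.get()) cannot guarantee.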
