Sharing large data between threads without synchronizing access


For example, consider a spell-checker that works on a large document while the user is typing new content into the document. To avoid delaying the user's work the spell-checking is done on a separate thread, but the spell-checker still needs access to the document. We could put all access to the document into synchronized blocks, but that would force the editing thread to slow down to accommodate the spell-checking thread.
In principle the user-interaction thread should never be blocked for any reason. If there are to be any delays suffered due to accessing the document from multiple threads, we want them to always happen in the spell-checking thread, but what sort of thread communication will guarantee that?
This same problem applies to more than just spell-checking. Whenever one thread is the primary owner of some data while another thread performs noncritical background computations on the data we face the same situation. I imagine that the solution is some sort of message-passing system where the secondary thread sends requests to the primary thread that are processed at the primary thread's own pace.

Here are some tricks I would use:
Use a fair ReadWriteLock.
Make the locking fine-grained, so that no thread holds a lock for too long.
Move time-consuming code outside the lock. This may require copying some task context.
Use copy-on-write in some cases. For example, split your document into many small parts. Assuming the user edits only one part at a time, and at normal human speed, you can make each part a copy-on-write collection, or even a copy-on-write collection of copy-on-write collections if you want.
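As a rough illustration of the copy-on-write idea, here is a minimal sketch that stores one String per paragraph in a CopyOnWriteArrayList. The CowDocument class and its method names are hypothetical, and a real editor would need a richer document model.

```java
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical document model (an assumption for illustration): one String per paragraph.
final class CowDocument {
    private final CopyOnWriteArrayList<String> paragraphs = new CopyOnWriteArrayList<>();

    // Called by the editing thread. Each write copies the backing array once,
    // which is cheap at human typing speed if the parts are kept small.
    void append(String paragraph) {
        paragraphs.add(paragraph);
    }

    void replace(int index, String paragraph) {
        paragraphs.set(index, paragraph);
    }

    // Called by the spell-checking thread. The iteration works on the array
    // that was current when it started, so it takes no lock, never delays the
    // editor, and never throws ConcurrentModificationException.
    int countParagraphsContaining(String word) {
        int hits = 0;
        for (String paragraph : paragraphs) {
            if (paragraph.contains(word)) {
                hits++;
            }
        }
        return hits;
    }
}
```

The price is that the checker may see a slightly stale version of the document, which is usually acceptable for a background task.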

Related

How to convert ReentrantReadWriteLock logic to LMAX Disruptor with barriers

I have a shared collection, an ArrayList, and I use a ReentrantReadWriteLock to guard entry to the critical section from different threads. I have three threads: a writer, a reader, and a delete thread, and I acquire the appropriate lock in each case. The logic is that I insert data into the ArrayList, read it when necessary, and delete some entries when a timer reaches its limit. The process runs smoothly and everything works.
My question is whether I can move this logic to an LMAX Disruptor in order to avoid lock overhead and improve performance. If so, can you describe an ideal case? If you are also able to post code, I would really appreciate it.
I assume that instead of the ArrayList the data will be entered into the ring buffer, and that I will have two producers (write and delete) and one consumer (read). I must also make sure that I use producer barriers. Will performance improve compared to the lock-based version? I am not sure I understand everything correctly, so please give me some direction.

If your shared state is the ArrayList and you have one thread that is reading and processing the elements in the ArrayList and you want the updates to that shared state synchronised then usually the ArrayList would be owned by one EventHandler that processes events such as Process, Write, Delete and updates and processes the shared state.
This would all run on one thread but that pretty much is what is happening now as you cannot Read at the same time as Writing/Deleting.
As you only have one reading thread there is not much to be gained from using a ReadWriteLock as you will never have concurrent reads.
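To make the single-owner idea concrete, here is a minimal sketch that uses a plain single-threaded ExecutorService rather than the Disruptor API; it shows only the ownership pattern, not the Disruptor's ring buffer. The SingleOwnerStore class and its method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical store: the list is touched only by the single owner thread,
// so it needs no lock. Other threads submit write/delete/read events.
final class SingleOwnerStore implements AutoCloseable {
    private final List<String> items = new ArrayList<>();
    private final ExecutorService owner = Executors.newSingleThreadExecutor();

    void write(String item) {
        owner.execute(() -> items.add(item));
    }

    void delete(String item) {
        owner.execute(() -> items.remove(item));
    }

    // A read is an event too; the caller receives a private copy of the list.
    List<String> read() {
        Callable<List<String>> snapshot = () -> new ArrayList<>(items);
        try {
            return owner.submit(snapshot).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        owner.shutdown();
    }
}
```

Events from one submitting thread are processed in submission order, so a read submitted after a write sees that write.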

In multithreaded programming, does synchronization strip the benefits of concurrent execution?

I have a dilemma regarding the use of multithreading in the application I am working on. I have a workflow in which the state of the object changes, which presents no issues for single-threaded operation. However, In order to improve the performance, I am planning to use multiple threads.
It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading? It seems like multiple threads won't produce any actual concurrency, so it wouldn't be any better than single threaded.
Is my analysis correct? If I am misunderstanding then would someone please clarify the concept?

The short answer: concurrency is hard. Real concurrency, with multiple concurrent writers, is really hard.
What you need to determine is what your actual consistency guarantees need to be. Does every reader need to be able to see every write, guaranteed? Then you'll be forced into linearizing all the threads somehow (e.g. using locks) -- your next effort should be to ensure you do as much work as possible outside of the lock, to keep the lock held for the shortest possible time.
One way to keep the lock held for the shortest possible time is to use a lock-free algorithm. Most lock-free algorithms are based on an atomic compare-and-set primitive, such as those provided by the java.util.concurrent.atomic package. These can be very high-performance, but designing a successful lock-free algorithm can be subtle. One simple kind of lock-free algorithm is to just build a new (immutable) state object and then atomically make it the "live" state, retrying in a loop if a different state was made live by another writer in the interim. (This approach is good enough for many applications, but it's vulnerable to livelock if you have too many writers.)
If you can get by with a looser consistency guarantee, then many other optimizations are possible. For example, you can use thread-local caches so that each thread sees its own view of the data and can be writing in parallel. Then you need to deal with the consequences of data being stale or inconsistent. Most techniques in this vein strive for eventual consistency: writes may not be visible to all readers immediately, but they are guaranteed to be visible to all readers eventually.
This is an active area of research, and a complete answer could fill a book (really, several books!). If you're just getting started in this area, I'd recommend you read Java Concurrency in Practice by Goetz et al, as it provides a good introduction to the subject and lots of practical advice about how to successfully build concurrent systems.

Your interpretation of the limits of multithreading and concurrency is correct. Since each thread must acquire the state before it can do any work (and must wait whenever another thread holds it), you are essentially splitting the work of a single thread among multiple threads.
The best way to fix this is to adjust your program design to limit the size of the critical section. As we learned in my operating systems course with process synchronization,
only one critical section must be executing at any given time
The specific term critical section may not directly apply to Java concurrency, but it still illustrates the concept.
What does it mean to limit this critical section? For example, let's say you have a program managing a single bank account (unrealistic, but illustrates my point). If a lock on the account must be acquired by a thread for the balance to be updated, the basic option would be to have a single thread working on updating the balance at all times (without concurrency). The critical section would be the entire program. However, let's say there was also other logic to be executed, such as alerting other banks of the balance update. You could require the lock on the bank account state only while updating the balance, and not when alerting other banks, decreasing the size of critical section and allowing other threads to perform work by alerting other banks while one thread is updating the balance.
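The bank account example might look roughly like the sketch below. The Account and AccountDemo classes are hypothetical, and notifyOtherBanks is only a stand-in for slow work such as a network call.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical account: only the balance update is guarded by the lock.
final class Account {
    private final Object lock = new Object();
    private long balance;                                   // guarded by lock
    final AtomicInteger notificationsSent = new AtomicInteger();

    void deposit(long amount) {
        long newBalance;
        synchronized (lock) {            // critical section: a few instructions
            balance += amount;
            newBalance = balance;
        }
        notifyOtherBanks(newBalance);    // slow work runs outside the lock
    }

    long balance() {
        synchronized (lock) {
            return balance;
        }
    }

    // Stand-in (assumption) for a slow operation such as a network request.
    private void notifyOtherBanks(long newBalance) {
        notificationsSent.incrementAndGet();
    }
}

final class AccountDemo {
    // Starts `threads` workers that each deposit 1 unit `depositsPerThread` times.
    static long run(int threads, int depositsPerThread) {
        Account account = new Account();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < depositsPerThread; j++) {
                    account.deposit(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return account.balance();
    }
}
```

While one thread holds the lock for the balance update, the others can be busy with their notifications, so the threads overlap for most of their running time.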
Please comment if this was unclear. You seem to already understand the constraints of concurrency, but hopefully this reveals some possible steps towards implementing it.

Your need is not totally clear, but you have correctly guessed the limitations that multithreading may have.
Running parallel threads makes sense if some "relatively autonomous" tasks can be performed concurrently by distinct threads or groups of threads.
If your scenario looks like this: you start 5 threads and in the end only a single thread is active while the others wait for a locked resource, then multithreading makes no sense and may even add overhead because of CPU context switches.
I think that in your use case, multithreading could be used for:
tasks that don't change the state
tasks that change the state, provided each task can be divided into several processing steps of which only a minimal set of instructions touches the state, so that multithreading still pays off.

It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading?
The short answer is "it depends". It is rare that you have a multithreaded application that has no shared data. So sharing data, even if it needs a full lock, doesn't necessarily defeat the performance improvements when making a single threaded application be multi-threaded.
The big question is how frequently each thread needs to update the state. If the threads read the state, do their concurrent processing (which takes time), and then alter the state at the end, you may see performance gains. On the other hand, if every step in the processing needs to be coordinated between threads, they may all spend their time contending for the state object. Reducing this dependence on shared state will then improve your multi-threaded performance.
There are also more efficient ways to update a state variable which can avoid locks. Something like the following pattern is used a lot:
private AtomicReference<State> sharedState;
...
// inside a thread processing loop
// do the processing job
while (true) {
    State existingState = sharedState.get();
    // create a new state object from the existing state and our processing
    State newState = updateState(existingState);
    // if the shared state hasn't changed in the meantime, publish the new one
    if (sharedState.compareAndSet(existingState, newState)) {
        break;
    }
    // otherwise another thread won the race: re-read the state and try again
}
One way to handle state changes is to have a coordinating thread. It is the only thread which reads from the state and generates jobs. As jobs finish they put updates to the state on a BlockingQueue which is then read by the coordinating thread which updates the state in turn. Then the processing threads don't have to all be contending for access to the shared state.
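A minimal sketch of the coordinating-thread idea follows. The CoordinatorDemo class is hypothetical, the "state" is just a running total, and the calling thread plays the coordinator role; a real application would put richer update objects on the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class CoordinatorDemo {
    // Returns the final state after `workers` threads each report `perWorker` updates.
    static long run(int workers, int perWorker) {
        BlockingQueue<Integer> updates = new LinkedBlockingQueue<>();
        for (int w = 0; w < workers; w++) {
            new Thread(() -> {
                for (int i = 0; i < perWorker; i++) {
                    updates.add(1);          // never blocks: the queue is unbounded
                }
            }).start();
        }

        long state = 0;                      // touched only by the coordinating thread
        try {
            for (int n = 0; n < workers * perWorker; n++) {
                state += updates.take();     // waits until the next update arrives
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return state;
    }
}
```

Because only one thread ever touches the state, the state itself needs no lock; the queue is the only shared structure.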

Think of it this way:
Synchronization is blocking.
Concurrency is parallelization.
You don't have to use synchronization. You can use an AtomicReference as a wrapper for your shared mutable state.
You can also use a StampedLock, which improves concurrency by allowing optimistic reads. You may also use accumulators (LongAccumulator, DoubleAccumulator) to write concurrent code. StampedLock and the accumulator classes were added in Java 8.
Another way to avoid synchronization is to use immutable objects, which can be shared and published freely and need no synchronization. You should prefer immutable objects regardless of concurrency, because they make the object's state space easier to reason about.
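For the StampedLock suggestion, an optimistic read could look like the sketch below. The Point class is a hypothetical example of shared mutable state, modelled on the usual pattern for this lock.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical shared state: a point that is read far more often than it moves.
final class Point {
    private final StampedLock lock = new StampedLock();
    private double x;
    private double y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();   // does not block writers
        double currentX = x;
        double currentY = y;
        if (!lock.validate(stamp)) {             // a write intervened: fall back
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(currentX * currentX + currentY * currentY);
    }
}
```

The optimistic path copies the fields into locals first and uses them only after validate succeeds, since the values may be inconsistent if a writer was active.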

Concurrency design principles in practice

I have a Results object which is written to by several threads concurrently. However, each thread has a specific purpose and owns certain fields, so that no data is actually modified by more than one thread. The consumer of this data will not try to read it until all of the writer threads are done writing it. Because I know this to be true, there is no synchronization on the data writes and reads.
There is a RunningState object associated with this Results object which serves to coordinate this work. All of its methods are synchronized. When a thread is done with its work on this Results object, it calls done() on the RunningState object, which does the following: decrements a counter, checks if the counter has gone to 0 (indicating that all writers are done), and if so, puts this object on a concurrent queue. That queue is consumed by a ResultsStore which reads all of the fields and stores data in the database. Before reading any data, the ResultsStore calls RunningState.finalizeResult(), which is an empty method whose sole purpose is to synchronize on the RunningState object, to ensure that writes from all of the threads are visible to the reader.
Here are my concerns:
1) I believe that this will work correctly, but I feel like I'm violating good design principles to not synchronize on the data modifications to an object that is shared by multiple threads. However, if I were to add synchronization and/or split things up so each thread only saw the data it was responsible for, it would complicate the code. Anyone who modifies this area had better understand what's going on in any case or they're likely to break something, so from a maintenance standpoint I think the simpler code with good comments explaining how it works is a better way to go.
2) The fact that I need to call this do-nothing method seems like an indication of wrong design. Is it?
Opinions appreciated.

This seems mostly right, if a bit fragile (if you change the thread-local nature of one field, for instance, you may forget to synchronize it and end up with hard-to-trace data races).
The big area of concern is in memory visibility; I don't think you've established it. The empty finalizeResult() method may be synchronized, but if the writer threads didn't also synchronize on whatever it synchronizes on (presumably this?), there's no happens-before relationship. Remember, synchronization isn't absolute -- you synchronize relative to other threads that are also synchronized on the same object. Your do-nothing method will indeed do nothing, not even ensure any memory barrier.
You somehow need to establish a happens-before relationship between each thread doing its writes, and the thread that eventually reads. One way to do this without synchronization is via a volatile variable, or an AtomicInteger (or other atomic classes).
For instance, each writer thread can invoke counter.incrementAndGet() on the object, and the reading thread can then check that counter.get() == THE_CORRECT_VALUE. There's a happens-before relationship between a volatile/atomic field being written and it being read, which gives you the needed visibility.

Your design is sound, but it can be improved if you are using a true concurrent queue since a concurrent queue from the java.util.concurrent package already guarantees a happens before relationship between the thread putting an item into the queue, and the thread taking an item out, so this precludes needing to call finalizeResult() in the taking thread (so no need for that "do nothing" method call).
From java.util.concurrent package description:
The methods of all classes in java.util.concurrent and its subpackages
extend these guarantees to higher-level synchronization. In
particular:
Actions in a thread prior to placing an object into any
concurrent collection happen-before actions subsequent to the access
or removal of that element from the collection in another thread.
The comments in another answer concerning using an AtomicInteger instead of synchronization are also wise (as using an AtomicInteger to do your thread counting will likely perform better than synchronization), just make sure to get the value of the count after the atomic decrement (e.g. decrementAndGet()) when comparing to 0 in order to avoid adding to the queue twice.
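Putting the two suggestions together, a sketch of the hand-off might look like this. The Results class here is a simplified, hypothetical stand-in for the Results and RunningState objects in the question, with two writers and two plain int fields.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the question's Results plus RunningState.
final class Results {
    int fieldA;   // written only by writer A
    int fieldB;   // written only by writer B
    private final AtomicInteger remaining = new AtomicInteger(2);

    // Only the thread that observes 0 enqueues, so the object is queued exactly once.
    void done(BlockingQueue<Results> queue) {
        if (remaining.decrementAndGet() == 0) {
            queue.add(this);
        }
    }
}

final class HandoffDemo {
    static int run() {
        BlockingQueue<Results> queue = new LinkedBlockingQueue<>();
        Results results = new Results();
        new Thread(() -> { results.fieldA = 40; results.done(queue); }).start();
        new Thread(() -> { results.fieldB = 2;  results.done(queue); }).start();
        try {
            // The atomic decrements order the two writers; the queue hand-off
            // orders the last writer before this reader. No empty synchronized
            // method is needed.
            Results finished = queue.take();
            return finished.fieldA + finished.fieldB;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Each writer sets its own field before calling done(), so both writes happen-before the take() in the consumer.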

What you've described is indeed safe, but it also sounds, frankly, brittle and (as you note) maintenance could become an issue. Without sample code, it's really hard to tell what's really easiest to understand, so an already subjective question becomes frankly unanswerable. Could you ask a coworker for a code review? (Particularly one that's likely to have to deal with this pattern.) I'm going to trust you that this is indeed the simplest approach, but doing something like wrapping synchronized blocks around writes would increase safety now and in the future. That said, you obviously know your code better than I do.

How do I pull data from another thread or process (Android/Java)

I know of concepts that allow inter-process communication. My program needs to launch a second thread. I know how to pass or "push" data from one thread to another from Java/Android, but I have not seen a lot of information regarding "pulling" data. The child thread needs to grab data on the parent thread every so often. How is this done?

Since threads share memory you can just use a thread safe data structure. Refer to java.util.concurrent for some. Everything in that package is designed for multi threaded situations.
In your case you might want to use a LinkedBlockingQueue. This way the parent thread can put things into the queue, and the child thread can grab it off whenever it likes. It also allows the child thread to block if the Queue is empty.
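A minimal sketch of that arrangement is shown below. The PullDemo class and the "STOP" sentinel value are assumptions for illustration; on Android the parent would typically be the UI thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class PullDemo {
    static List<String> run() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> received = new ArrayList<>();   // used only by the child until join()

        Thread child = new Thread(() -> {
            try {
                String item;
                // take() blocks while the queue is empty, so the child pulls
                // data at its own pace.
                while (!(item = queue.take()).equals("STOP")) {
                    received.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        child.start();

        // The parent makes data available whenever it has some.
        queue.add("a");
        queue.add("b");
        queue.add("STOP");   // hypothetical sentinel telling the child to finish

        try {
            child.join();    // after join(), the child's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

If the child should not wait when nothing is available, poll() returns null immediately on an empty queue instead of blocking.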

You may be confusing threads and data. Threads are lines of code execution which may operate on some data but they are not data themselves and they do not contain data. Data is contained in memory and threads are executed by CPU (or vm or whatever level you choose).
You access data in the same way whether it is done in threads or not. That is you use variables or object fields etc. But with threads you need to make sure that there are no race conditions which happen when threads concurrently access the same data.
To summarize, if you have an object with a method that is executed by a thread, you can still get data from that object in the regular way, as long as you make sure that only one thread accesses it at a time.

Multithreading - StringBuffer and StringBuilder

If your text can change and will only be accessed from a single thread, use a StringBuilder because StringBuilder is unsynchronized.
If your text can change, and will be accessed from multiple threads, use a StringBuffer because StringBuffer is synchronous.
What is meant by multiple threads? Can anyone explain this to me? Does it mean two methods, or two programs, trying to access another method at the same time?

Threads are paths of execution that can be executed concurrently. You can have multiple threads in your Java program, which can call the same method of the same object at the same time. If the method e.g. prints something on screen, you might see the messages coming from different threads jumbled up - unless you explicitly ensure that only one message can be printed out at a time, and all other requests to print shall wait until the actual message is fully printed.
Or, if you have a field in that object, all threads see it. And if one of them modifies the field... that's when the interesting part begins :-) Other threads may only see the updated value at a later time, or not at all, unless you specifically ensure that it is safe to use by multiple threads. This can result in subtle, hard to reproduce bugs. This is why writing concurrent programs correctly is a difficult task.
On machines with a single processor core, only a single thread can run at any time, thus different threads are executed one after another, but the OS switches between them frequently (many times per second), thus giving the user the illusion of seeing multiple threads running in parallel. OTOH multicore machines can really run several threads at the same time - as many as processor cores they have.
Every Java program has at least one thread. You may manually create additional threads within your program and pass them tasks to execute.
A detailed explanation of threads and processes - and further, concurrency in Java - can be found in the Java Tutorials.

Threads are like little processes.
Consider the case where a string is shared between two threads that are running concurrently.
Both of them operate on it, so the string may be manipulated by both threads at once and will not remain in a consistent state.
So:
StringBuffer is designed to be thread-safe, and all public methods in StringBuffer are synchronized. StringBuilder does not handle thread safety, and none of its methods are synchronized.
StringBuilder has better performance than StringBuffer under most circumstances.
Use the newer StringBuilder wherever possible.

I think you can use StringBuilder in both cases, but be very careful in multithreaded programs. Synchronization at the StringBuffer method level does not help when you must perform several operations on the string as one unit (think of a database transaction), such as deleting 3 chars at the beginning, then deleting 3 chars at the end, and then comparing the result with something. Even though each delete operation is synchronized (and thus atomic), you can get this interleaving:
the first thread gets the string and deletes 3 chars at the beginning
the second thread gets the string and deletes 3 chars at the beginning
the string is now in an inconsistent state (6 chars deleted from the beginning)
You should synchronize access to such variables at the level of your own method rather than rely on StringBuffer's method synchronization. With StringBuffer you get two levels of synchronization, while with StringBuilder you have only your own.
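As a rough sketch of that advice, the compound operation can be wrapped in one synchronized method around a plain StringBuilder. The SharedText class and the trimEnds method are hypothetical names.

```java
// Hypothetical wrapper: the whole compound operation is one atomic step.
final class SharedText {
    private final StringBuilder text;   // guarded by "this"

    SharedText(String initial) {
        text = new StringBuilder(initial);
    }

    // Deletes 3 chars at the beginning and 3 at the end under a single lock,
    // so no other thread can slip in between the two deletes.
    synchronized String trimEnds() {
        if (text.length() >= 6) {
            text.delete(0, 3);
            text.delete(text.length() - 3, text.length());
        }
        return text.toString();
    }
}
```

Since every access goes through the wrapper's lock, the extra per-method locking of StringBuffer would add nothing here.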

Multiple threads are like parts of the same program running at the same time and sharing the same data.
A typical example is a program that needs to do a long calculation: it can create a separate thread to do the calculation in the background and keep reacting to user input on the main thread.
The problem with multiple threads is that, since they run at the same time and can make their own decisions, you do not really know what each one is doing, so it becomes dangerous to rely on certain actions on the shared data always being done in a certain order.
There are various techniques for dealing with that. One is the synchronized keyword, which allows synchronized access. This means that one thread blocks access to an object while it is busy, so other threads that want access have to wait.
So that is what is meant by saying that StringBuffer is synchronous: it blocks access by other threads while one thread is updating it.
Using multiple threads is considered an advanced topic and not all problems have been solved in a satisfactory manner. Relying on 'synchronous' objects to deal with concurrency will not get you very far, because typically you will do updates to multiple objects in a coordinated manner and these must also be synchronized.
My advice : stay away from there until you've read a good book and experimented on exercises. Till then share no data between threads (other than the simplest of signaling flags).
