Multithreading - StringBuffer and StringBuilder


If your text can change and will only be accessed from a single thread, use a StringBuilder because StringBuilder is unsynchronized.
If your text can change, and will be accessed from multiple threads, use a StringBuffer because StringBuffer is synchronous.
What does "multiple threads" mean? Can anyone explain this to me? Does it mean two methods, or two programs, trying to access another method at the same time?

Threads are paths of execution that can be executed concurrently. You can have multiple threads in your Java program, which can call the same method of the same object at the same time. If the method e.g. prints something on screen, you might see the messages coming from different threads jumbled up - unless you explicitly ensure that only one message can be printed out at a time, and all other requests to print shall wait until the actual message is fully printed.
Or, if you have a field in that object, all threads see it. And if one of them modifies the field... that's when the interesting part begins :-) Other threads may only see the updated value at a later time, or not at all, unless you specifically ensure that it is safe to use by multiple threads. This can result in subtle, hard to reproduce bugs. This is why writing concurrent programs correctly is a difficult task.
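To make the field-visibility point concrete, here is a minimal sketch (the class and method names are hypothetical) of the classic case: one thread spins on a boolean field that another thread clears. Declaring the field `volatile` is one standard way to guarantee the reader sees the write; synchronized access or an `AtomicBoolean` would also work.

```java
// Hypothetical demo class: a stop flag written by one thread, read by another.
class StopFlagDemo {
    // 'volatile' guarantees that the write in stop() becomes visible to the
    // worker thread; without it the worker might keep spinning forever.
    private volatile boolean running = true;
    private long iterations; // touched only by the worker thread

    void work() {
        while (running) {
            iterations++; // busy work so the loop body is not empty
        }
    }

    void stop() {
        running = false;
    }

    // Starts a worker, stops it from the calling thread, and reports whether
    // the worker noticed the flag and exited within 5 seconds.
    static boolean runAndStop() throws InterruptedException {
        StopFlagDemo demo = new StopFlagDemo();
        Thread worker = new Thread(demo::work);
        worker.setDaemon(true); // a stuck worker cannot keep the JVM alive
        worker.start();
        Thread.sleep(50);       // let the worker spin for a moment
        demo.stop();            // write performed by the calling thread
        worker.join(5_000);     // wait at most 5 s for the worker to exit
        return !worker.isAlive();
    }
}
```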
On machines with a single processor core, only a single thread can run at any time, so different threads are executed one after another, but the OS switches between them frequently (many times per second), giving the user the illusion of multiple threads running in parallel. OTOH, multicore machines can really run several threads at the same time, as many as they have processor cores.
Every Java program has at least one thread. You may manually create additional threads within your program and pass them tasks to execute.
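A minimal sketch of creating additional threads and passing them a task, assuming a hypothetical helper `runTask` that records which thread ran the task. The task is a `Runnable` lambda handed to the `Thread` constructor; `start()` launches it and `join()` waits for it to finish.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CreateThreadsDemo {
    // Starts 'count' extra threads, each running the same task, and returns
    // the names of the threads that actually executed it.
    static List<String> runTask(int count) throws InterruptedException {
        List<String> ranOn = new CopyOnWriteArrayList<>(); // thread-safe list
        Runnable task = () -> ranOn.add(Thread.currentThread().getName());
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(task, "worker-" + i);
            threads[i].start(); // runs concurrently with the calling thread
        }
        for (Thread t : threads) {
            t.join(); // wait for every worker to finish
        }
        return ranOn;
    }

    public static void main(String[] args) throws InterruptedException {
        // The JVM starts main() on its initial thread, usually named "main".
        System.out.println("Started on: " + Thread.currentThread().getName());
        System.out.println(runTask(3)); // order of the names varies between runs
    }
}
```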
A detailed explanation of threads and processes - and further, concurrency in Java - can be found in the Java Tutorials.

Threads are like little processes.
Consider the case where a string is shared between two threads which are running concurrently, and both of them operate on it. The string is then being manipulated by both threads at once, so it won't remain in a consistent state.
StringBuffer is designed to be thread-safe and all public methods in StringBuffer are synchronized. StringBuilder does not handle thread-safety issue and none of its methods is synchronized.
StringBuilder has better performance than StringBuffer under most circumstances.
Use the new StringBuilder wherever possible.
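A small sketch of the rule of thumb quoted in the question (class and method names are hypothetical): a `StringBuffer` shared by two threads keeps every append, because each `append` call is synchronized, while a `StringBuilder` confined to one method needs no locking at all.

```java
class SharedBufferDemo {

    // Each append() on a StringBuffer is synchronized, so concurrent appends
    // from several threads do not corrupt its internal state.
    static int appendConcurrently(int perThread) throws InterruptedException {
        StringBuffer shared = new StringBuffer();
        Runnable writer = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.append('x');
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return shared.length(); // always 2 * perThread
    }

    // A StringBuilder that never escapes the method is only touched by one
    // thread, so it needs no synchronization.
    static String buildLocally(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(appendConcurrently(10_000)); // 20000
        System.out.println(buildLocally("a", "b", "c")); // abc
    }
}
```

Note that `StringBuffer`'s per-call locking only protects single method calls; a sequence of calls that must happen together still needs an outer lock.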

I think you can use StringBuilder in both cases, but be very careful in multithreaded programs. Synchronization at the StringBuffer method level is not useful when you must perform several operations on such a string as one unit (think of it like a database transaction), for example: delete 3 chars at the beginning, then delete 3 chars at the end, and compare the result with something. Even when each delete operation is synchronized (and thus atomic), you can get:
the first thread gets the string and deletes 3 chars at the beginning
the second thread gets the same string and deletes 3 chars at the beginning
the string is now in an inconsistent state (6 chars deleted from the beginning)
You should synchronize access to such variables at the level of your own method, not rely on StringBuffer's method synchronization. With StringBuffer you will have two levels of synchronization, while with StringBuilder you will have only your own.
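The compound operation described above can be sketched as follows, assuming a hypothetical wrapper class `TrimmedText`: a plain `StringBuilder` guarded by the wrapper's own lock, so both deletes and the comparison happen as one atomic step.

```java
// Hypothetical wrapper: the "trim 3 chars from each end, then compare" operation,
// made atomic by locking around the whole compound action.
class TrimmedText {
    private final StringBuilder text; // guarded by 'this'

    TrimmedText(String initial) {
        this.text = new StringBuilder(initial);
    }

    // Both deletes and the comparison run under one lock, so no other thread
    // can slip another delete in between them.
    synchronized boolean trimBothEndsAndCompare(String expected) {
        if (text.length() < 6) {
            return false; // not enough characters to trim both ends
        }
        text.delete(0, 3);                             // 3 chars at the beginning
        text.delete(text.length() - 3, text.length()); // 3 chars at the end
        return text.toString().equals(expected);
    }

    synchronized String current() {
        return text.toString();
    }
}
```

Because the lock is the caller's own, a plain `StringBuilder` is enough inside; a `StringBuffer` here would only add a second, redundant layer of locking.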

Multiple threads are like running parts of the same program at the same time, sharing the same data.
A typical example is that when the program needs to do a long calculation, it can create a separate thread to do the calculation in the background and keep reacting to user input on the main thread.
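A sketch of the background-calculation idea using `CompletableFuture` (Java 8+), which is one option among several (a plain `Thread` or an `ExecutorService` would also work). `slowSum` is a hypothetical stand-in for the long calculation.

```java
import java.util.concurrent.CompletableFuture;

class BackgroundCalcDemo {
    // Hypothetical stand-in for a long calculation: the sum 1 + 2 + ... + n.
    static long slowSum(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Starts the calculation on a background thread and returns immediately.
    // supplyAsync uses ForkJoinPool.commonPool() unless given an Executor.
    static CompletableFuture<Long> startInBackground(long n) {
        return CompletableFuture.supplyAsync(() -> slowSum(n));
    }

    public static void main(String[] args) {
        CompletableFuture<Long> result = startInBackground(500_000_000L);
        // The main thread is not blocked here and could keep handling user input.
        System.out.println("calculation started, main thread still free");
        // For this demo, wait for the result before exiting.
        System.out.println("sum = " + result.join());
    }
}
```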
The problem with multiple threads is that, since they run at the same time and you do not really know what each one is doing (they make their own decisions), it becomes dangerous to rely on certain actions on the shared data always being done in a certain order.
There are various techniques for dealing with that; one is the synchronized keyword, which allows synchronized access. This means that one thread blocks access to an object while it is busy, so when other threads want access, they have to wait.
So that's what is meant by StringBuffer being synchronous: it blocks access by other threads while one thread is updating it.
Using multiple threads is considered an advanced topic and not all problems have been solved in a satisfactory manner. Relying on 'synchronous' objects to deal with concurrency will not get you very far, because typically you will do updates to multiple objects in a coordinated manner and these must also be synchronized.
My advice: stay away from this until you've read a good book and worked through some exercises. Until then, share no data between threads (other than the simplest of signaling flags).

Related

in multithreaded programming does synchronization strip the benefits of concurrent executions

I have a dilemma regarding the use of multithreading in the application I am working on. I have a workflow in which the state of the object changes, which presents no issues for single-threaded operation. However, in order to improve the performance, I am planning to use multiple threads.
It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading? It seems like multiple threads won't produce any actual concurrency, so it wouldn't be any better than single threaded.
Is my analysis correct? If I am misunderstanding then would someone please clarify the concept?

The short answer: concurrency is hard. Real concurrency, with multiple concurrent writers, is really hard.
What you need to determine is what your actual consistency guarantees need to be. Does every reader need to be able to see every write, guaranteed? Then you'll be forced into linearizing all the threads somehow (e.g. using locks) -- your next effort should be to ensure you do as much work as possible outside of the lock, to keep the lock held for the shortest possible time.
One way to keep the lock held for the shortest possible time is to use a lock-free algorithm. Most lock-free algorithms are based on an atomic compare-and-set primitive, such as those provided by the java.util.concurrent.atomic package. These can be very high-performance, but designing a successful lock-free algorithm can be subtle. One simple kind of lock-free algorithm is to just build a new (immutable) state object and then atomically make it the "live" state, retrying in a loop if a different state was made live by another writer in the interim. (This approach is good enough for many applications, but it's vulnerable to livelock if you have too many writers.)
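A minimal sketch of the "build a new immutable state, then compare-and-set it live" approach, assuming a hypothetical `LockFreeStats` class whose state is a count and a running total. The retry loop is written out to mirror the description; since Java 8, `AtomicReference.updateAndGet` wraps the same kind of loop.

```java
import java.util.concurrent.atomic.AtomicReference;

class LockFreeStats {
    // Immutable snapshot: never modified after construction.
    static final class State {
        final long count;
        final long total;

        State(long count, long total) {
            this.count = count;
            this.total = total;
        }

        State plus(long value) {
            return new State(count + 1, total + value);
        }
    }

    private final AtomicReference<State> live = new AtomicReference<>(new State(0, 0));

    void record(long value) {
        while (true) {
            State current = live.get();
            State next = current.plus(value);        // build the new state off to the side
            if (live.compareAndSet(current, next)) { // publish only if nobody else did meanwhile
                return;
            }
            // another writer won the race: loop and retry against the newer state
        }
    }

    State snapshot() {
        return live.get();
    }
}
```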
If you can get by with a looser consistency guarantee, then many other optimizations are possible. For example, you can use thread-local caches so that each thread sees its own view of the data and can be writing in parallel. Then you need to deal with the consequences of data being stale or inconsistent. Most techniques in this vein strive for eventual consistency: writes may not be visible to all readers immediately, but they are guaranteed to be visible to all readers eventually.
This is an active area of research, and a complete answer could fill a book (really, several books!). If you're just getting started in this area, I'd recommend you read Java Concurrency in Practice by Goetz et al, as it provides a good introduction to the subject and lots of practical advice about how to successfully build concurrent systems.

Your interpretation of the limits of multithreading and concurrency is correct. Since the state must be acquired and controlled by threads in order for them to perform work (and they wait when not working), you are essentially splitting the work of a single thread among multiple threads.
The best way to fix this is to adjust your program design to limit the size of the critical section. As we learned in my operating systems course with process synchronization,
only one critical section must be executing at any given time
The specific term critical section may not directly apply to Java concurrency, but it still illustrates the concept.
What does it mean to limit this critical section? For example, let's say you have a program managing a single bank account (unrealistic, but illustrates my point). If a lock on the account must be acquired by a thread for the balance to be updated, the basic option would be to have a single thread working on updating the balance at all times (without concurrency). The critical section would be the entire program. However, let's say there was also other logic to be executed, such as alerting other banks of the balance update. You could require the lock on the bank account state only while updating the balance, and not when alerting other banks, decreasing the size of critical section and allowing other threads to perform work by alerting other banks while one thread is updating the balance.
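The bank-account example can be sketched like this (the `BankAccount` class and its `alertOtherBanks` step are hypothetical): the lock covers only the balance update, and the slower notification runs after the lock is released.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class BankAccount {
    private final Object lock = new Object();
    private long balance; // guarded by 'lock'
    // Hypothetical record of notifications "sent to other banks".
    private final Queue<String> sentAlerts = new ConcurrentLinkedQueue<>();

    void deposit(long amount) {
        long newBalance;
        synchronized (lock) {        // critical section: only the balance update
            balance += amount;
            newBalance = balance;
        }
        alertOtherBanks(newBalance); // runs without the lock; other deposits can proceed
    }

    // Hypothetical (possibly slow) notification step; it uses only its argument
    // and a thread-safe queue, so it needs no lock on the account.
    private void alertOtherBanks(long newBalance) {
        sentAlerts.add("balance is now " + newBalance);
    }

    long balance() {
        synchronized (lock) {
            return balance;
        }
    }

    int alertsSent() {
        return sentAlerts.size();
    }
}
```

One trade-off of this design: alerts may be sent in a different order from the balance updates, which is only acceptable if the notification step does not need strict ordering.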
Please comment if this was unclear. You seem to already understand the constraints of concurrency, but hopefully this will reveal possible steps towards implementing concurrency.

Your need is not totally clear, but you have correctly guessed the limitations that multithreading may have.
Running parallel threads makes sense only if some "relatively autonomous" tasks can be performed concurrently by distinct threads or groups of threads.
If your scenario looks like this: you start 5 threads and, in the end, only a single thread is active while the others wait for a locked resource, then multithreading makes no sense and could even add overhead because of CPU context switches.
I think that in your use case, multithreading could be used for:
tasks that don't change the state
tasks that do change the state, if the work can be divided so that each thread holds the lock for only a minimal set of instructions, which makes multithreading worthwhile.

It is my understanding that since the state is going to be shared among the threads, every thread must acquire a lock on the state before execution, so doesn't this defeat the purpose of multithreading?
The short answer is "it depends". It is rare that you have a multithreaded application that has no shared data. So sharing data, even if it needs a full lock, doesn't necessarily defeat the performance improvements when making a single threaded application be multi-threaded.
The big question is how frequently each thread needs to update the state. If the threads read in the state, do their concurrent processing which takes time, and then alter the state at the end, then you may see performance gains. On the other hand, if every step in the processing needs to somehow be coordinated between threads, then they may all spend their time contending for the state object. Reducing this dependence on shared state will then improve your multi-threaded performance.
There are also more efficient ways to update a state variable which can avoid locks. Something like the following pattern is used a lot:
private AtomicReference<State> sharedState;
...
// inside a thread processing loop
// do the processing job
while (true) {
    State existingState = sharedState.get();
    // create a new state object from the existing one and our processing result
    State newState = updateState(existingState);
    // if the application state hasn't changed since we read it, swap in the new one
    if (sharedState.compareAndSet(existingState, newState)) {
        break;
    }
    // otherwise another thread updated it first: re-read the state and try again
}
One way to handle state changes is to have a coordinating thread. It is the only thread which reads from the state and generates jobs. As jobs finish they put updates to the state on a BlockingQueue which is then read by the coordinating thread which updates the state in turn. Then the processing threads don't have to all be contending for access to the shared state.
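A minimal sketch of the coordinating-thread pattern, assuming hypothetical "jobs" that just square their id: workers never touch the shared state, they only put results on a `BlockingQueue`, and the single coordinator applies them in turn, so the state itself needs no lock.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CoordinatorDemo {
    // Runs 'workers' threads that each produce one update; only the coordinator
    // (the calling thread here) touches 'state', so it needs no lock.
    static long run(int workers) throws InterruptedException {
        BlockingQueue<Long> updates = new LinkedBlockingQueue<>();
        for (int i = 0; i < workers; i++) {
            final long jobId = i;
            new Thread(() -> {
                long result = jobId * jobId; // hypothetical "processing"
                updates.add(result);         // hand the update to the coordinator
            }).start();
        }
        long state = 0; // owned exclusively by the coordinator
        for (int i = 0; i < workers; i++) {
            state += updates.take(); // blocks until some worker reports
        }
        return state;
    }
}
```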

Imagine it this way :
Synchronization is blocking
Concurrency is parallelization
You don't have to use synchronization. You can use an Atomic reference object as a wrapper for your shared mutable state.
You can also use stamped locks, which improve concurrency by allowing optimistic reads. You may also use accumulators to write concurrent code. These features are part of Java 8.
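For the stamped-lock suggestion, here is a sketch modeled on the `Point` example in the `StampedLock` documentation: readers first try an optimistic read, and only take a real read lock if `validate` reports that a write happened in between.

```java
import java.util.concurrent.locks.StampedLock;

class Point {
    private double x, y;
    private final StampedLock sl = new StampedLock();

    void move(double dx, double dy) {
        long stamp = sl.writeLock(); // exclusive, like a normal write lock
        try {
            x += dx;
            y += dy;
        } finally {
            sl.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = sl.tryOptimisticRead(); // non-blocking; just returns a stamp
        double curX = x, curY = y;           // copy fields into locals
        if (!sl.validate(stamp)) {           // a write intervened: retry under a read lock
            stamp = sl.readLock();
            try {
                curX = x;
                curY = y;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return Math.hypot(curX, curY);
    }
}
```

For the accumulators, `LongAdder` and `LongAccumulator` in `java.util.concurrent.atomic` (also Java 8) are the relevant classes; they spread updates across internal cells to reduce contention between writers.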
Another way to avoid synchronization is to use immutable objects, which can be shared and published freely and need no synchronization. I should add that you should use immutable objects anyway, regardless of concurrency, because they make the state space of your objects easier to reason about.
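A minimal sketch of an immutable class (the `Money` name is hypothetical): a final class with private final fields and no setters, where every "change" returns a new object. Instances that are properly constructed (no `this` escaping the constructor) can be shared between threads without synchronization.

```java
// Hypothetical immutable value class.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    // "Changing" the amount produces a new object; this one never changes.
    Money plus(long moreCents) {
        return new Money(currency, cents + moreCents);
    }

    String currency() {
        return currency;
    }

    long cents() {
        return cents;
    }
}
```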

Concurrency design principles in practice

I have a Results object which is written to by several threads concurrently. However, each thread has a specific purpose and owns certain fields, so that no data is actually modified by more than one thread. The consumer of this data will not try to read it until all of the writer threads are done writing it. Because I know this to be true, there is no synchronization on the data writes and reads.
There is a RunningState object associated with this Results object which serves to coordinate this work. All of its methods are synchronized. When a thread is done with its work on this Results object, it calls done() on the RunningState object, which does the following: decrements a counter, checks if the counter has gone to 0 (indicating that all writers are done), and if so, puts this object on a concurrent queue. That queue is consumed by a ResultsStore which reads all of the fields and stores data in the database. Before reading any data, the ResultsStore calls RunningState.finalizeResult(), which is an empty method whose sole purpose is to synchronize on the RunningState object, to ensure that writes from all of the threads are visible to the reader.
Here are my concerns:
1) I believe that this will work correctly, but I feel like I'm violating good design principles to not synchronize on the data modifications to an object that is shared by multiple threads. However, if I were to add synchronization and/or split things up so each thread only saw the data it was responsible for, it would complicate the code. Anyone who modifies this area had better understand what's going on in any case or they're likely to break something, so from a maintenance standpoint I think the simpler code with good comments explaining how it works is a better way to go.
2) The fact that I need to call this do-nothing method seems like an indication of wrong design. Is it?
Opinions appreciated.

This seems mostly right, if a bit fragile (if you change the thread-local nature of one field, for instance, you may forget to synchronize it and end up with hard-to-trace data races).
The big area of concern is in memory visibility; I don't think you've established it. The empty finalizeResult() method may be synchronized, but if the writer threads didn't also synchronize on whatever it synchronizes on (presumably this?), there's no happens-before relationship. Remember, synchronization isn't absolute -- you synchronize relative to other threads that are also synchronized on the same object. Your do-nothing method will indeed do nothing, not even ensure any memory barrier.
You somehow need to establish a happens-before relationship between each thread doing its writes, and the thread that eventually reads. One way to do this without synchronization is via a volatile variable, or an AtomicInteger (or other atomic classes).
For instance, each writer thread can invoke counter.incrementAndGet() on the object, and the reading thread can then check that counter.get() == THE_CORRECT_VALUE. There's a happens-before relationship between a volatile/atomic field being written and it being read, which gives you the needed visibility.

Your design is sound, but it can be improved if you are using a true concurrent queue since a concurrent queue from the java.util.concurrent package already guarantees a happens before relationship between the thread putting an item into the queue, and the thread taking an item out, so this precludes needing to call finalizeResult() in the taking thread (so no need for that "do nothing" method call).
From the java.util.concurrent package description:
The methods of all classes in java.util.concurrent and its subpackages extend these guarantees to higher-level synchronization. In particular:
Actions in a thread prior to placing an object into any concurrent collection happen-before actions subsequent to the access or removal of that element from the collection in another thread.
The comments in another answer concerning using an AtomicInteger instead of synchronization are also wise (as using an AtomicInteger to do your thread counting will likely perform better than synchronization), just make sure to get the value of the count after the atomic decrement (e.g. decrementAndGet()) when comparing to 0 in order to avoid adding to the queue twice.
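Putting the last two answers together, here is a sketch of how the question's `done()` logic might look with an `AtomicInteger` and a `BlockingQueue` instead of a synchronized `RunningState` (the class names and the `partials` array are hypothetical). `decrementAndGet` guarantees that exactly one writer sees zero; the atomic counter chains the writers' updates together, and the queue hand-off extends that visibility to the reader, so no do-nothing method is needed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class Results {
    final long[] partials; // each writer owns exactly one slot
    private final AtomicInteger remaining;
    private final BlockingQueue<Results> readyQueue;

    Results(int writers, BlockingQueue<Results> readyQueue) {
        this.partials = new long[writers];
        this.remaining = new AtomicInteger(writers);
        this.readyQueue = readyQueue;
    }

    // Called once by each writer after it has filled its own slot.
    void done() {
        // decrementAndGet returns the post-decrement value, so exactly one
        // writer sees 0 and the object is enqueued exactly once.
        if (remaining.decrementAndGet() == 0) {
            readyQueue.add(this);
        }
    }
}

class ResultsDemo {
    static long runOnce(int writers) throws InterruptedException {
        BlockingQueue<Results> queue = new LinkedBlockingQueue<>();
        Results results = new Results(writers, queue);
        for (int i = 0; i < writers; i++) {
            final int slot = i;
            new Thread(() -> {
                results.partials[slot] = slot + 1L; // plain write to this writer's own slot
                results.done();
            }).start();
        }
        Results ready = queue.take(); // reader: sees every writer's slot
        long sum = 0;
        for (long p : ready.partials) {
            sum += p;
        }
        return sum;
    }
}
```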

What you've described is indeed safe, but it also sounds, frankly, brittle and (as you note) maintenance could become an issue. Without sample code, it's really hard to tell what's really easiest to understand, so an already subjective question becomes frankly unanswerable. Could you ask a coworker for a code review? (Particularly one that's likely to have to deal with this pattern.) I'm going to trust you that this is indeed the simplest approach, but doing something like wrapping synchronized blocks around writes would increase safety now and in the future. That said, you obviously know your code better than I do.

How do I pull data from another thread or process (Android/Java)

I know of concepts that allow inter-process communication. My program needs to launch a second thread. I know how to pass or "push" data from one thread to another from Java/Android, but I have not seen a lot of information regarding "pulling" data. The child thread needs to grab data on the parent thread every so often. How is this done?

Since threads share memory you can just use a thread safe data structure. Refer to java.util.concurrent for some. Everything in that package is designed for multi threaded situations.
In your case you might want to use a LinkedBlockingQueue. This way the parent thread can put things into the queue, and the child thread can grab it off whenever it likes. It also allows the child thread to block if the Queue is empty.
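A minimal sketch of the parent pushing into a `LinkedBlockingQueue` while the child pulls on its own schedule (`PullFromParentDemo` and its `run` method are hypothetical). Using `poll` with a timeout rather than `take` lets the child wake up periodically to do other work even when nothing has arrived.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PullFromParentDemo {
    // The parent (calling thread) puts 0..items-1 on the queue; the child pulls
    // them whenever it gets around to it and returns their sum.
    static long run(int items) throws InterruptedException {
        BlockingQueue<Integer> fromParent = new LinkedBlockingQueue<>();
        long[] childSum = new long[1]; // written by the child, read after join()

        Thread child = new Thread(() -> {
            long sum = 0;
            int received = 0;
            try {
                while (received < items) {
                    // Wait at most 100 ms for data, then fall through.
                    Integer next = fromParent.poll(100, TimeUnit.MILLISECONDS);
                    if (next != null) {
                        sum += next;
                        received++;
                    }
                    // ...the child's other periodic work would go here...
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
            childSum[0] = sum;
        });
        child.start();

        for (int i = 0; i < items; i++) {
            fromParent.put(i); // parent publishes data for the child
        }
        child.join();
        return childSum[0];
    }
}
```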

You may be confusing threads and data. Threads are lines of code execution which may operate on some data but they are not data themselves and they do not contain data. Data is contained in memory and threads are executed by CPU (or vm or whatever level you choose).
You access data in the same way whether it is done in threads or not. That is you use variables or object fields etc. But with threads you need to make sure that there are no race conditions which happen when threads concurrently access the same data.
To summarize, if you have an object that has some method executed by thread, you can still get data from this object in regular way as long as you make sure that only one thread does it at the same time.
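A sketch of reading the parent's data "in the regular way" (the `SharedSettings` class and its helper are hypothetical): the data sits in an ordinary object whose accessors are synchronized on the same lock, so only one thread touches it at a time and the child sees the latest value the parent wrote.

```java
class SharedSettings {
    private String mode = "idle"; // guarded by 'this'

    synchronized void setMode(String mode) {
        this.mode = mode;
    }

    synchronized String getMode() {
        return mode;
    }

    // Hypothetical helper: a child thread "pulls" the current mode.
    static String readFromChild(SharedSettings settings) throws InterruptedException {
        String[] seen = new String[1];
        Thread child = new Thread(() -> seen[0] = settings.getMode());
        child.start();
        child.join(); // join() also makes the child's write to seen[0] visible here
        return seen[0];
    }
}
```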

Parallel-processing in Java; advice needed i.e. on Runnable/Callable interfaces

Assume that I have a set of objects that need to be analyzed in two different ways, both of which take a relatively long time and involve IO calls. I am trying to figure out how, or if, I could go about optimizing this part of my software, especially by utilizing the multiple processors (the machine I am sitting at, for example, is an 8-core i7 which almost never goes above 10% load during execution).
I am quite new to parallel-programming or multi-threading (not sure what the right term is), so I have read some of the prior questions, particularly paying attention to highly voted and informative answers. I am also in the process of going through the Oracle/Sun tutorial on concurrency.
Here's what I have worked out so far:
A thread-safe collection holds the objects to be analyzed
As soon as there are objects in the collection (they come a couple at a time from a series of queries), a thread per object is started
Each specific thread takes care of the initial pre-analysis preparations; and then calls on the analyses.
The two analyses are implemented as Runnables/Callables, and thus called on by the thread when necessary.
And my questions are:
Is this a reasonable scheme, if not, how would you go about doing this?
In order to make sure things don't get out of hand, should I implement a ThreadManager or something of that sort, which starts and stops threads and redistributes them when they are complete? For example, if I have 256 objects to be analyzed and 16 threads in total, the ThreadManager assigns the first finished thread to the 17th object to be analyzed, etc.
Is there a dramatic difference between Runnable/Callable other than the fact that Callable can return a result? Otherwise should I try to implement my own interface, in that case why?
Thanks,

You could use a BlockingQueue implementation to hold your objects and spawn your threads from there. This interface is based on the producer-consumer principle. The put() method will block if your queue is full until there is some more space and the take() method will block if the queue is empty until there are some objects again in the queue.
An ExecutorService can help you manage your pool of threads.
If you are waiting for a result from your spawned threads, then the Callable interface is a good choice, since you can start the computation early and carry on in your code, holding the pending results as Futures. As for the differences from the Runnable interface, from the Callable javadoc:
The Callable interface is similar to Runnable, in that both are designed for classes whose instances are potentially executed by another thread. A Runnable, however, does not return a result and cannot throw a checked exception.
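A sketch of the ExecutorService plus Callable approach, assuming a hypothetical `analyze` step that just measures a string: each submitted `Callable` starts running as soon as a pool thread is free, and `Future.get()` blocks only when a particular result is actually needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AnalysisDemo {
    // Hypothetical stand-in for one of the analyses; a real one might do IO
    // and throw checked exceptions, which Callable.call() permits.
    static Callable<Integer> analyze(String item) {
        return () -> item.length();
    }

    static int analyzeAll(List<String> items, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String item : items) {
                futures.add(pool.submit(analyze(item))); // queued or started right away
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // blocks only until this particular result is ready
            }
            return total;
        } finally {
            pool.shutdown(); // no new tasks; threads exit once the queue is drained
        }
    }
}
```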
Some general things you need to consider in your quest for java concurrency:
Visibility does not come by default: volatile, AtomicReference and the other classes in the java.util.concurrent.atomic package are your friends.
You need to carefully ensure atomicity of compound actions using synchronization and locks.

Your idea is basically sound. However, rather than creating threads directly, or indirectly through some kind of ThreadManager of your own design, use an Executor from Java's concurrency package. It does everything you need, and other people have already taken the time to write and debug it. An executor manages a queue of tasks, so you don't need to worry about providing the threadsafe queue yourself either.
There's no difference between Callable and Runnable except that the former returns a value and can throw checked exceptions. Executors will handle both, and treat them the same.
It's not clear to me whether you're planning to make the preparation step a separate task to the analyses, or fold it into one of them, with that task spawning the other analysis task halfway through. I can't think of any reason to strongly prefer one to the other, but it's a choice you should think about.

The Executors class provides factory methods for creating thread pools. Specifically, Executors#newFixedThreadPool(int nThreads) creates a thread pool with a fixed size that uses an unbounded queue. Also, if a thread terminates due to a failure, a new thread will take its place if needed to execute subsequent tasks. So in your specific example of 256 tasks and 16 threads you would call:
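import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// create a pool of 16 threads; extra tasks wait in the pool's unbounded queue
ExecutorService threadPool = Executors.newFixedThreadPool(16);
// submit a task (here a lambda standing in for one analysis job)
Runnable task = () -> System.out.println("running on " + Thread.currentThread().getName());
threadPool.submit(task);
// after all tasks are submitted, let the pool finish them and release its threads
threadPool.shutdown();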
The important question is determining the proper number of threads for your thread pool. The question "Efficient Number of Threads" may help.

Sounds reasonable, but it's not as trivial to implement as it may seem.
Maybe you should check the jsr166y project.
That's probably the easiest solution to your problem.

Disadvantage of synchronized methods in Java

What are the disadvantages of making a large, non-static Java method synchronized? Large in the sense that it takes 1 to 2 minutes to complete.

If you synchronize the method and try to call it twice at the same time, one thread will have to wait two minutes.
This is not really a question of "disadvantages". Synchronization is either necessary or not, depending on what the method does.
If it is critical that the code runs only once at the same time, then you need synchronization.
If you want to run the code only once at the same time to preserve system resources, you may want to consider a counting Semaphore, which gives more flexibility (such as being able to configure the number of concurrent executions).
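A sketch of the counting-`Semaphore` alternative, assuming a hypothetical `ThrottledJob` whose long-running work is simulated with a short sleep; the number of permits sets how many calls may run at once. The peak counter exists only to make the limit observable.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledJob {
    private final Semaphore permits;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxSeen = new AtomicInteger();

    ThrottledJob(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    void run() throws InterruptedException {
        permits.acquire(); // blocks while maxConcurrent calls are already in progress
        try {
            int now = running.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max); // record the peak for the demo
            Thread.sleep(20); // stand-in for the long-running work
        } finally {
            running.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return maxSeen.get();
    }
}
```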
Another interesting aspect is that synchronization can only really be used to control access to resources within the same JVM. If you have more than one JVM and need to synchronize access to a shared file system or database, the synchronized keyword is not at all sufficient. You will need to get an external (global) lock for that.

If the method takes on the order of minutes to execute, then it may not need to be synchronized at such a coarse level, and it may be possible to use a more fine-grained system, perhaps by locking only the portion of a data structure that the method is operating on at the moment. Certainly, you should try to make sure that your critical section isn't really 2 minutes long - any method that takes that long to execute (regardless of the presence of other threads or locks) should be carefully studied as a candidate for parallelization. For a computation this time-consuming, you could be acquiring and releasing hundreds of locks and still have it be negligible. (Or, to put it another way, even if you need to introduce a lot of locks to parallelize this code, the overhead probably won't be significant.)

Since your method takes a huge amount of time to run, the relatively tiny amount of time it takes to acquire the synchronized lock should not be important.
A bigger problem could appear if your program is multithreaded (which I'm assuming it is, since you're making the method synchronized) and more than one thread needs to access that method: it could become a bottleneck. To prevent this, you might be able to rewrite the method so that it does not require synchronization, or use a synchronized block to reduce the size of the protected code (in general, the smaller the amount of code that is protected by the synchronized keyword, the better).
You can also look at the java.util.concurrent classes, as you may find a better solution there as well.

If the object is shared by multiple threads, if one thread tries to call the synchronized method on the object while another's call is in progress, it will be blocked for 1 to 2 minutes. In the worst case, you could end up with a bottleneck where the throughput of your system is dominated by executing these computations one at a time.
Whether this is a problem or not depends on the details of your application, but you probably should look at more fine-grained synchronization ... if that is practical.

In two lines, the disadvantages of synchronized methods in Java:
They increase the waiting time of threads
They can create performance problems

The first drawback is that threads that are blocked waiting to execute synchronized code can't be interrupted. Once they're blocked, they're stuck there until they get the lock for the object the code is synchronizing on.
The second drawback is that a synchronized block must be within a single method; in other words, we can't start a synchronized block in one method and end it in another.
The third drawback is that we can't test whether an object's intrinsic lock is available or find out any other information about the lock, and if the lock isn't available we can't time out after waiting for a while. When we reach the beginning of a synchronized block, we can either get the lock and continue executing, or block at that line of code until we get the lock.
The fourth drawback is that if multiple threads are waiting to get the lock, it's not first come, first served. There is no set order in which the JVM chooses the next thread that gets the lock, so the first thread that blocked could be the last thread to get the lock, and vice versa.
So instead of using synchronization, we can prevent thread interference using classes that implement the java.util.concurrent.locks.Lock interface.
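A sketch of how `ReentrantLock` (an implementation of `java.util.concurrent.locks.Lock`) addresses the drawbacks listed above: `tryLock` with a timeout, `lockInterruptibly`, a fairness flag, and lock-state inspection. The `LockDemo` class is hypothetical. Because `lock()` and `unlock()` are ordinary method calls, they can also be split across methods, which is why the release normally goes in a `finally` block.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockDemo {
    // 'true' requests a fair lock: under contention it favors the longest-waiting thread.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int value; // guarded by 'lock'

    // Gives up after the timeout instead of blocking indefinitely.
    boolean tryIncrement(long timeoutMs) throws InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // lock not available in time; caller can do something else
        }
        try {
            value++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Waits for the lock, but the wait can be cancelled with Thread.interrupt().
    void incrementInterruptibly() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            value++;
        } finally {
            lock.unlock();
        }
    }

    // Unlike an intrinsic lock, the lock's state can be inspected.
    boolean isBusy() {
        return lock.isLocked();
    }

    int value() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```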


