In an effort to practice my rusty Java, I wanted to try a simple multi-threaded shared data example and I came across something that surprised me.
Basically we have a shared AtomicInteger counter between three threads that each take turns incrementing and printing the counter.
main
AtomicInteger counter = new AtomicInteger(0);
CounterThread ct1 = new CounterThread(counter, "A");
CounterThread ct2 = new CounterThread(counter, "B");
CounterThread ct3 = new CounterThread(counter, "C");
ct1.start();
ct2.start();
ct3.start();
CounterThread
public class CounterThread extends Thread
{
private AtomicInteger _count;
private String _id;
public CounterThread(AtomicInteger count, String id)
{
_count = count;
_id = id;
}
public void run()
{
while(_count.get() < 1000)
{
System.out.println(_id + ": " + _count.incrementAndGet());
Thread.yield();
}
}
}
I expected that when each thread executed Thread.yield(), it would hand execution over to another thread to increment _count, like this:
A: 1
B: 2
C: 3
A: 4
...
Instead, I got output where A would increment _count 100 times, then pass it off to B. Sometimes all three threads would take turns consistently, but sometimes one thread would dominate for several increments.
Why doesn't Thread.yield() always yield processing over to another thread?
"I expected that when each thread executed Thread.yield(), that it would give over execution to another thread to increment _count"
In threaded applications that spin like this, predicting the output is extremely hard. You would need explicit coordination (locks, wait/notify or similar) to get perfect A:1 B:2 C:3 ... type output.
The problem is that everything is a race and unpredictable due to hardware, time-slicing randomness, and other factors. For example, when the first thread starts, it may run for a couple of milliseconds before the next thread starts, so there would be no one to yield() to. Also, even if it yields, you may be on a 4-processor box, so there is no reason to pause any other threads at all.
"Instead, I got output where A would increment _count 100 times, then pass it off to B. Sometimes all three threads would take turns consistently, but sometimes one thread would dominate for several increments."
Right, in general with these spinning loops, you see bursts of output from a single thread as it gets time slices. This is further complicated by the fact that System.out.println(...) is synchronized, which affects the timing as well. If the loop were not doing a synchronized operation, you would see even more bursty output.
"Why doesn't Thread.yield() always yield processing over to another thread?"
I very rarely use Thread.yield(). It is a hint to the scheduler at best and probably is ignored on some architectures. The idea that it "pauses" the thread is very misleading. It may cause the thread to be put back to the end of the run queue but there is no guarantee that there are any threads waiting so it may keep running as if the yield were removed.
See my answer here for more info: unwanted output in multithreading
Let's read some javadoc, shall we?
A hint to the scheduler that the current thread is willing to yield
its current use of a processor. The scheduler is free to ignore this
hint.
[...]
It is rarely appropriate to use this method. It may be useful
for debugging or testing purposes, where it may help to reproduce bugs
due to race conditions. It may also be useful when designing
concurrency control constructs such as the ones in the
java.util.concurrent.locks package.
You cannot guarantee that another thread will obtain the processor after a yield(). It's up to the scheduler and it seems he/she doesn't want to in your case. You might consider sleep()ing instead, for testing.
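If strict A, B, C turn-taking really is the goal, the threads have to coordinate explicitly instead of hoping that yield() hands over the processor. Below is a minimal sketch of one way to do that with a shared lock and a turn index. The class name RoundRobinCounter, the run helper and the log list are made up for illustration and are not part of the code in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: threads take strict turns incrementing a shared counter.
class RoundRobinCounter {
    private final Object lock = new Object();
    private final List<String> log = new ArrayList<>(); // guarded by lock
    private final int limit;
    private final int threadCount;
    private int count = 0; // guarded by lock
    private int turn = 0;  // index of the thread allowed to go next, guarded by lock

    RoundRobinCounter(int limit, int threadCount) {
        this.limit = limit;
        this.threadCount = threadCount;
    }

    private void work(int index, String id) {
        synchronized (lock) {
            try {
                while (true) {
                    // Sleep until it is our turn or the work is finished.
                    while (turn != index && count < limit) {
                        lock.wait();
                    }
                    if (count >= limit) {
                        return;
                    }
                    count++;
                    log.add(id + ": " + count);
                    turn = (turn + 1) % threadCount;
                    lock.notifyAll(); // wake the others so the next one can proceed
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Starts one thread per id and returns the lines in the order they were produced.
    static List<String> run(String[] ids, int limit) {
        RoundRobinCounter shared = new RoundRobinCounter(limit, ids.length);
        Thread[] threads = new Thread[ids.length];
        for (int i = 0; i < ids.length; i++) {
            final int index = i;
            threads[i] = new Thread(() -> shared.work(index, ids[index]));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return shared.log;
    }

    public static void main(String[] args) {
        for (String line : run(new String[] {"A", "B", "C"}, 9)) {
            System.out.println(line);
        }
    }
}
```

Note that this serialises the work completely, so it demonstrates ordering rather than any speed-up from threading.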
I have a String and ThreadPoolExecutor that changes the value of this String. Just check out my sample:
String str_example = "";
ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(10, 30, (long)10, TimeUnit.SECONDS, runnables);
for (int i = 0; i < 80; i++){
poolExecutor.submit(new Runnable() {
@Override
public void run() {
try {
Thread.sleep((long) (Math.random() * 1000));
String temp = str_example + "1";
str_example = temp;
System.out.println(str_example);
} catch (Exception e) {
e.printStackTrace();
}
}
});
}
So after executing this, I get something like this:
1
11
111
1111
11111
.......
So the question is: I would expect a result like this only if my String field had the volatile modifier. But I get the same result with and without the modifier.
There are several reasons why you see "correct" execution.
First, CPU designers do as much as they can so that programs run correctly even in the presence of data races. Cache coherence works on cache lines and tries to minimize conflicts. For example, only one CPU can write to a given cache line at a time; after the write, other CPUs must request that cache line before they can write to it. On top of that, the x86 architecture (most probably what you are using) has a much stricter memory model than many others.
Second, your program is slow and the threads sleep for random periods of time, so they do almost all of their work at different points in time.
How can you provoke inconsistent behavior? Try a for loop without any sleep. In that case the field value will most probably be cached in CPU registers and some updates will not be visible.
P.S. Updates of the field str_example are not atomic, so your program may produce the same string value more than once even with the volatile keyword.
When you talk about concepts like thread caching, you're talking about the properties of a hypothetical machine that Java might be implemented on. The logic is something like "Java permits an implementation to cache things, so it requires you to tell it when such things would break your program". That does not mean that any actual machine does anything of the sort. In reality, most machines you are likely to use have completely different kinds of optimizations that don't involve the kind of caches that you're thinking of.
Java requires you to use volatile precisely so that you don't have to worry about what kinds of absurdly complex optimizations the actual machine you're working on might or might not have. And that's a really good thing.
Your code is unlikely to exhibit concurrency bugs because it executes with very low concurrency. You have 10 threads, each of which sleeps on average 500 ms before doing a string concatenation. As a rough guess, String concatenation takes about 1 ns per character, and because your string is at most 80 characters long, each thread spends about 80 out of 500,000,000 ns executing. The chance of two or more threads running at the same time is therefore vanishingly small.
If we change your program so that several threads are running concurrently all the time, we see quite different results:
static String s = "";
public static void main(String[] args) throws Exception {
ExecutorService executor = Executors.newFixedThreadPool(5);
for (int i = 0; i < 10_000; i ++) {
executor.submit(() -> {
s += "1";
});
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.MINUTES);
System.out.println(s.length());
}
In the absence of data races, this should print 10000. On my computer, this prints about 4200, meaning over half the updates have been lost in the data race.
What if we declare s volatile? Interestingly, we still get about 4200 as a result, so data races were not prevented. That makes sense, because volatile ensures that writes are visible to other threads, but does not prevent intermediary updates, i.e. what happens is something like:
Thread 1 reads s and starts making a new String
Thread 2 reads s and starts making a new String
Thread 1 stores its result in s
Thread 2 stores its result in s, overwriting the previous result
To prevent this, you can use a plain old synchronized block:
executor.submit(() -> {
synchronized (Test.class) {
s += "1";
}
});
And indeed, this returns 10000, as expected.
It works because you are using Thread.sleep((long) (Math.random() * 1000)). Every thread sleeps for a different time, so the threads may well execute one by one while all the others are asleep or already finished. But although your code appears to work, it is not thread safe. Declaring the field volatile will not make it thread safe either: volatile only guarantees visibility, i.e. when one thread makes a change, other threads are able to see it.
In your case the operation is a multi-step process: read the variable, update it, then write it back to memory. So you need a locking mechanism to make it thread safe.
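A lock is not the only way to make a read-modify-write step like this safe. As a sketch of an alternative (not what any of the answers above used), java.util.concurrent.atomic.AtomicReference can retry the update until it succeeds without interference. The class and method names here are made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: lock-free append using compare-and-set retries.
class AtomicAppend {
    // Appends "1" from many tasks and returns the final string length.
    static int appendConcurrently(int tasks, int threads) {
        AtomicReference<String> s = new AtomicReference<>("");
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            executor.submit(() -> {
                // updateAndGet re-runs the function if another thread changed
                // the value between our read and our write.
                s.updateAndGet(old -> old + "1");
            });
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.get().length();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(10_000, 5));
    }
}
```

Because every append either sees the latest value or retries, no update should be lost, so the length matches the number of tasks.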
In Python, using two threads for a simple counter program (as demonstrated below) is slower than running the program with a single thread. The reason usually given for this is the Global Interpreter Lock.
I tested the same thing in Java to compare performance. Here again, a single thread outperforms the two-threaded version by a significant margin. Why is that?
Here is the code:
public class ThreadTiming {
static void threadMessage(String message) {
String threadName =
Thread.currentThread().getName();
System.out.format("%s: %s%n",
threadName,
message);
}
private static class Counter implements Runnable {
private int count=500000000;
@Override
public void run() {
while(count>0) {
count--;
}
threadMessage("done processing");
}
}
public static void main(String[] args) throws InterruptedException{
Thread t1 = new Thread(new Counter());
Thread t2 = new Thread(new Counter());
long startTime=System.currentTimeMillis();
t1.start();
t2.start();
t1.join();
t2.join();
long endTime=System.currentTimeMillis();
System.out.println("Time taken by two threads "+ (endTime-startTime)/1000.0);
startTime=System.currentTimeMillis();
Calculate(2*500000000);
endTime=System.currentTimeMillis();
System.out.println("Time taken by single thread "+ (endTime-startTime)/1000.0);
}
public static void Calculate(int x){
while (x>0){
x--;
}
threadMessage("Done processing");
}
}
Output:
Thread-1: done processing
Thread-2: done processing
Time taken by two threads 0.052
main: Done processing
Time taken by single thread 0.0010
Very simple. The single-threaded version uses a local variable, and HotSpot has no trouble proving that it never leaves its scope, hence the whole function is reduced to a no-op.
On the other hand, proving that the instance variable never leaves scope (hello reflection!) is much harder, and obviously HotSpot cannot do it here, hence the loop isn't removed.
On a general note, benchmarking is hard (I count at least three other mistakes that could lead to "wrong" results) and requires tons of knowledge. You are better off using JMH (the Java Microbenchmark Harness), which takes care of most of these things.
The basic answer is you have code the optimiser can eliminate and you are timing how long it takes to detect this. You are also adding the time it takes to start and stop two threads which could be more than half this time.
The second test doesn't start a new thread, it uses the current one so you just need to wait for it to detect the loop doesn't do anything.
For example you have timed that a single thread can do 1 billion loops in 1 ms. If you have a 3.33 GHz processor, this would have to do 300 iterations in a single clock cycle. If this sounds too good to be true, that is because it is. ;)
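The sanity check above is just a division, and it is worth doing for any suspiciously fast measurement. A tiny sketch (the class and method names are made up):

```java
// Hypothetical helper: how many loop iterations per clock cycle a timing implies.
class ClockCheck {
    static double iterationsPerCycle(double iterations, double seconds, double clockHz) {
        double iterationsPerSecond = iterations / seconds;
        return iterationsPerSecond / clockHz;
    }

    public static void main(String[] args) {
        // 1 billion iterations in 1 ms on a 3.33 GHz core
        System.out.printf("%.0f iterations per cycle%n",
                iterationsPerCycle(1e9, 1e-3, 3.33e9));
    }
}
```

Anything far above a handful of iterations per cycle means the loop was not really executed.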
@Voo seems to be generally right, as you can see by moving ThreadTiming.Counter.count to be a local variable of ThreadTiming.Counter.run(). That eliminates any possibility of non-local references, and the resulting program exhibits much less single-thread vs. dual-thread performance difference.
HOWEVER, that doesn't eliminate all the difference. The timing reported for the dual-thread case is still worse by about a factor of 9 for me. But if I then swap so that the single-threaded case is measured first, the two-thread case wins by about a factor of 2.
But that, too, is illusory, because the two tests are running different -- albeit similar -- code. The single-thread case can easily be made to run exactly the same code as the dual thread case:
Counter c = new Counter();
c.run();
c.run();
(Using the version where count is local to run().) If that approach is used then I observe no difference in performance (at the resolution of the measurement) between single- and dual-threaded, regardless of which case is tested first.
As @Voo said, benchmarking is hard.
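To make the comparison concrete, here is a rough sketch of a fairer hand-rolled measurement: both cases run exactly the same method, the loop result is used so it is less likely to be optimised away, and a few warm-up runs come first. The class FairTiming and its methods are made up for illustration, and this is still no substitute for a real harness such as JMH.

```java
// Hypothetical sketch of a less misleading single- vs. dual-thread timing.
class FairTiming {
    // The result is returned so the JIT cannot simply discard the loop as dead code.
    static long work(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i ^ (i >>> 3);
        }
        return sum;
    }

    // Returns {single-thread nanos, two-thread nanos, checksum}.
    static long[] timeBoth(int iterations) {
        long checksum = 0;
        for (int i = 0; i < 5; i++) {
            // Warm-up, so both cases are more likely to run compiled code.
            checksum += work(iterations);
        }

        long start = System.nanoTime();
        checksum += work(iterations);
        checksum += work(iterations);
        long single = System.nanoTime() - start;

        long[] partial = new long[2];
        Thread t1 = new Thread(() -> partial[0] = work(iterations));
        Thread t2 = new Thread(() -> partial[1] = work(iterations));
        start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long dual = System.nanoTime() - start;

        return new long[] {single, dual, checksum + partial[0] + partial[1]};
    }

    public static void main(String[] args) {
        long[] r = timeBoth(500_000_000);
        System.out.println("single thread: " + r[0] / 1e6 + " ms");
        System.out.println("two threads:   " + r[1] / 1e6 + " ms");
        System.out.println("checksum:      " + r[2]);
    }
}
```

The two-thread figure still includes thread start-up cost, which matters when the work itself is short.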
It just looks like it's from loading each thread and its context into the CPU. It's thrashing. There's probably a more detailed answer waiting to strike, but let's start by posting the basics...
When running two threads, your timer is including the time taken to launch the two threads. Creating and starting threads has some overhead, and in this case, the overhead is longer than the time to actually carry out the process.
I researched the concept of a thread and saw that it lets code run in two processes at the same time. Here's my code, though:
public class Connor extends Thread{
public void run() {
for(int i=0; i< 10; i ++){
System.out.println("Hello " + i);
}
}
public static void main(String[] args){
Connor runner1 = new Connor();
Connor runner2 = new Connor();
runner1.start();
runner2.start();
}
}
And my output http://imgur.com/yAZqgal
It seems like the two threads do start off at the same time (separate processes, as indicated by the two leading 0s), but then one executes (1-9) and then the other executes (1-9). Aren't they supposed to interleave as well (1,1,2,2,...), because both threads print to the console? I researched and saw that start is the right method to use, as it tells the Thread class to execute the run method in another thread. Can anyone explain why I'm getting this output?
Say you have ten errands you need to do and your sister has ten errands she needs to do, and you only have one car. Do you bring the car back after each errand and switch drivers? Of course not. That would be absurdly inefficient. Each thread basically just needs the output stream. So it would be absurd to interleave them tightly.
Your code looks fine. I guess that your threads do not run in parallel just because they terminate too fast. Change the loop limit from 10 to 1000 and you will see the effect.
Starting a thread is itself a relatively heavy operation. Your code starts the first thread and then the second one. The first thread, once started, terminates before the second thread gets a chance to start executing its business logic.
With multi-threading there is no guarantee which thread the processor runs or for how long, so the result is unpredictable and may differ on each run.
If you need a specific output order, you need a synchronized block. Using wait and notify you can achieve it easily.
Please have a look at the lessons below, directly from the official Oracle site:
Lesson: Concurrency to read more about concurrency.
Chapter 17. Threads and Locks to read more about thread, locks and synchronization.
Note: wait and notify must be called inside a synchronized block, and on the same object that the block synchronizes on.
Sample code: (read inline comments)
public class Connor extends Thread {
private static Connor runner1 = new Connor();
private static Connor runner2 = new Connor();
public void run() {
for (int i = 0; i < 10; i++) {
System.out.println("Hello " + i);
// wake up another thread to come out from wait state
if (runner1 == this) {
// wake up runner2
synchronized (runner2) {
runner2.notify();
}
} else {
// wake up runner1
synchronized (runner1) {
runner1.notify();
}
}
synchronized (this) {
try {
// say current thread to wait until notify by another thread
this.wait();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
public static void main(String[] args) {
runner1.start();
runner2.start();
}
}
output:
Hello 0
Hello 0
Hello 1
Hello 1
Hello 2
Hello 2
Hello 3
Hello 3
Hello 4
Hello 4
Hello 5
Hello 5
Hello 6
Hello 6
Hello 7
Hello 7
Hello 8
Hello 8
Hello 9
Hello 9
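As far as I can tell, the wait/notify sample above leaves the second thread waiting after its last line (nobody notifies it again), and a notify that arrives before the matching wait is simply lost. One way to get the same strict alternation with a clean exit is a pair of java.util.concurrent.Semaphore objects passed back and forth. This is only a sketch; the class PingPong and its helpers are made up, and the lines are collected in a list so the order can be inspected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: two threads alternate strictly by handing a permit back and forth.
class PingPong {
    static List<String> run(int rounds) {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        Semaphore firstMayRun = new Semaphore(1);  // runner1 goes first
        Semaphore secondMayRun = new Semaphore(0); // runner2 waits for a handoff
        Thread t1 = new Thread(() -> loop("runner1", rounds, firstMayRun, secondMayRun, out));
        Thread t2 = new Thread(() -> loop("runner2", rounds, secondMayRun, firstMayRun, out));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    private static void loop(String name, int rounds, Semaphore mine,
                             Semaphore theirs, List<String> out) {
        for (int i = 0; i < rounds; i++) {
            mine.acquireUninterruptibly(); // wait for our turn
            out.add(name + ": Hello " + i);
            theirs.release();              // a permit is remembered even if nobody waits yet
        }
    }

    public static void main(String[] args) {
        for (String line : run(10)) {
            System.out.println(line);
        }
    }
}
```

Unlike notify, a released permit is not lost if the other thread has not started waiting yet, and both threads finish after their last round.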
"Why is my output showing one thread executing after another thread, not at the same time?"
The general explanation is that Thread scheduling is unpredictable. Interleaving may happen ... or it may not. That is fundamental to Java threading. (And indeed to threading in most other languages.)
If you need thread execution to interleave in an entirely predictable way, you need to implement some kind of hand-shake mechanism, where one thread waits for another one to do something, and so on. But that is complicated, and typically defeats the purpose of using threading.
FWIW: @Braj's answer shows how you might implement strict interleaving. However, note that this effectively means that only one thread is going to execute at a time. In addition, the waiting / notifying is going to lead to a lot of work for the thread scheduler ... so much that the application will run significantly slower than if you had just done the work on one thread.
In this particular example, there are two issues that combine to make any short term interleaving unlikely:
Creating a new native thread in Java is relatively expensive, because it typically entails requesting a memory block from the OS to house the thread stack. That in turn entails the OS "messing around" with page tables, zeroing a memory block and so on.
Native thread scheduling is implemented by the operating system, and it operates at a fairly coarse-grained level ... because that is the most efficient way to operate from the perspective of a typical application. (Switching thread "contexts" on a PC-class machine is a relatively expensive operation, and the thread scheduler itself potentially has work to do.)
In your example, the chances are that the first thread can say "hello" ten times before the second thread is ready to be scheduled.
Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help building a synchronization tree? Thread 1 waits for 2 and 3, which in turn wait for 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, thread 4 and 5 wait for thread 2, etc..
1
| \
2 3
|\ |\
4 5 6 7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?
"With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)"
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.
You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.
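A minimal sketch of such a waitable counter, using plain synchronized/wait/notifyAll, might look like the following. The class WorkCounter and its method names are made up for illustration, and the "check if you can make new work" step is left out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: wait for outstanding work, not for particular threads.
class WorkCounter {
    private int pending = 0; // guarded by this

    synchronized void workAdded() {
        pending++;
    }

    synchronized void workDone() {
        if (--pending == 0) {
            notifyAll(); // wake anyone waiting for the work to drain
        }
    }

    synchronized void awaitIdle() {
        boolean interrupted = false;
        while (pending > 0) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs the given number of trivial tasks and returns how many completed.
    static int demo(int tasks, int threads) {
        WorkCounter counter = new WorkCounter();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            counter.workAdded(); // count the work before handing it over
            pool.execute(() -> {
                completed.incrementAndGet();
                counter.workDone();
            });
        }
        counter.awaitIdle();
        pool.shutdown();
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(1000, 8) + " tasks completed");
    }
}
```

The JDK's java.util.concurrent.Phaser and CountDownLatch cover similar ground if you would rather not hand-roll this.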
If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
thread.start();
}
for (CustomThread thread : threads) {
thread.join(); // Blocks until thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
Edit: I was under the impression that you wouldn't be creating that many threads, so this is better for your scenario. The example won't be horribly short, but it goes along the same vein as the discussion you're having in the other answer, that you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {
private final Input input;
public SubJob(Input input) {
this.input = input;
}
public Result call() {
// Actually process input here and return a result
return JobWorker.processInput(input);
}
}
Then to use it, create an ExecutorService with a fixed-size thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
public class MainJob extends Thread {
// Adjust the pool to the appropriate number of concurrent
// threads you want running at the same time
private static final ExecutorService pool = Executors.newFixedThreadPool(30);
private final List<Input> inputs;
public MainJob(List<Input> inputs) {
super("MainJob");
this.inputs = new ArrayList<Input>(inputs);
}
public void run() {
CompletionService<Result> compService = new ExecutorCompletionService<Result>(pool);
List<Result> results = new ArrayList<Result>();
int submittedJobs = inputs.size();
for (Input input : inputs) {
// Starts the job when a thread is available
compService.submit(new SubJob(input));
}
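for (int i = 0; i < submittedJobs; i++) {
try {
// take() blocks until a job completes; get() unwraps its result
results.add(compService.take().get());
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
} catch (ExecutionException e) {
// A sub-job threw; decide here whether to skip it or abort
e.printStackTrace();
}
}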
// Do something with results
}
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse threads if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there are so many jobs waiting in the fixed thread pool that sub jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shut down the ExecutorService manually, perhaps via a shutdown hook, or the threads that it contains will not allow the JVM to terminate.
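For the shutdown itself, a common pattern is shutdown(), then awaitTermination(), then shutdownNow() as a fallback. Here is a sketch; the class PoolShutdown and the helper name are made up, and the timeout is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for shutting a pool down in two stages.
class PoolShutdown {
    // Stops accepting new jobs, waits for running ones, then cancels stragglers.
    // Returns true if the pool terminated.
    static boolean shutdownGracefully(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt jobs that are still running
                return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        pool.submit(() -> System.out.println("job done"));
        System.out.println("terminated: " + shutdownGracefully(pool, 5));
    }
}
```

Keep in mind that a shutdown hook only runs once the JVM is already shutting down (for example on System.exit or Ctrl-C), so calling something like this at the end of the main job is usually the simpler option.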
Hope that helps/makes sense.
If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and wasted time/memory from allocating hundreds of threads when the physical CPU is bounded.
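As a sketch of that generational approach: each generation is a list of Callables handed to invokeAll, which blocks until all of them are done and so plays the role of the barrier. The class Generations and the per-cell update rule (each cell plus its left neighbour) are invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one invokeAll per generation instead of a CyclicBarrier.
class Generations {
    static int[] run(int[] initial, int generations, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int[] current = initial.clone();
            for (int g = 0; g < generations; g++) {
                final int[] snapshot = current; // read-only during this generation
                final int n = snapshot.length;
                List<Callable<Integer>> tasks = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    final int index = i;
                    final int left = (i + n - 1) % n;
                    // Made-up update rule: a cell becomes itself plus its left neighbour.
                    tasks.add(() -> snapshot[index] + snapshot[left]);
                }
                // invokeAll returns only when every task of this generation is done,
                // and the futures come back in the same order as the tasks.
                List<Future<Integer>> futures = pool.invokeAll(tasks);
                int[] next = new int[n];
                for (int i = 0; i < n; i++) {
                    next[i] = futures.get(i).get();
                }
                current = next;
            }
            return current;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run(new int[] {1, 2, 3}, 2, 4)));
    }
}
```

In practice you would batch many cells into one task; one task per cell is far too fine-grained for real work.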
If your tasks are not generational but instead more of a tree-like structure in which only a subset need to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool, and you don't need Java 7 to do that. The fork/join framework came out of JSR 166, and a standalone version usable on Java 6 was distributed as the jsr166y package.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {
int n;
Executor exec;
Fib(final int n, final Executor exec) {
this.n = n;
this.exec = exec;
}
/**
* {@inheritDoc}
*/
@Override
public Integer call() throws Exception {
if (n == 0 || n == 1) {
return n;
}
//Divide the problem
final Fib n1 = new Fib(n - 1, exec);
final Fib n2 = new Fib(n - 2, exec);
//FutureTask only allows run to complete once
final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
//Ask the Executor for help
exec.execute(n2Task);
//Do half the work ourselves
final int partialResult = n1.call();
//Do the other half of the work if the Executor hasn't
n2Task.run();
//Return the combined result
return partialResult + n2Task.get();
}
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will be negative performance impacts. For example, the above code is a terribly slow way to solve Fibonacci.
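For comparison, on Java 7 and later the same divide-and-conquer shape can be written with java.util.concurrent.ForkJoinPool and RecursiveTask, with a threshold that stops splitting once the unit of work gets small. The sketch below sums an array; the class RangeSum and the threshold value are assumptions chosen for illustration, not tuned numbers.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: fork/join sum with a cut-off to avoid tiny tasks.
class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off, not tuned
    private final int[] data;
    private final int from;
    private final int to; // exclusive

    RangeSum(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly instead of splitting further.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        RangeSum left = new RangeSum(data, from, mid);
        RangeSum right = new RangeSum(data, mid, to);
        left.fork();                     // let the pool run the left half
        long rightSum = right.compute(); // do the right half ourselves
        return left.join() + rightSum;
    }

    static long sum(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data));
    }
}
```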
I was trying examples from JCIP. The program below should not work, but even if I execute it, say, 20 times, it always works, which means ready and number are becoming visible even though they should not necessarily be in this case.
public class NoVisibility {
private static boolean ready;
private static int number;
private static class ReaderThread implements Runnable {
public void run() {
while (!ready)
Thread.yield();
System.out.println(number);
}
}
public static void main(String[] args) {
System.out.println(Runtime.getRuntime().availableProcessors());
//Number of Processor is 4 so 4+1 threads
new Thread(new ReaderThread()).start();
new Thread(new ReaderThread()).start();
new Thread(new ReaderThread()).start();
new Thread(new ReaderThread()).start();
new Thread(new ReaderThread()).start();
number = 42;
ready = true;
}
}
On my machine it always prints
4 -- Number of Processors
42
42
42
42
42
According to Listing 3.1 of JCIP, it should sometimes print 0 or never terminate. The book also suggests that there is no guarantee that the values of ready and number written by the main thread will be visible to the reader threads.
Update
I added a 1000 ms sleep in the main thread after starting all the threads; still the same output. I know the program is broken, and I expect it to behave that way.
This program is broken since ready and number should be declared as volatile.
Because ready and number are a boolean and an int, individual reads and writes of them are atomic, but it is not guaranteed that a write will be visible to other threads.
It seems that the scheduler runs the threads after main and that is why they see the number and ready being initialized. But that is just one scheduling.
If you add e.g. a sleep in main so as to affect the scheduler, you may see different results.
So the book is correct: there is no guarantee that the Runnables running in separate threads will ever see the variables being updated, since the variables are not declared as volatile.
Update:
The problem here is that, due to the lack of volatile, the compiler is free to read the field ready just once, and reuse the cached value in each execution of the loop.
The program is inherently flawed. And with threading issues the problem usually appears when you deploy your application to the field....
From the JLS:
For example, in the following (broken) code fragment, assume that
this.done is a non-volatile boolean field:
while (!this.done)
Thread.sleep(1000);
The compiler is free to read the field this.done just once, and reuse
the cached value in each execution of the loop. This would mean that
the loop would never terminate, even if another thread changed the
value of this.done.
What is important to keep in mind is that a broken concurrent program might always work with the right combination of JVM's options, machine architecture etc. That does not make it a good program, as it will probably fail in a different context: the fact that no concurrency issue shows up does not mean there aren't any.
In other words, you can't prove that a concurrent program is correct with testing.
Back to your specific example, I see the same behaviour as what you describe. But, if I remove the Thread.yield() instruction in the while loop, 3 out of 8 threads stop and print 42, but the rest don't and the program never ends.
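For completeness, here is a sketch of the repaired version: making ready volatile is enough, because the write to number happens before the volatile write to ready, and a reader that sees ready == true is then guaranteed to see number == 42 as well. The class name Visibility and the readOnce helper are made up so the result can be returned instead of printed.

```java
// Hypothetical sketch: the NoVisibility example with the flag made volatile.
class Visibility {
    private static volatile boolean ready;
    private static int number; // published safely via the volatile write to ready

    // Starts one reader, publishes the value, and returns what the reader saw.
    static int readOnce() {
        ready = false;
        number = 0;
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.yield();
            }
            seen[0] = number;
        });
        reader.start();
        number = 42;  // ordinary write ...
        ready = true; // ... made visible by the following volatile write
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(readOnce());
    }
}
```

With the volatile modifier the loop must re-read ready on every iteration, so the reader can neither spin forever nor observe 0.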