I have a Java program that creates 2 threads. Inside these 2 threads, each one tries to update the global variable abc to a different value, say integer 1 and integer 3.
Let's say they execute the code at the same time (in the same millisecond), for example:
public class MyThread implements Runnable {
    public void run() {
        while (true) {
            // currentTime, specificTime and the shared field abc are declared elsewhere
            if (currentTime == specificTime) {
                abc = 1; // another thread updates abc to 3
            }
        }
    }
}
In this case, how can we determine the result of the variable abc? I am very curious: how does the operating system schedule the execution?
(I know synchronization should be used, but I just want to know how the system naturally handles this kind of conflict.)
The operating system has little involvement in this: while your threads are running, the memory allocated to abc is under the control of the JVM running your program, so it's your program that is in control.
When two threads access the same memory location, the last writer wins. Which particular thread gets to be the last writer, however, is non-deterministic, unless you use synchronization.
Moreover, without you taking special care of accessing the shared data, one thread may not even see the results of the other thread writing to the abc location.
To avoid these issues, use synchronization or one of the java.util.concurrent.atomic classes.
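A minimal sketch of the race described above, assuming abc is held in an AtomicInteger (the class and method names are hypothetical). The atomic class makes each write visible to other threads, but it does not decide which write lands last, so repeated runs may print 1 or 3.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: two threads race to set the same shared value.
class LastWriterWins {
    // AtomicInteger makes each write visible to other threads,
    // but it does NOT decide which of the two writes happens last.
    static final AtomicInteger abc = new AtomicInteger(0);

    static int race() throws InterruptedException {
        abc.set(0);
        Thread t1 = new Thread(() -> abc.set(1));
        Thread t2 = new Thread(() -> abc.set(3));
        t1.start();
        t2.start();
        // join() guarantees both writes are visible to this thread before we read.
        t1.join();
        t2.join();
        return abc.get(); // 1 or 3, depending on which thread wrote last
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            System.out.println("run " + i + ": abc = " + race());
        }
    }
}
```

Only the "last writer" is left to chance here; visibility is handled by the atomic class and by join().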
From Java's perspective the situation is fairly simple if abc is neither volatile nor accessed with appropriate synchronisation.
Let's assume that abc is originally 0. After your two threads have updated it to 1 and 3 respectively, abc could be observed in three states: 0, 1 or 3. Which value you get is not deterministic, and the result may vary from one run to the next.
It depends on the operating system, the running environment, etc.
Some environments will actually stop you from doing this; this is known as thread safety.
Otherwise the results are totally unpredictable which is why it is so dangerous to do this.
The value mainly depends on which thread updated it last: one thread will get CPU cycles before the other and perform its atomic write first.
Also, I don't think operating systems go as far as scheduling threads, because in most operating systems the program is responsible for them. Without explicit calls like synchronize, or a thread-pool model, I think the order of execution is pretty hard to predict. It's a very environment-dependent thing.
From the system's perspective, the result will depend on many software, hardware and run-time factors that cannot be known in advance. From this perspective there is neither a conflict nor a problem.
From the programmer's perspective, the result is not deterministic and is therefore a problem, a conflict. The conflict needs to be resolved at design time.
In this case, how can we determine the result of the variable abc? I am very curious: how does the operating system schedule the execution?
The result will not be deterministic, as the value will be whichever one was written last. You cannot make any guarantee about the result. The execution is scheduled like any other. As your code demands no synchronization, the JVM will not enforce anything for you.
I know synchronization should be used, but I just want to know how the system naturally handles this kind of conflict.
Simply put: it won't, because for the system there is no conflict. Only you, the programmer, will have problems, since you will eventually run into a data race and nondeterministic behavior. It is completely up to you.
Just add the volatile modifier to your variable; then every update will be visible to all threads, and a thread reading it will get its current value. volatile means the value is always up to date for all threads accessing it. (It does not, however, decide which of two concurrent writes lands last.)
Take a look at this answer (1):
https://stackoverflow.com/a/2964277/2182302 (Java Concurrency : Volatile vs final in "cascaded" variables?)
and at my old question (2):
one java memoryFlushing volatile: A good programdesign?
So, as I understand it (see (2)), I can use volatile variables as a memory barrier/flusher for ALL memory content, not only for the variable the volatile keyword is attached to.
Now the accepted answer in (1) says that it only flushes the memory of the variable the volatile keyword is attached to.
So which is correct? And if the flush-everything principle in (2) is correct, why can't I attach volatile to variables in combination with final?
Neither answer is correct, because you're thinking about it the wrong way. The concept of 'flushing memory' is simply made up. It's nowhere in the Java specifications (the memory model is defined in chapter 17 of the Java Language Specification). It's just Not A Thing. Yes, many CPUs/architectures do work that way, but the JVM does not.
You need to program to the JVM spec. Failure to do so means you write code that works perfectly fine on your machine, every time, and then you upload it to your server and it fails there. This is a horrible scenario: buggy code, but bugs that can never be triggered by tests. Yowza, those are bad.
So, what is in the JVM spec?
Not the concept of 'flushing'. What it does have is the concept of HBHA: Happens-Before/Happens-After. Here's how it works:
There is a list of specific interactions that establish that one line of code 'happens before' (HB/HA = Happens-Before/Happens-After) some other line. An outline of this list is given below.
For any two lines that have an HB/HA relationship, it is impossible for the HA line to observe any state that makes it appear as if the HB line has not run yet. It's basically saying that HB lines occur before HA lines, except not quite that strongly: you cannot observe the opposite (i.e. if the HB line changes variable X and the HA line does not see that change to X, you'd be observing the opposite, and that's impossible). This is not a statement about timing, though. In reality, HB/HA does not mean that lines get executed earlier or later: if you have 2 lines with an HB/HA relationship that have no effect on each other (one writes variable X, the other reads a completely different variable Y), the JVM and CPU working together are free to reorder them as much as they want.
For any two lines with no defined HB/HA relationship, the JVM and CPU are free to do whatever they please, including things that simply cannot be explained with a simplistic 'flushing' model.
For example:
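// plain (non-volatile) fields shared by both threads
int a = 0, b = 0;

void thread1() { // runs on one thread
    a = 10;
    b = 20;
}

void thread2() { // runs concurrently on another thread
    System.out.println(b);
    System.out.println(a);
}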
In the above, no HB/HA relationship has been established between thread 1 modifying the state of a/b, and thread 2 reading them.
Therefore, it is legal for a JVM to print 20 0, even though this cannot be explained with basic flushing notions: It is legal for the JVM to 'flush' b but not a.
It is fairly unlikely that you could write this code and actually observe that 20/0 output on any JVM version or hardware, but the point is: it is allowed, and some day (or probably already), some exotic combination of JVM + hardware + OS version + machine state will actually make it happen. So if your code breaks when that sequence of events occurs, you wrote a bug.
In effect, if one line mutates state, and another line reads it, and those 2 lines have no HB/HA, you messed up, and you need to fix your bug. Even (especially!) if you can't manage to write a test that actually proves it.
The trick here is that volatile reads do establish HB/HA, and since HB/HA is the only mechanism the spec has to sync things up, yes, this has the effect of guaranteeing that you 'see all changes'. But this is not, at all, a good idea, especially because the JIT compiler (such as HotSpot) is free to eliminate code that has no side effect.
So now we're going to have to get into a debate on whether 'establishes HBHA' is a side-effect. It probably is, but now we get to the rule of optimizations:
Write idiomatic code.
Whenever Azul, the OpenJDK core dev team, etc. are looking at improving the considerable optimization chops of the HotSpot compiler, they look at real-life code. It's like a gigantic pattern matcher: they look for patterns in code and find ways to optimize them. They don't just write detectors for everything imaginable: they strongly prefer writing optimizers for patterns that commonly show up in real-life Java code. After all, what possible point is there in spending time and effort optimizing a construction that almost no Java code actually contains?
This gets us to the fundamental issue with using throw-away volatile reads as a way to establish HB/HA: nobody does it that way, so the odds are quite high that at some point the spec is updated (or the conflicting rules are simply 'interpreted' as meaning: yeah, HotSpot can eliminate a pointless read, even if it did establish an HB/HA that is now no longer there). You're also far more likely to run into JVM bugs if you do things in unusual ways. After all, if you do things in well-trodden ways, any such bug would have been reported and fixed ages ago.
How to establish HB/HA:
The natural rule: Within a single thread, code cannot be observed to run in any way except sequentially, i.e. within one thread, all lines have HB/HA with each other in the obvious fashion.
synchronized blocks: If thread A exits a synchronized block and then thread B enters one on the same reference (the same lock object), then the block exit in A happens-before the block entry in B.
volatile reads and writes: a write to a volatile field happens-before every subsequent read of that same field.
Some exotic stuff, such as: thread.start() happens-before the first line of that thread's run() method, and all code in a thread happens-before another thread's join() on that thread returns. These tend to be obvious.
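A small sketch of the volatile and start()/join() rules from this list, with hypothetical names. The writer sets a plain field and then a volatile flag; a reader that sees the flag as true is guaranteed to see the plain field as well. Thread.onSpinWait() assumes Java 9 or later.

```java
// Hypothetical demo of two happens-before rules from the list above.
class HappensBeforeDemo {
    private int payload;              // plain, non-volatile field
    private volatile boolean ready;   // volatile flag used to publish payload

    void writer() {
        payload = 42;   // (1) plain write
        ready = true;   // (2) volatile write: (1) is visible to any thread that reads true
    }

    int reader() {
        while (!ready) {
            Thread.onSpinWait(); // busy-wait until the volatile write is visible
        }
        return payload; // guaranteed to be 42, never the default 0
    }

    static int run() throws InterruptedException {
        HappensBeforeDemo demo = new HappensBeforeDemo();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = demo.reader());
        r.start();                        // start() happens-before the first action in r
        new Thread(demo::writer).start();
        r.join();                         // everything r did happens-before join() returns
        return seen[0];                   // so reading seen[0] here is safe
    }
}
```

If ready were not volatile, the reader could legally spin forever or return 0.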
Thus, to answer the question, is it good programming design?
No, it is not.
Establish HB/HA in the proper ways: find something appropriate in java.util.concurrent and use it, anything from a simple lock, to a queue, to a fork/join pool for the entire job. Alternatively, stop sharing state. Alternatively, share state with mechanisms that are designed for concurrent access in more natural ways than HB/HA, such as a database (transactions) or a message queue.
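As one example of the java.util.concurrent route, a BlockingQueue hands data between threads without any hand-written volatile or synchronized: its documentation states that actions before putting an element into the queue happen-before actions after that element is taken out in another thread. The sketch below uses hypothetical names and a sentinel ("poison") element to stop the consumer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical producer/consumer hand-off: the queue establishes HB/HA for us.
class QueueHandoff {
    static long sumViaQueue(int count) throws InterruptedException {
        BlockingQueue<int[]> queue = new ArrayBlockingQueue<>(16);
        int[] poison = new int[0]; // sentinel telling the consumer to stop
        long[] total = new long[1];

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int[] item = queue.take();
                    if (item == poison) break;
                    total[0] += item[0]; // sees the value written before put()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 1; i <= count; i++) {
            int[] item = new int[1];
            item[0] = i;       // plain write into a mutable object...
            queue.put(item);   // ...published safely by put()
        }
        queue.put(poison);
        consumer.join();       // join() makes total[0] visible here
        return total[0];
    }
}
```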
I am learning multithreading, and I have a little question.
When I am sharing some variable between threads (an ArrayList, or something else like a double or float), should it be locked by the same object for both reads and writes? I mean, when 1 thread is setting the variable's value, can another read it at the same time without any problems? Or should it be locked by the same object, forcing the reading thread to wait until the other thread has changed it?
All access to shared state must be guarded by the same lock, both reads and writes. A read operation must wait for the write operation to release the lock.
As a special case, if all you would do inside your synchronized blocks amounts to exactly one read or write operation, then you may dispense with the synchronized block and mark the variable as volatile.
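A minimal sketch of both cases, with hypothetical class names. The first class does a read-modify-write (balance += amount), so every access goes through the same lock; the second only ever does a single read or a single write per access, which is the special case where volatile is enough.

```java
// Case 1: every access, read or write, is guarded by the same lock (this).
class SynchronizedBalance {
    private double balance;

    synchronized void deposit(double amount) { balance += amount; } // read-modify-write
    synchronized double get() { return balance; }

    // Runs `threads` workers, each depositing 1.0 `perThread` times.
    static double demo(int threads, int perThread) throws InterruptedException {
        SynchronizedBalance account = new SynchronizedBalance();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) account.deposit(1.0);
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return account.get();
    }
}

// Case 2: each access is exactly one read or one write, so volatile suffices.
// (A volatile double with `value += x` would NOT be safe: that is two operations.)
class VolatileSetting {
    private volatile double value;

    void set(double v) { value = v; }  // single write
    double get() { return value; }     // single read
}
```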
Short: It depends.
Longer:
There are many "correct answers", one for each scenario (and that makes programming fun).
Does the value being read have to be the "latest"?
Does the value being written have to become known to all readers?
Do I need to take care of a race condition if two threads write?
Will there be any issue if an old/previous value is read?
What is the correct behaviour?
Does it really need to be correct? (Yes, sometimes you genuinely don't care.)
tl;dr
For example, not all threaded programming needs to be "always correct":
sometimes you trade correctness for performance (e.g. a log or progress counter)
sometimes reading an old value is just fine
sometimes you only need eventual correctness (e.g. in map-reduce, no partial result, synchronized or not, is right until everything is done)
in some cases, correctness is mandatory at every moment (e.g. your bank account balance)
in write-once, read-only cases it doesn't matter
sometimes threads work in groups, with complex cases
sometimes many small, independent locks run faster, but sometimes a flat global lock is faster
and many, many other possible cases
Here is my suggestion: if you are learning, you should think "why do I need a lock?", "why can a lock help in DIFFERENT cases?" (not just the textbook example), and "will it fail, or what could happen, if a lock is missing?"
If all threads are reading, you do not need to synchronize.
If one or more threads are reading and one or more are writing, you will need to synchronize somehow. If the collection is small, you can use synchronized: either add a synchronized block around the accesses to the collection, synchronize the methods that access the collection, or use a thread-safe (synchronized) collection (for example, Vector).
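A sketch of the "thread-safe collection" option using Collections.synchronizedList, with a hypothetical wrapper class. Single calls such as add() are synchronized by the wrapper, but iteration is a sequence of calls, and the Collections.synchronizedList documentation says you must hold the list's own lock while iterating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical shared list of names, written and read by several threads.
class SharedNames {
    // Each single call (add, get, size...) on this wrapper is synchronized internally.
    private final List<String> names = Collections.synchronizedList(new ArrayList<>());

    void add(String name) {
        names.add(name);
    }

    // Iteration spans many calls, so hold the list's lock for the whole loop.
    int totalLength() {
        int total = 0;
        synchronized (names) {
            for (String n : names) {
                total += n.length();
            }
        }
        return total;
    }
}
```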
If you have a large collection and you want to allow shared reading but exclusive writing you need to use a ReadWriteLock. See here for the JavaDoc and an exact description of what you want with examples:
ReentrantReadWriteLock
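A minimal sketch of shared reads with exclusive writes, assuming a hypothetical map-backed cache. Any number of threads may hold the read lock at once, while the write lock waits until all readers have released it; unlocking in finally keeps an exception from leaving the lock held.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical read-mostly cache: many concurrent readers, exclusive writers.
class ReadMostlyCache {
    private final Map<String, Integer> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    Integer get(String key) {
        lock.readLock().lock();   // shared: other readers may hold it too
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(String key, int value) {
        lock.writeLock().lock();  // exclusive: waits for all readers to leave
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```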
Note that this question is pretty common and there are plenty of similar examples on this site.
I have a set of counters which will only ever be updated in a single thread.
If I read these values from another thread and I don't use volatile/atomic/synchronized, how out of date can these values be?
I ask as I am wondering if I can avoid using volatile/atomic/synchronized here.
I currently believe that I can't make any assumptions about time to update (so I am forced to use at least volatile). Just want to make sure I am not missing something here.
I ask as I am wondering if I can avoid using volatile/atomic/synchronized here.
In practice, the CPU cache is probably going to be synchronized to main memory anyway on a regular basis (how often depends on many parameters), so it sounds like you would be able to see some new values from time to time.
But that is missing the point: the actual problem is that if you don't use a proper synchronization pattern, the compiler is free to "optimise" your code and remove the part that reads the updated value.
For example:
class Broken {
    boolean stop = false;

    void broken() throws Exception {
        while (!stop) {
            Thread.sleep(100);
        }
    }
}
The compiler is authorised to rewrite that code as:
void broken() throws Exception {
    while (true) {
        Thread.sleep(100);
    }
}
because there is no obligation to check if the non-volatile stop might change while you are executing the broken method. Mark the stop variable as volatile and that optimisation is not allowed any more.
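The Broken example with that fix applied might look like the sketch below; the class name and the stopsWhenAsked() helper are hypothetical, added only to show that another thread can now end the loop.

```java
// The Broken example with 'stop' marked volatile: the loop must re-read it
// on every iteration and cannot be rewritten into while (true).
class FixedStop {
    volatile boolean stop = false;

    void loop() throws InterruptedException {
        while (!stop) {
            Thread.sleep(100);
        }
    }

    // Hypothetical helper: start the loop, ask it to stop, report whether it ended.
    static boolean stopsWhenAsked() throws InterruptedException {
        FixedStop f = new FixedStop();
        Thread worker = new Thread(() -> {
            try {
                f.loop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        Thread.sleep(200);
        f.stop = true;       // volatile write, seen by the worker's next read
        worker.join(5_000);  // wait up to 5 seconds for the worker to finish
        return !worker.isAlive();
    }
}
```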
Bottom line: if you need to share state you need synchronization.
How stale a value can get is left entirely to the discretion of the implementation -- the spec doesn't provide any guarantees. You will be writing code that depends on the implementation details of a particular JVM and which can be broken by changes to memory models or to how the JIT reorders code. The spec seems to be written with the intent of giving the implementers as much rope as they want, as long as they observe the constraints imposed by volatile, final, synchronized, etc.
It looks like the only way that I can avoid the synchronization of these variables is to do the following (similar to what Zan Lynx suggested in the comments):
Figure out the maximum age I am prepared to accept. I will make this the "update interval".
Each "update interval", copy the unsynchronized counter variables to synchronized variables. This needs to be done on the write thread.
Read thread(s) can only read from these synchronized variables.
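A rough sketch of that scheme, assuming a single writer thread and using hypothetical names. The writer bumps a plain counter and, once the update interval has passed, copies it into a volatile field that readers use. The current time is passed in explicitly (instead of calling System.nanoTime() inside) only to keep the sketch easy to test.

```java
// Hypothetical single-writer counter that publishes a snapshot at most once per interval.
class PeriodicCounter {
    private final long intervalNanos;
    private long count;              // touched only by the writer thread
    private long lastPublishNanos;   // touched only by the writer thread
    private volatile long published; // the only field reader threads look at

    PeriodicCounter(long intervalNanos, long startNanos) {
        this.intervalNanos = intervalNanos;
        this.lastPublishNanos = startNanos;
    }

    // Writer thread only.
    void increment(long nowNanos) {
        count++;
        if (nowNanos - lastPublishNanos >= intervalNanos) {
            published = count;       // one volatile write per interval
            lastPublishNanos = nowNanos;
        }
    }

    // Any thread. Updates made after the last publish stay invisible
    // until a later increment() crosses the interval again.
    long read() {
        return published;
    }
}
```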
Of course, this optimization may only be a marginal improvement and would probably not be worth it considering the extra complexity it would create.
Java 8 has a new class, LongAdder, which helps with the problem of using volatile on a field. But until then...
If you do not use volatile on your counter then the results are unpredictable. If you do use volatile then there are performance problems since each write must guarantee cache/memory coherency. This is a huge performance problem when there are many threads writing frequently.
For statistics and counters that are not critical to the application, I give users the option of volatile/atomic or none with none the default. So far, most use none.
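A minimal sketch of a statistics counter built on LongAdder, with hypothetical names. LongAdder spreads contended increments across internal cells, so many writer threads don't all contend on one memory location; the trade-off is that sum() is not an atomic snapshot while writers are still active, which is usually acceptable for statistics.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical statistics counter: many writers, occasional readers.
class RequestStats {
    private final LongAdder requests = new LongAdder();

    void onRequest() {
        requests.increment(); // cheap even under heavy contention
    }

    // Not an atomic snapshot while other threads are still incrementing.
    long total() {
        return requests.sum();
    }

    static long demo(int threads, int perThread) throws InterruptedException {
        RequestStats stats = new RequestStats();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) stats.onRequest();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return stats.total(); // exact once all writers have finished
    }
}
```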
I'm wondering if there could be any problems while accessing one array with multiple threads but either only reading or only writing.
When the threads write to the array it wouldn't matter in which order they write and even if they write to the same entry all threads would write the same value.
For example, if I want to find prime numbers via the Sieve of Eratosthenes:
I create an array of consecutive numbers and set all multiples of prime numbers to 0 using multiple threads.
It wouldn't matter if the thread which strikes off the multiples of two and the thread which strikes off the multiples of 5 set the entry of the number 20 to 0 at the same time or one before or after the other.
So it's not a question of the quality or consistency of the data, but of whether it is technically possible to do it without running into any Java errors.
I'm assuming you mean 'without synchronization controls'. The short answer is no.
Synchronization is used for 2 reasons:
Mutual exclusion of data
communication between threads
Your setup indicates that the first reason isn't really a problem in your case. The algorithm effectively separates the data out so that multiple worker threads won't be using the same data.
However, in order for changes done in one thread to become visible to another thread, you must use synchronization. Without synchronization, the JVM makes no guarantee as to the ordering of writes. Updates that one thread makes may be visible in another thread at any time later, or even never. See Effective Java Item #66, and maybe look at the Java Concurrency in Practice book.
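A minimal sketch of the sieve from the question, assuming one hypothetical worker thread per base p with p * p <= n. The workers write to a shared boolean[] without locks, which is harmless here because every write stores the same value; what makes the final read correct is join(), since all actions of a thread happen-before join() on that thread returns.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelSieve {
    static List<Integer> primesUpTo(int n) throws InterruptedException {
        boolean[] composite = new boolean[n + 1]; // shared, written without locks
        List<Thread> workers = new ArrayList<>();
        // Composite bases such as 4 get a worker too: redundant but harmless,
        // since every worker only ever writes 'true'.
        for (int p = 2; (long) p * p <= n; p++) {
            final int base = p;
            Thread t = new Thread(() -> {
                for (long m = (long) base * base; m <= n; m += base) {
                    composite[(int) m] = true;
                }
            });
            workers.add(t);
            t.start();
        }
        // join() is the synchronization point: all workers' writes become visible here.
        for (Thread t : workers) {
            t.join();
        }
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i <= n; i++) {
            if (!composite[i]) primes.add(i);
        }
        return primes;
    }
}
```

Without the join() loop (or another synchronization point), reading composite afterwards would have no HB/HA with the workers' writes.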
I don't think it would work since eventually you need to read the variables (to output them, save to disk, etc.). And the read has to be synchronized in order to guarantee correct interthread operation ordering. Remember that without synchronization java only guarantees intrathread operation ordering.
Now, you could say that you don't want to read them at all, but if that is the case, Java can just optimize away the whole computation.
I have a Java program that runs many small simulations. It runs a genetic algorithm, where each fitness function is a simulation using parameters on each chromosome. Each one takes maybe 10 or so seconds if run by itself, and I want to run a pretty big population size (say 100?). I can't start the next round of simulations until the previous one has finished. I have access to a machine with a whack of processors in it and I'm wondering if I need to do anything to make the simulations run in parallel. I've never written anything explicitly for multicore processors before and I understand it's a daunting task.
So this is what I would like to know: to what extent, and how well, does the JVM parallelize? I have read that it creates low-level threads, but how smart is it? How efficient is it? Would my program run faster if I made each simulation a thread? I know this is a huge topic, but could you point me towards some introductory literature on parallel processing and Java?
Thanks very much!
Update:
Ok, I've implemented an ExecutorService and made my small simulations implement Runnable and have run() methods. Instead of writing this:
Simulator sim = new Simulator(args);
sim.play();
return sim.getResults();
I write this in my constructor:
ExecutorService executor = Executors.newFixedThreadPool(32);
And then each time I want to add a new simulation to the pool, I run this:
RunnableSimulator rsim = new RunnableSimulator(args);
executor.execute(rsim);
return rsim.getResults();
The RunnableSimulator::run() method calls the Simulator::play() method; neither takes arguments.
I think I am getting thread interference, because now the simulations error out. By error out I mean that variables hold values they really shouldn't. No code within the simulation was changed, and before, the simulation ran perfectly over many, many different arguments. The sim works like this: each turn it's given a game piece and loops through all the locations on the game board. It checks whether the given location is valid and, if so, commits the piece and measures that board's goodness. Now, obviously invalid locations are being passed to the commit method, resulting in index-out-of-bounds errors all over the place.
Each simulation is its own object, right? Based on the code above? I can pass the exact same set of arguments to the RunnableSimulator and Simulator classes, and the runnable version will throw exceptions. What do you think might cause this, and what can I do to prevent it? Can I provide some code samples in a new question to help?
Java Concurrency Tutorial
If you're just spawning a bunch of stuff off to different threads, and it isn't going to be talking back and forth between different threads, it isn't too hard; just write each in a Runnable and pass them off to an ExecutorService.
You should skim the whole tutorial, but for this particular task, start here.
Basically, you do something like this:
ExecutorService executorService = Executors.newFixedThreadPool(n);
where n is the number of things you want running at once (usually the number of CPUs). Each of your tasks should be an object that implements Runnable, and you then execute it on your ExecutorService:
executorService.execute(new SimulationTask(parameters...));
Executors.newFixedThreadPool(n) will start up n threads, and execute will insert the tasks into a queue that feeds to those threads. When a task finishes, the thread it was running on is no longer busy, and the next task in the queue will start running on it. Execute won't block; it will just put the task into the queue and move on to the next one.
The thing to be careful of is that you really AREN'T sharing any mutable state between tasks. Your task classes shouldn't depend on anything mutable that will be shared among them (i.e. static data). There are ways to deal with shared mutable state (locking), but if you can avoid the problem entirely it will be a lot easier.
EDIT: Reading your edits to your question, it looks like you really want something a little different. Instead of implementing Runnable, implement Callable. Your call() method should be pretty much the same as your current run(), except it should return getResults();. Then, submit() it to your ExecutorService. You will get a Future in return, which you can use to test if the simulation is done, and, when it is, get your results.
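A sketch of that change, with a hypothetical SimulationTask standing in for the asker's simulator (here it just sums its arguments, since the real Simulator isn't shown). The important difference from the code in the question is that Future.get() waits for each task to finish, whereas calling getResults() right after execute() reads the results before the task has necessarily even started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the real simulation: each task owns its own state.
class SimulationTask implements Callable<Integer> {
    private final int[] params;

    SimulationTask(int[] params) {
        this.params = params.clone(); // private copy: no mutable state shared between tasks
    }

    @Override
    public Integer call() {
        int result = 0;
        for (int p : params) result += p; // placeholder for sim.play(); sim.getResults()
        return result;
    }

    static List<Integer> runAll(List<int[]> population)
            throws InterruptedException, ExecutionException {
        ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] params : population) {
                futures.add(executor.submit(new SimulationTask(params))); // does not block
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that simulation has finished
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

Results come back in submission order, because the futures are collected in that order, even if the simulations finish in a different order.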
You can also look at the new fork/join framework by Doug Lea. One of the best books on the subject is certainly Java Concurrency in Practice. I would strongly recommend taking a look at the fork/join model.
Java threads are just too heavyweight. We have implemented parallel branches in Ateji PX as very lightweight scheduled objects. As in Erlang, you can create tens of millions of parallel branches before you start noticing an overhead. But it's still Java, so you don't need to switch to a different language.
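A minimal fork/join sketch, assuming a hypothetical task that sums a long[] by recursively splitting it in half until chunks fall below an arbitrary threshold. fork() schedules one half asynchronously, compute() does the other half in the current thread, and join() waits for the forked half.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical fork/join sum over the half-open range [from, to).
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off; tune for real workloads
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {          // small enough: just loop
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                           // run the left half asynchronously
        long rightSum = right.compute();       // compute the right half in this thread
        return left.join() + rightSum;         // wait for the left half
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```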
If you are doing full-out processing all the time in your threads, you won't benefit from having more threads than processors. If your threads occasionally wait on each other or on the system, then Java scales well up to thousands of threads.
I wrote an app that discovered a class B network (about 65,000 addresses) in a few minutes by pinging each node, and each ping had retries with an increasing delay. When I put each ping on a separate thread (this was before NIO; I could probably improve it now), I could run about 4000 threads on Windows before things started getting flaky. On Linux the number was nearer 1000 (I never figured out why).
No matter what language or toolkit you use, if your data interacts, you will have to pay some attention to those areas where it does. Java uses the synchronized keyword to prevent two threads from accessing a section at the same time. If you write your Java in a more functional manner (making all your members final), you can run without synchronization, but it can be... well, let's just say solving problems takes a different approach that way.
Java has other tools to manage units of independent work; look in the java.util.concurrent package for more information.
Java is pretty good at parallel processing, but there are two caveats:
Java threads are relatively heavyweight (compared with e.g. Erlang), so don't start creating them in the hundreds or thousands. Each thread gets its own stack memory (256KB on some platforms; the default depends on the platform and JVM and can be changed with -Xss), and you could run out of memory, among other things.
If you run on a very powerful machine (especially with a lot of CPUs and a large amount of RAM), then the VM's default settings (especially concerning GC) may result in suboptimal performance, and you may have to spend some time tuning them via command-line options. Unfortunately, this is not a simple task and requires a lot of knowledge.