I have an integer in my Statistics.java class like this
int completionAmount = 0;
When the program starts up, I run a SQL query in the main() method and set completionAmount based on what it returns (such as 10000).
After that's done, I start a new single-threaded executor that runs every second and uses completionAmount to keep counting. Do I need to synchronize access to that integer every time I use it in the new thread, because I loaded it from the main thread on startup? Or, since the main thread will no longer be using the integer, can I leave the code unchanged?
Assuming you are doing something like this, in the same thread:
completionAmount = /* initialize from query; never set again */
new Thread(/* thing that uses completionAmount */).start();
then no, no additional synchronization is needed.
According to JLS 17.4.5:
A call to start() on a thread happens-before any actions in the started thread.
So, you've got a happens-before relationship between the write to completionAmount and its read.
Actually, you mention a single-threaded executor in the question. I'm assuming this is specifically an ExecutorService. From the Javadoc:
Actions in a thread prior to the submission of a Runnable or Callable task to an ExecutorService happen-before any actions taken by that task, which in turn happen-before the result is retrieved via Future.get().
So, provided the assignment happens before you submit the task to the executor, you've got a happens-before relationship, so no additional synchronization is required.
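A minimal sketch of that ordering, assuming the query result is 10000 as in the question. The loadFromDatabase() helper and the readFromExecutor() method are hypothetical names standing in for the real startup code; the field is deliberately a plain, non-volatile int.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class Statistics {
    // Plain, non-volatile field, as in the question.
    static int completionAmount = 0;

    // Hypothetical stand-in for the SQL query run at startup.
    static int loadFromDatabase() {
        return 10000;
    }

    static int readFromExecutor() throws Exception {
        // The write happens in the submitting thread, before the task is submitted.
        completionAmount = loadFromDatabase();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Submission orders the write above before the read inside the task.
            Future<Integer> seen = executor.submit(() -> completionAmount);
            return seen.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readFromExecutor());
    }
}
```

If the main thread wrote to completionAmount again after submitting the task, this guarantee would no longer cover the later write.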
If the "single-threaded executor" you mention is a single-threaded ExecutorService, then synchronization is not needed.
That is because a single-threaded ExecutorService ensures that only one thread accesses completionAmount at a time, provided the rest of your code also never touches it from another thread.
A plain int is not thread-safe, so you can use AtomicInteger instead to ensure thread safety.
As per the below JLS rule-
Each action in a thread happens-before every action in that thread that comes later in the program's order.
In the below case, would clear() always execute before put in a multithreaded environment
private ConcurrentMap<Feature, Boolean> featureMap = new ConcurrentHashMap<>();
public void loadAllConfiguration() {
featureMap.clear();
featureMap.put()
}
In the below case, would clear() always execute before put in a multithreaded environment
Yes, in a multithreaded application, in each thread clear() comes before put(). However when you look at the interaction of multiple threads then this is no longer true in terms of the shared ConcurrentHashMap.
For example, due to race conditions, you might see the following sequence of events:
Thread 1 calls clear().
Thread 2 calls clear().
Thread 3 calls clear().
Thread 2 calls put().
Thread 1 calls put().
Thread 3 calls put().
Even though each thread does clear and then put, there is no guarantee that there will only be 1 item in the ConcurrentHashMap if that was the point of your question.
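If the goal is for clear() followed by put() to behave as one step, one option is to guard the pair with a lock. The sketch below is only an illustration under assumptions: it uses String keys instead of the question's Feature type, and the FeatureConfig class and runConcurrently() helper are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FeatureConfig {
    // String keys are an assumption; the question uses a Feature type.
    private final ConcurrentMap<String, Boolean> featureMap = new ConcurrentHashMap<>();

    // synchronized makes clear() + put() atomic with respect to other callers of this method.
    synchronized void loadAllConfiguration(String feature) {
        featureMap.clear();
        featureMap.put(feature, Boolean.TRUE);
    }

    int size() {
        return featureMap.size();
    }

    // Runs loadAllConfiguration from several threads and reports the final map size.
    static int runConcurrently(int threadCount) throws InterruptedException {
        FeatureConfig config = new FeatureConfig();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            String feature = "feature-" + i;
            threads[i] = new Thread(() -> config.loadAllConfiguration(feature));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return config.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrently(8));
    }
}
```

Because each clear/put pair now runs as a unit, the map ends with exactly one entry. Readers that do not take the same lock can still observe the map while it is empty between clear() and put().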
I'm not super clear on the question but I think:
Each action in a thread happens-before every action in that thread
that comes later in the program's order.
Means that within the context of a single thread (since clear and put are blocking synchronous calls) that the runtime guarantees that they will be executed in the order they are invoked.
Based on my limited understanding of java, this should NOT extend to a multithreaded environment. Suppose you have a single concurrent map shared between two threads, and each one of those threads invokes loadAllConfiguration against a shared featureMap.
The threads can execute concurrently, so the operations can be interleaved.
This could result in an execution order of:
**THREAD 1** **THREAD 2**
map.clear()
map.put()
map.clear()
map.put()
or even in both clears being called concurrently and then both puts being applied concurrently.
I haven't used Java, so I'm not sure what ConcurrentHashMap provides, but I'm assuming that it only protects you from data races (one thread writing while another reads) by using some sort of synchronization. It might still leave you exposed to logical errors (i.e. clear and put calls from different threads being interleaved in a nondeterministic way).
Is there a better way to make writing to files thread safe (for cases where the file may not be all the same in every thread) than synchronizing the method or the file writer? I read a few threads similar to this topic, but they seem to focus on one file as opposed to multiple files.
Ex. There are 20 threads that write to a file (meaning each uses a method that creates a file writer for the file and then writes to it inside a try-catch, etc.); 10 of the threads write to fileA, 5 threads write to fileB, 4 threads write to fileC, and 1 thread writes to fileD.
Synchronizing the method would not be efficient, since threads that want to write to different files would have to wait for the previous thread to finish before they can proceed. I think synchronizing the file writer does pretty much the same, or am I wrong?
If I were to have separate threads (apart from the main application) that write to a file, would they execute (run) in the order they were submitted to an ExecutorService with 1 thread?
In the main application, I would submit new threads to the ExecutorService (which uses 1 thread). The threads would write to a file (using a write method in a Logger class that synchronizes on the FileWriter). The threads would write to the file one by one because the FileWriter is synchronized and there is only 1 thread for the ExecutorService, which prevents multiple writes to the same file at once. The question is: will the threads write to the file in the order they were submitted to the ExecutorService? I know they start in the order they were submitted, but I am not too sure about the execution order.
You are mixing some things up which creates the confusion: First, ExecutorService is an interface that does not mandate a particular way how the submitted tasks (not threads) are executed. So it doesn’t make sense to ask how an ExecutorService will do a particular thing as it is not specified. It might even drop all tasks without executing anything.
Second, as already mentioned above, you are submitting tasks, not threads, to an ExecutorService whereas the tasks may implement Runnable or Callable.
Unfortunately there’s a design flaw in Java that Thread implements Runnable so you actually can pass a Thread instance to submit() which you should never do as it creates a lot of confusion for no benefit. When you do so, the common ExecutorService implementations will treat it as an ordinary Runnable invoking its run() method ignoring the fact completely that it is a Thread instance. The thread resource associated with that Thread instance will have no relationship with the thread actually executing the run() method (if the implementation ever calls run()).
So if you submit tasks implemented as Runnable or Callable to an ExecutorService you have to study the documentation of the particular implementation to learn about how they will be executed.
E.g. if you use Executors.newSingleThreadExecutor() to get an implementation, its documentation says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
(emphasis by me)
So that would answer your question completely. Note that in this case you don't even need synchronized within your task's implementation, as this ExecutorService already provides the mutual exclusion guarantee required for your tasks.
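As a rough illustration, the sketch below submits numbered tasks to Executors.newSingleThreadExecutor() and records the order in which they run. The SequentialTasks class and runInOrder() method are hypothetical names. The quoted documentation guarantees sequential execution from a queue; the strict submission order seen here relies on that queue being first-in, first-out, which is my understanding of the standard implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SequentialTasks {
    // Submits taskCount tasks and returns the ids in the order they actually ran.
    static List<Integer> runInOrder(int taskCount) throws InterruptedException {
        // Synchronized wrapper so the caller can safely read what the worker thread wrote.
        List<Integer> executed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int i = 0; i < taskCount; i++) {
            int id = i;
            executor.execute(() -> executed.add(id));
        }
        executor.shutdown();
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return executed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runInOrder(5));
    }
}
```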
Consider the alternative of having a specialized file writer thread that is the only thread to write to the files. The other threads can safely add messages to a java.util.concurrent.BlockingQueue. As soon as a thread has placed a message on the queue, it can get back to work.
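A minimal sketch of such a writer thread, under assumptions: the FileWriterThread class, its Message type, and the STOP marker are all hypothetical names, and each message carries the target file so one writer can serve several files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FileWriterThread {
    // Hypothetical message type: which file to append to, and the line to write.
    static final class Message {
        final Path file;
        final String line;
        Message(Path file, String line) {
            this.file = file;
            this.line = line;
        }
    }

    // Marker object that tells the writer thread to stop.
    private static final Message STOP = new Message(null, null);

    private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
    private final Thread writer = new Thread(this::drain, "file-writer");

    void start() {
        writer.start();
    }

    // Called by producer threads; returns as soon as the message is queued.
    void write(Path file, String line) throws InterruptedException {
        queue.put(new Message(file, line));
    }

    // Queues the stop marker and waits for everything before it to be written.
    void stop() throws InterruptedException {
        queue.put(STOP);
        writer.join();
    }

    // Runs on the writer thread only, so no file is written by two threads at once.
    private void drain() {
        try {
            for (Message m = queue.take(); m != STOP; m = queue.take()) {
                byte[] bytes = (m.line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
                Files.write(m.file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a single writer thread, slow disk output on one file delays writes to the others. If that matters, one writer per file is a possible variation.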
Say, I have a data object:
class ValueRef { double value; }
Where each data object is stored in a master collection:
Collection<ValueRef> masterList = ...;
I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):
class Job implements Runnable {
    Collection<ValueRef> neededValues = ...;
    public void run() {
        double sum = 0;
        for (ValueRef x: neededValues) sum += x.value;
        System.out.println(sum);
    }
}
Use-case:
for (ValueRef x: masterList) { x.value = Math.random(); }
Populate a job queue with some jobs.
Wake up a thread pool
Wait until each job has been evaluated
Note: During the job evaluation, all of the values are all constant. The threads however, have possibly evaluated jobs in the past, and retain cached values.
Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?
I understand synchronize from the monitor/lock-perspective, I do not understand synchronize from the cache/flush-perspective (ie. what is being guaranteed by the memory model on enter/exit of the synchronized block).
To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.
My approach: create a global monitor: static Object guard = new Object(); Then, synchronize on guard, while updating the master list. Then finally, before starting the thread pool, once for each thread in the pool, synchronize on guard in an empty block.
Does that really cause a full flush of any value read by that thread? Or just values touched inside the synchronize block? In which case, instead of an empty block, maybe I should read each value once in a loop?
Thanks for your time.
Edit: I think my question boils down to, once I exit a synchronized block, does every first read (after that point) go to main memory? Regardless of what I synchronized upon?
It doesn't matter that threads of a thread pool have evaluated some jobs in the past.
Javadoc of Executor says:
Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.
So, as long as you use standard thread pool implementation and change the data before submitting the jobs you shouldn't worry about memory visibility effects.
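A sketch of that pattern, assuming a standard fixed thread pool that is reused across two rounds. The ValueJobs class and its method names are hypothetical; ValueRef matches the question. Values are written with plain assignments in the submitting thread, and the job is submitted only afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ValueJobs {
    static class ValueRef {
        double value;
    }

    // One round: update every value in the calling thread, then submit a job that sums them.
    static double sumAfterUpdate(ExecutorService pool, List<ValueRef> masterList, double newValue)
            throws Exception {
        for (ValueRef x : masterList) {
            x.value = newValue; // plain write, done before submission
        }
        Callable<Double> job = () -> {
            double sum = 0;
            for (ValueRef x : masterList) {
                sum += x.value;
            }
            return sum;
        };
        // get() also ensures the job has finished before the next round changes the values.
        return pool.submit(job).get();
    }

    static double[] twoRounds() throws Exception {
        List<ValueRef> masterList = Arrays.asList(new ValueRef(), new ValueRef(), new ValueRef());
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            double first = sumAfterUpdate(pool, masterList, 1.0);
            // The worker threads may be the same ones that ran the first round.
            double second = sumAfterUpdate(pool, masterList, 2.0);
            return new double[] { first, second };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(twoRounds()));
    }
}
```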
What you are planning sounds sufficient. It depends on how you plan to "wake up thread pool."
The Java Memory Model provides that all writes performed by a thread before exiting a synchronized block are visible to threads that subsequently synchronize on that lock.
So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) during the time you update the master list, when they wake up and become runnable, the modifications made by the master thread will be visible to these threads.
I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.
Just to clarify: It's almost impossible to control worker threads without using a synchronized block where a check is made to see whether the worker has a task to execute. Thus, any changes made by the controller thread to the job happen-before the worker thread awakes. You require a synchronized block, or at least a volatile variable to act as a memory barrier; however, I can't think how you'd create a thread pool without using one of these.
As an example of the advantages of using the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions; it's not necessarily the horrible idea that one might assume at first glance.
If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.
Why don't you make Collection<ValueRef> and ValueRef immutable, or at least not modify the values in the collection after you have published the reference to the collection? Then you will not have to worry about synchronization.
That is, when you want to change the values of the collection, create a new collection and put the new values in it. Once the values have been set, pass the collection reference to new job objects.
The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).
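One way to sketch the immutable approach: each job takes its own unmodifiable copy of the values at construction time. The SnapshotJob class is a hypothetical name, and it uses Callable<Double> and boxed Double values rather than the question's Runnable and ValueRef, purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

final class SnapshotJob implements Callable<Double> {
    // Never modified after construction, so it can be shared freely once the job is published.
    private final List<Double> neededValues;

    SnapshotJob(List<Double> values) {
        // Defensive copy: later changes to the caller's list do not affect this job.
        this.neededValues = Collections.unmodifiableList(new ArrayList<>(values));
    }

    @Override
    public Double call() {
        double sum = 0;
        for (double v : neededValues) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Double> values = new ArrayList<>(Arrays.asList(1.0, 2.0));
        SnapshotJob job = new SnapshotJob(values);
        values.set(0, 100.0); // does not change what the job sees
        System.out.println(job.call());
    }
}
```

The job still has to reach the worker thread safely, for example by submitting it to an executor.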
I have two threads, T1 and T2, with different jobs, so usually we prefer to accomplish this task with thread joins.
But we can do this without using join(): we can put T2's code inside T1.
What difference does this make ?
Joining a thread means that one waits for the other to end, so that you can safely access its result or continue after both have finished their jobs.
Example: if you start a new thread in the main thread and both do some work, you'd join the main thread on the newly created one, causing the main thread to wait for the second thread to finish. Thus you can do some work in parallel until you reach the join.
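A small sketch of that example, assuming the work is summing a range of integers split in two halves. The JoinExample class and its methods are hypothetical names.

```java
class JoinExample {
    // Sums the integers from 'from' to 'to', inclusive.
    static long sumRange(int from, int to) {
        long sum = 0;
        for (int i = from; i <= to; i++) {
            sum += i;
        }
        return sum;
    }

    static long parallelSum() throws InterruptedException {
        long[] partial = new long[1];
        Thread worker = new Thread(() -> partial[0] = sumRange(1, 500));
        worker.start();

        // The main thread does its own half while the worker runs.
        long mine = sumRange(501, 1000);

        // Wait for the worker; after join() returns, its write to partial[0] is visible here.
        worker.join();
        return mine + partial[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parallelSum()); // prints 500500
    }
}
```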
If you split a job into two parts which are executed by different threads you may get a performance improvement, if
the threads can run independently, i.e. if they don't rely on each other's data, otherwise you'd have to synchronize which costs performance
the JVM is able to execute multiple threads in parallel, i.e. you have a hyperthreading/multicore machine and the JVM utilizes that
usually we prefer to accomplish this task by thread Joins.
No we don't. We accomplish this task by starting two threads. There is no obligation to use join() so there is no 'should' about it. If you want to pause the current thread while another thread completes, do so. If you don't, don't.
If you call T1.join() from T2, T2 will wait for T1 to die (finish). It is a form of thread synchronization, but from what you describe you can simply fire off two threads and not use join at all. If you use two threads, the work will be done in parallel; if you put the code in only one thread, the work will be done sequentially.
Here is the reason to use join: You use it when final result depends on result of two tasks which could run at the same time.
Example1:
After user clicks submit button the program has to call two external webservices to update their respective systems. It can be done at the same time that is why we would create a separate thread for one of webservices.
The user will sit before the screen and wait for a notification: Your submission is OK! The screen should say OK only after both threads finished.
Two things.
Join is used only when one thread must wait for the other to finish (let's say thread A prepares a file and thread B cannot continue until the file is ready). There are instances where threads are independent and no join is needed (for example, most daemon threads).
With threading you get several things:
- mainly, independence in the order of execution. Let's say that you have a program that does some heavy processing when you push a button. If you do that processing in the main thread, your GUI will freeze until the task is finished. If you do the processing in another thread, then the GUI thread is "freed" and the GUI keeps working.
- on most modern computers, creating several threads allows the OS to use different cores to serve different threads, improving performance.
The drawback is greater complexity, as you need information about the execution state of other threads.
You could use something like a java.util.concurrent.CountDownLatch, eg:
CountDownLatch doneSignal = new CountDownLatch(2);
and have each thread countDown() when they're done, so a main thread knows when both threads have completed.
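A minimal sketch of the latch approach with two worker threads. The LatchExample class and the counter standing in for real work are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LatchExample {
    static int runTwoWorkers() throws InterruptedException {
        CountDownLatch doneSignal = new CountDownLatch(2);
        AtomicInteger completed = new AtomicInteger();

        Runnable work = () -> {
            try {
                completed.incrementAndGet(); // stand-in for the real job
            } finally {
                doneSignal.countDown(); // always signal, even if the job throws
            }
        };

        new Thread(work).start();
        new Thread(work).start();

        // Blocks until countDown() has been called twice.
        doneSignal.await();
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoWorkers()); // prints 2
    }
}
```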
using Join also like we can add the T2 thread's code inside T1 thread
join() like the method name implies waits for the thread to die and joins it at the end of execution. You can add one thread's code inside another but that would destroy the purpose of using 2 separate threads to run your jobs concurrently. Placing one code after the other would run your statements in sequence. There is no concurrency.
When in doubt, consult the javadocs - http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Thread.html#join%28%29
If T1 and T2 do different tasks which do not depend on state changes caused by each other, you should not join them, so that you reap the advantages of parallel execution. In case there are state dependencies, you should synchronize both threads using mechanisms like wait/notify or even join(), depending on your use case.
And as for combining the run() methods of both threads, it's entirely left to you. I mean, you should understand why both threads are of different "types" (as they have different run() bodies) in the first place. It's a design aspect and not a performance aspect.
All the parallel threads typically need to join at some point in the code, thereby allowing them to wait until all threads terminate. After this point, serial processing typically continues. This ensures proper synchronisation between peer threads so that subsequent serial code does not begin abruptly before all parallel threads complete the collective activity.
The main difference is that when we join thread T2 from T1, the time T2 spends executing its job can also be utilised by T1; that means they do different jobs in parallel. But this can't happen when you include T2's code inside T1.
I'm looking for the simplest, most straightforward way to implement the following:
main starts and launches 3 threads
all 3 tasks process and end in a resulting value (which I need to return somehow?)
main waits (.join?) on each thread to ensure they have all 3 completed their task
main somehow gets the value from each thread (3 values)
Then the rest is fairly simple, processes the 3 results and then terminates...
Now, I've been doing some reading and found multiple ideas, like:
Using Future, but this is for async work; is this really a good idea when the main thread needs to block waiting for all 3 spawned threads to finish?
Passing in an object (to a thread) and then simply having the thread "fill it" with the result
Somehow using Runnable (not sure how yet).
Anyways - what would be the best, and simplest recommended approach?
Thanks,
List<Callable<Result>> list = ... create list of callables
ExecutorService es = Executors.newFixedThreadPool(3);
List<Future<Result>> results = es.invokeAll(list);
The ExecutorService.invokeAll method returns only after all tasks (instances of Callable) have finished, either normally or by throwing an exception.
For details see ExecutorService (mainly its invokeAll method), Executors, Callable.
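A complete version of the snippet above might look like the following sketch. It assumes each task simply returns an Integer; the InvokeAllExample class and the placeholder computation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllExample {
    static int sumOfThreeResults() throws Exception {
        // Three tasks, each producing one value (placeholder computation).
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            int n = i;
            tasks.add(() -> n * 10);
        }

        ExecutorService es = Executors.newFixedThreadPool(3);
        try {
            int total = 0;
            // invokeAll blocks until all three tasks are done, so get() returns immediately.
            for (Future<Integer> result : es.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfThreeResults()); // prints 60
    }
}
```

If a task threw an exception, the corresponding get() call would rethrow it wrapped in an ExecutionException.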
You could also use a Semaphore from java.util.concurrent.
Create a new Semaphore with 1 - #threads permits and have main call acquire() on the Semaphore.
When each of the threads you have created has finished its work, have it call the release() method.
As you have created a Semaphore with a negative number of permits the call to acquire() will block until this number becomes positive. This will not happen until all of your threads have released a permit on the Semaphore.
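A sketch of the negative-permit idea, with a counter standing in for the real work. The SemaphoreExample class and waitForWorkers() method are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreExample {
    static int waitForWorkers(int threadCount) throws InterruptedException {
        // Starts at 1 - threadCount, so it reaches 1 only after every worker has released.
        Semaphore done = new Semaphore(1 - threadCount);
        AtomicInteger finished = new AtomicInteger();

        for (int i = 0; i < threadCount; i++) {
            new Thread(() -> {
                finished.incrementAndGet(); // stand-in for the real job
                done.release();
            }).start();
        }

        // Blocks until the permit count becomes positive.
        done.acquire();
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitForWorkers(3)); // prints 3
    }
}
```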