If I have a class that extends Thread with static methods on it (this is very simplified):
public class MyThread extends Thread {
private static long SLEEP_INT = 30000;
private static Map<Integer, String> myData;
//every 30 seconds, update map of data
public void run() {
while(isActive) {
try {
populateDataFromDB();
Thread.sleep( SLEEP_INT );
}
catch( Exception e ) {
//do nothing
}
}
}
//static method to update map of data
public static void populateDataFromDB() {
//do stuff here, setting values in myData
}
}
and then somewhere else in my application I have:
MyThread.populateDataFromDB();
If I know that there is only one instance of the MyThread class in my application, is it still necessary to write synchronized code inside of populateDataFromDB in order to ensure thread safety?
You do need synchronization, because you will have more than one thread that is accessing the data held in MyThread.myData. You have shown us one thread, which is going to periodically read from your database and fill in your Map. You would only do this if you have something that is going to utilize this data.
You do not want the threads that use the Map to ever see a half-filled map, or a map that contains inconsistent state. To be safe, you would want to use synchronization to keep threads from reading myData while the MyThread thread is updating it every 30 seconds.
In other words, just because you only have one instance of a given class doesn't necessarily mean that you do not need synchronization. You need synchronization because you have multiple threads (each perhaps running different code) that are accessing the same data. You probably can allow all the readers of the data to access the data at the same time, but ensure exclusive access during the operation that writes to the data structure.
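A minimal sketch of the "many readers, one exclusive writer" idea, assuming a ReentrantReadWriteLock guards the map. The class and method names (SharedData, replaceAll, get) are hypothetical and not from the question; this is one possible way to do it, not the only one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder for the shared map; names are illustrative only.
final class SharedData {
    private static final ReadWriteLock LOCK = new ReentrantReadWriteLock();
    private static final Map<Integer, String> DATA = new HashMap<>();

    // Writer: swaps in the new contents while holding the exclusive write lock,
    // so no reader can observe a half-filled map.
    static void replaceAll(Map<Integer, String> fresh) {
        LOCK.writeLock().lock();
        try {
            DATA.clear();
            DATA.putAll(fresh);
        } finally {
            LOCK.writeLock().unlock();
        }
    }

    // Readers: any number of threads may hold the read lock at the same time.
    static String get(int key) {
        LOCK.readLock().lock();
        try {
            return DATA.get(key);
        } finally {
            LOCK.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> fresh = new HashMap<>();
        fresh.put(1, "one");
        replaceAll(fresh);
        System.out.println(get(1));
    }
}
```

In the question's code, populateDataFromDB would play the role of replaceAll, and every reader of myData would go through something like get.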
No, you never need synchronization when you have only one thread. (And the main thread is the first thread in the application.)
BUT when you call thread.start() from your main thread, you have 2 threads in your system. If for some reason your threads (the new thread and the main thread) are trying to write to memory that both have access to, then you want to serialize the threads' access to that shared memory. Serializing that access is where synchronization helps.
So in your example, if populateDataFromDB modifies that shared data, and I assume that you may be calling it from the new thread (inside run) and also from the main thread (I assumed that because you said "then somewhere else in my application I have:"), then you definitely need synchronization.
I wrote this program:
package com.example.threads;
import java.util.concurrent.ConcurrentHashMap;
public class ConcurrentHashMapBehaviour {
private static ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<>();
private static Object _lock = new Object();
public static void main(String[] args) {
Thread t = new Thread(new MyThread());
t.start();
int counter = 0;
while (true) {
String val = "FirstVal" + counter;
counter++;
String currentVal = null;
synchronized (_lock) {
chm.put("first", val);
currentVal = chm.get("first");
}
System.out.println("In Main thread, current value is : " + currentVal);
}
}
static class MyThread implements Runnable {
@Override
public void run() {
String val = null;
while (true) {
synchronized (_lock) {
val = chm.get("first");
}
System.out.println("Value seen in MyThread is " + val);
}
}
}
}
I am sharing common data between these threads, viz. chm (a ConcurrentHashMap). I ran this in debug mode, where I made the main thread run more times than MyThread; both are controlled by _lock.
So, for instance, I let the main thread run twice, so the value of the "first" key would be "FirstVal1". Then I halted the main thread and let MyThread proceed, and it was able to get the latest value, even though the main thread had run multiple times.
How is this possible? I was under the impression that this variable needs to be volatile in order for MyThread to get the latest values.
I don't understand this behaviour. Can anyone explain what I am missing?
First, you're using a ConcurrentHashMap, which is safe to use in a multi-threaded environment, so if a thread puts a value into it, other threads will be able to see that value.
Second, you are synchronizing access to the map. That will ensure only one thread will write to the map.
Each such explicit synchronization also includes a memory barrier, which flushes any results still waiting in a cache out to main memory, making it possible for other threads to see them. That is also what a volatile variable access gives you: accesses to volatile values have memory visibility guarantees.
If you want to see data races in your program, remove all synchronization primitives and try again. That does not guarantee that you'll observe a race all the time, but you should be able to see unexpected values every now and then.
There are three misconceptions here:
Writing to a volatile variable guarantees that all changes made by the writing thread are published, i.e. can be seen by other threads. See The Java Language Specification Chapter 8 for all the details. This does not mean that the absence of the volatile modifier forbids publication. JVM implementations may be (and actually are) much more forgiving in practice. This is one of the reasons concurrency problems are so hard to trace.
"A hash table supporting full concurrency of retrievals and high expected concurrency for updates." is the first sentence of the API Documentation on the ConcurrentHashMap class. And that pretty much sums it up. The concurrent hashmap guarantees that when calling get any thread gets the latest value. That's exactly the purpose of this class. If you look at its source code you can by the way see that they use volatile fields internally.
You're additionally using synchronized blocks to access your data. These do not only guarantee exclusive access, they also guarantee that all changes made before leaving such a block are visible to all threads that synchronize on the same lock object.
To summarize it: By using the concurrent hashmap implementation and using synchronization blocks you publish the changes and make the latest changes visible to other threads. One of the two would have already been sufficient.
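To illustrate the "one of the two would have been sufficient" point, here is a small sketch that relies on the ConcurrentHashMap alone, with no synchronized block and no volatile field. The class name ChmOnly and the method writeThenRead are made up for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

final class ChmOnly {
    private static final ConcurrentHashMap<String, String> CHM = new ConcurrentHashMap<>();

    // Starts a reader thread, then writes from the calling thread.
    // Returns the value the reader thread ended up seeing.
    static String writeThenRead() {
        String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            String v;
            // Poll until the writer's value becomes visible; no lock, no volatile.
            while ((v = CHM.get("first")) == null) {
                Thread.yield();
            }
            seen[0] = v;
        });
        reader.start();
        CHM.put("first", "FirstVal0");
        try {
            reader.join(); // join also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw: " + writeThenRead());
    }
}
```

The reader's loop terminates because a get on a ConcurrentHashMap reflects the most recently completed put for that key.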
My questions are:
Does a Java program, by default, cause creation of only 1 thread?
If yes, and if we create a multi threaded program, when do multiple threads access the same code of a Java object?
For example I have a Java program with 2 methods - add() and sub(). In what scenario will 2 or more threads run the 'add()' method?
Isn't code always thread safe, as multiple threads will access different sections of code?
If not, please show an example program where thread safety is a concern.
Don't think of "sections of code", think of where the data lives and how many threads are accessing that actual data.
Local variables live on the stack of the thread they are being used in and are thread safe since they are different data "containers" per thread.
Any data that lives on the heap, like instance or static fields, is not inherently thread-safe, because if more than one thread accesses that data then the threads might contend for it.
We could get more complicated and talk about where the data really is but this basic explanation should give you a good idea of what's going on.
The below code gives an example of an instance that is shared by two threads, in this case both threads are accessing the same array list, which is pointing to the same array data containers in the heap. Run it a couple times and you'll eventually see a failure. If you comment out one of the threads it will work correctly every time, counting down from 99.
import java.util.ArrayList;
import java.util.List;
public class Main {
public static void main(String[] args) {
MyRunnable r = new MyRunnable();
new Thread(r).start();
new Thread(r).start();
}
public static class MyRunnable implements Runnable {
// imagine this list living out in the heap and both threads messing with it
// this is really just a reference, but the actual data is in the heap
private List<Integer> list = new ArrayList<>();
{ for (int i = 0; i < 100; i++) list.add(i); }
@Override public void run() {
while (list.size() > 0) System.out.println(list.remove(list.size() - 1));
}
}
}
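One possible fix for the example above, sketched under the assumption that it is acceptable to lock on the list itself: the emptiness check and the removal happen inside the same synchronized block, so no other thread can slip in between them. SafeDrain and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class SafeDrain {
    // Two threads empty the same list; returns how many elements were removed in total.
    static int drainWithTwoThreads() {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100; i++) list.add(i);
        AtomicInteger removed = new AtomicInteger();

        Runnable r = () -> {
            while (true) {
                // Check and remove as one atomic step under the same lock.
                synchronized (list) {
                    if (list.isEmpty()) return;
                    list.remove(list.size() - 1);
                }
                removed.incrementAndGet();
            }
        };

        Thread a = new Thread(r);
        Thread b = new Thread(r);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return removed.get();
    }

    public static void main(String[] args) {
        System.out.println("Removed " + drainWithTwoThreads() + " elements");
    }
}
```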
1) Does a Java program, by default, cause creation of only 1 thread?
It really depends on what your code is doing. A simple System.out.println() call will probably involve just one thread running your code. But as soon as you, for example, open a Swing GUI window, at least one other thread will be around (the "event dispatch thread" that reacts to user input and takes care of UI updates).
2) If yes, and if we create a multi threaded program, when do multiple threads access the same code of a Java object?
Misconception on your end. Objects do not have code. Basically, a thread will run a specific method; either its own run() method, or some other method made available to it. And then the thread just executes that method, and any other method call that is triggered from that initial method.
And of course, while running that code, that thread might create other objects, or manipulate the state of already existing objects. When each thread only touches a different set of objects, no problems arise. But as soon as more than one thread deals with the same object state, proper precaution is required (to avoid nondeterministic behavior).
Your question suggests that you might not fully understand what "thread" means.
When we learned to program, they taught us that a computer program is a sequence of instructions, and they taught us that the computer executes those instructions one-by-one, starting from some well-defined entry point (e.g., the main() routine).
OK, but when we talk about multi-threaded programs, it no longer is sufficient to say that "the computer" executes our code. Now we say that threads execute our code. Each thread has its own idea of where it is in your program, and if two or more threads happen to be executing in the same function at the same time, then each of them has its own private copy of the function's arguments and local variables.
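A small sketch of the "private copy of arguments and local variables" point: two threads run the same method at the same time, and neither disturbs the other's locals. The names LocalsPerThread, sumTo and runOnTwoThreads are invented for the illustration.

```java
final class LocalsPerThread {
    // 'n', 'total' and 'i' live on the calling thread's stack:
    // every call, on every thread, gets its own copies.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Runs sumTo on two threads concurrently; each thread writes to its own array slot.
    static long[] runOnTwoThreads() {
        long[] results = new long[2];
        Thread a = new Thread(() -> results[0] = sumTo(100));
        Thread b = new Thread(() -> results[1] = sumTo(1000));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        long[] r = runOnTwoThreads();
        System.out.println(r[0] + " " + r[1]);
    }
}
```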
So, you asked:
Does a Java program, by default, cause creation of only 1 thread?
A Java program always starts with one thread executing your code, and usually several other threads executing JVM code. You don't normally need to be aware of the JVM threads. The one thread that executes your code starts its work at the beginning of your main() routine.
Programmers often call that initial thread the "main thread." Probably they call it that because it calls main(), but be careful! The name can be misleading: The JVM doesn't treat the "main thread" any differently from any other thread in a multi-threaded Java program.
if we create a multi threaded program, when do multiple threads access the same code of a Java object?
Threads only do what your program tells them to do. If you write code for two different threads to call the same function, then that's what they will do. But, let's break that question down a bit...
...First of all, how do we create a multi-threaded program?
A program becomes multi-threaded when your code tells it to become multi-threaded. In one simple case, it looks like this:
class MyRunnable implements Runnable {
public void run() {
DoSomeUsefulThing();
DoSomeOtherThing();
}
}
MyRunnable r = new MyRunnable();
Thread t = new Thread(r);
t.start();
...
Java creates a new thread when some other thread in your program calls t.start(). (NOTE! The Thread instance, t, is not the thread. It is only a handle that your program can use to start the thread and inquire about its thread's state and control it.)
When the new thread starts executing program instructions, it will start by calling r.run(). As you can see, the body of r.run() will cause the new thread to DoSomeUsefulThing() and then DoSomeOtherThing() before r.run() returns.
When r.run() returns, the thread is finished (a.k.a., "terminated", a.k.a., "dead").
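The handle-versus-thread distinction can be observed through Thread.getState(). The sketch below (ThreadLifecycle and observe are hypothetical names) records the state before start() and after join() has returned.

```java
final class ThreadLifecycle {
    // Returns the handle's state before the thread is started and after it has finished.
    static Thread.State[] observe() {
        Thread t = new Thread(() -> { /* run() returns immediately */ });
        Thread.State before = t.getState(); // handle exists, no thread started yet
        t.start();
        try {
            t.join(); // wait until run() has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Thread.State after = t.getState();
        return new Thread.State[] { before, after };
    }

    public static void main(String[] args) {
        Thread.State[] states = observe();
        System.out.println(states[0] + " -> " + states[1]);
    }
}
```

The Thread object is still perfectly usable as an object after the thread has terminated; it just no longer has a running thread behind it.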
So,
when do multiple threads access the same code of a Java object?
When your code makes them do it. Let's add a line to the example above:
...
Thread t = new Thread(r);
t.start();
DoSomeUsefulThing();
...
Note that the main thread did not stop after starting the new thread. It goes on to execute whatever came after the t.start() call. In this case, the next thing it does is to call DoSomeUsefulThing(). But that's the same as what the program told the new thread to do! If DoSomeUsefulThing() takes any significant time to complete, then both threads will be doing it at the same time... because that's what the program told them to do.
please show an example program where thread safety is a concern
I just did.
Think about what DoSomeUsefulThing() might be doing. If it's doing something useful, then it almost certainly is doing something to some data somewhere. But, I didn't tell it what data to operate on, so chances are, both threads are doing something to the same data at the same time.
That has a lot of potential to not turn out well.
One way to fix that is to tell the function what data to work on.
class MyDataClass { ... }
class MyRunnable implements Runnable {
private MyDataClass data;
public MyRunnable(MyDataClass data) {
this.data = data;
}
public void run() {
DoSomeUsefulThingWITH(data);
DoSomeOtherThingWITH(data);
}
}
MyDataClass dat_a = new MyDataClass(...);
MyDataClass dat_b = new MyDataClass(...);
MyRunnable r = new MyRunnable(dat_a);
Thread t = new Thread(r);
t.start();
DoSomeUsefulThingWITH(dat_b);
There! Now the two threads are doing the same thing, but they are doing it to different data.
But what if you want them to operate on the same data?
That's a topic for a different question. Google for "mutual exclusion" to get started.
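As a starting point for the "same data" case, here is a minimal mutual-exclusion sketch using synchronized methods. The SharedCounter class is hypothetical; it stands in for whatever data both threads need to touch.

```java
final class SharedCounter {
    private long value;

    // Only one thread at a time can be inside either synchronized method
    // of the same SharedCounter instance.
    synchronized void increment() {
        value++;
    }

    synchronized long get() {
        return value;
    }

    // A new thread and the calling thread both do the same work on the SAME object.
    static long countWithTwoThreads(int perThread) {
        SharedCounter shared = new SharedCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.increment();
            }
        };
        Thread t = new Thread(work);
        t.start();
        work.run(); // the calling thread does the same thing, at the same time
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(100_000));
    }
}
```

Without the synchronized keyword on increment(), value++ is a read-modify-write that two threads can interleave, and the final count would usually come out too low.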
Depends on the implementation. Only one thread (the "main thread") will invoke the public static void main(String[]) method, but that doesn't mean other threads weren't started for other tasks.
A thread will access the "same code" if you program it to do so. I'm not sure what your idea of "section of code" is or where the idea that two threads will never access the same "section" at the same time comes from, but it's quite trivial to create thread-unsafe code.
import java.util.ArrayList;
import java.util.List;
public class Main {
public static void main(String[] args) throws InterruptedException {
List<Object> list = new ArrayList<>();
Runnable action = () -> {
while (true) {
list.add(new Object());
}
};
Thread thread1 = new Thread(action, "thread-1");
thread1.setDaemon(true); // don't keep JVM alive
Thread thread2 = new Thread(action, "thread-2");
thread2.setDaemon(true); // don't keep JVM alive
thread1.start();
thread2.start();
Thread.sleep(1_000L);
}
}
An ArrayList is not thread-safe. The above code has two threads constantly trying to add a new Object to the same ArrayList for approximately one second. It's not guaranteed, but if you run that code you might see an ArrayIndexOutOfBoundsException or something similar. Regardless of any exceptions being thrown, the state of the ArrayList is in danger of being corrupted. This is because state is updated by multiple threads with no synchronization.
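For comparison, a sketch of one way to make the same kind of code safe: wrap the list with Collections.synchronizedList so that each add is performed under a lock. The class name and the bounded loop are assumptions made so that the example terminates and can be checked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedListDemo {
    // Two threads each add perThread objects to the same list; returns the final size.
    static int addFromTwoThreads(int perThread) {
        List<Object> list = Collections.synchronizedList(new ArrayList<>());
        Runnable action = () -> {
            for (int i = 0; i < perThread; i++) {
                list.add(new Object()); // each add is individually synchronized
            }
        };
        Thread thread1 = new Thread(action, "thread-1");
        Thread thread2 = new Thread(action, "thread-2");
        thread1.start();
        thread2.start();
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(addFromTwoThreads(10_000));
    }
}
```

Note that the wrapper only makes individual calls atomic. Iterating over the list, or any check-then-act sequence, still needs an explicit synchronized block on the list.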
So basically what would happen if you have the following:
class SyncTest {
private final static List<Object> mObjectList = new ArrayList<Object>();
public synchronized void mySyncMethod(Object object) {
new Thread(new Runnable() {
public void run() {
synchronized (SyncTest.this) {
for (int i = 0; i < mObjectList.size(); i++) {
//Do something with object
}
}
}
}).start();
}
}
Say an activity needs to run in different threads, each of which iterates over a collection. That is why a thread is created inside the method, with different objects.
Is this the "right" way, or perhaps there is a better way?
Does this present any threats?
Re-entrancy doesn't apply here. The only impact of nesting here is allowing the inner class instances to have access to the enclosing instance (including the lock being used). The two things that are synchronized are called in different threads. The new thread once created will have to get chosen by the scheduler before it can run so even though these are using the same lock it would seem unlikely there would be much overlap between the two.
The thread that calls mySyncMethod acquires the lock on the instance of SyncTest it's using, then it creates a new Thread, starts it, then releases the lock and goes on its way.
Later once the new thread starts it has to acquire the lock on the SyncTest object that started it before it can execute its run method. If the lock on SyncTest is in use by something else (either the thread that just created it, another call to mySyncMethod on the same SyncTest instance, or another thread created by another call to mySyncMethod on the same SyncTest instance) then it would have to wait around to get the lock. Then it does whatever it needs to with the list, gets to the end of the method and releases the lock.
There are a lot of problems here:
It's unclear why you need to create your own thread rather than use a pool, or why the creating method needs to synchronize and wait around for the new thread to start before it can release its lock.
The lock on the SyncTest object is not encapsulated so other things could be acquiring it, it's unclear what things are contending for the lock.
Since the list is defined as a static class member, if you have more than one SyncTest object you're going to have separate threads messing with the same list but using different locks, so it's hard to understand what the point of locking is.
But what you've shown isn't going to deadlock.
The outer synchronized ensures single thread access to the process of creating the new Thread while the inner synchronized ensures single thread access to the for loop.
Of course you realize that as written the code doesn't make much sense because the inner this reference is targeted at your anonymous inner class. I think you really mean SyncTest.this so you are synchronizing access to the SyncTest class. Even better would be to synchronize access to mObjectList.
As written with the inner class this fixed, your Thread would block until mySyncMethod returned.
Depending on what you are doing you might be better off using one of the Concurrent collection types rather than synchronizing access to your List as you'll get better concurrency.
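A rough sketch that combines two of the suggestions made in these answers: use a thread pool instead of creating a raw Thread per call, and synchronize on the list itself rather than on the enclosing instance. PooledIteration and visitAll are hypothetical names, and the pool size and timeout are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PooledIteration {
    // Submits 'tasks' iterations over the shared list to a small pool.
    // Returns the total number of elements visited across all tasks.
    static int visitAll(List<Object> shared, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger visited = new AtomicInteger();
        for (int t = 0; t < tasks; t++) {
            pool.execute(() -> {
                // Lock on the data being protected, not on some unrelated instance.
                synchronized (shared) {
                    for (Object ignored : shared) {
                        visited.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown(); // no new tasks; already submitted ones still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return visited.get();
    }

    public static void main(String[] args) {
        List<Object> objects = new ArrayList<>();
        for (int i = 0; i < 3; i++) objects.add(new Object());
        System.out.println(visitAll(objects, 5));
    }
}
```

Any code that modifies the list would have to synchronize on the same list object for this to be of any use.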
Nothing would happen. Synchronization is per thread, so even if you re-enter the synchronized block many times, the calls will not block as long as the owner of the lock is the same thread.
mySyncMethod() will only run when the thread calling it can gain ownership of the lock in that instance of SyncTest.
The run method could start in its different thread, but will block on the synchronize statement until THAT thread gains ownership of the lock for the same instance of SyncTest.
(answer assumed that 'this' referred to the outer class instance, which was corrected in the edit to the original post)
If you have synchronization block inside the method then the lock is on that block only. And block inside the method can have different object's lock.
I have the following Java code:
private Object guiUpdateLock = new Object();
public void updateLinkBar(SortedSet<Arbitrage> arbitrages) {
synchronized (guiUpdateLock) {
System.out.println("start");
for (Arbitrage arbitrage : arbitrages) {
//do some GUI stuff
}
System.out.println("end");
}
}
updateLinkBar() is called from many threads, and occasionally I get a java.util.ConcurrentModificationException in the "for" loop.
But I can't understand why, since I'm locking on an object. The lock obviously doesn't work, because I can see two "start" lines in a row in the output.
Thank you in advance.
Locks must protect objects and not segments of code.
In your case you accept an arbitrary collection, acquire your private lock, and operate on the collection. Meanwhile the rest of your code may, in other threads, do whatever it wants with the collection and it doesn't have to contend for your private lock to do it.
You must significantly redesign your code such that all access to the collection in question is covered by the same lock.
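One way such a redesign could look, sketched with invented names (GuardedSet, add, snapshot): the collection is private to a class, every access goes through the same lock, and callers iterate over a copy taken under that lock rather than over the live set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class GuardedSet {
    private final Object lock = new Object();
    private final SortedSet<Integer> items = new TreeSet<>();

    // Every mutation takes the lock...
    void add(int value) {
        synchronized (lock) {
            items.add(value);
        }
    }

    // ...and so does every read. The caller gets a private copy to iterate over.
    List<Integer> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(items);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedSet set = new GuardedSet();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) set.add(i);
        });
        writer.start();
        // Iterating a snapshot is safe even while the writer is still adding.
        for (int value : set.snapshot()) {
            if (value < 0) throw new IllegalStateException("unexpected value");
        }
        writer.join();
        System.out.println("Final size: " + set.snapshot().size());
    }
}
```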
Without your complete code I have to resort to guessing, but the most likely case is that the two threads are using different guiUpdateLock objects to synchronize on. My further guess would be that they are using different instances of the class that contains guiUpdateLock, and since the field is not static there will be different Object instances as well.
I understand the concept behind threading and have written threads in other languages, but I am having trouble understanding how to adapt them to my needs in java.
Basically, at present I have a vector of objects, which are read in from a file sequentially.
The file then has a list of events, which need to happen concurrently, so waiting for one event to finish, which takes 20-30 seconds, is not an option.
There are only a couple of methods in the object which deal with these events. However, from looking at tutorials, objects must extend Thread or implement Runnable; yet if the object is in a thread, making a method call to that object seems to happen sequentially anyway.
Any extra information would be appreciated, as I am clearly missing something; I am just not quite sure what!
So to summarise how can I execute a single method using a thread?
To start a thread you call start() on an instance of Thread or a subclass thereof. The start() method returns immediately. At the same time, the other thread (the one incarnated by the Thread instance) takes off, and proceeds with executing the run() method of the Thread instance.
Managing threads is not as easy as it seems. For a smoother API, try using an Executor (see the classes in java.util.concurrent).
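A minimal sketch of running a single method on another thread through an ExecutorService. The slowEvent method is a hypothetical stand-in for one of the 20-30 second event handlers; the object that owns it does not need to extend Thread or implement Runnable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SingleMethodOnThread {
    // Hypothetical stand-in for a slow event handler.
    static String slowEvent(String name) {
        return "handled " + name;
    }

    // Runs slowEvent on an executor thread and waits for its result.
    static String runInBackground(String name) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = () -> slowEvent(name);
            Future<String> result = executor.submit(task); // returns immediately
            return result.get();                           // blocks until the task is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runInBackground("event-1"));
    }
}
```

In real code you would normally submit several tasks first and only then call get() on their futures, since calling get() right after submit() waits for the task and gives up the concurrency.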
The best thing to do in Java is create another class that takes in the data you need to process and performs whatever you need it to perform:
class Worker implements Runnable{
Object mydata;
Worker(Object data)
{
mydata = data;
}
@Override
public void run()
{
//process the data
System.out.println(mydata.toString());
//or if you want to use your class then:
YourClass yc = (YourClass)mydata;
yc.methodB();
}
}
class YourClass
{
private final ExecutorService executor = Executors.newCachedThreadPool();
private ArrayList<Object> list;
YourClass()
{
list = new ArrayList<Object>();
list.add(new Object());
...
...
list.add(new Object());
}
void methodA()
{
for(Object item : list )
{
// Create a new thread with the worker class taking the data
executor.execute(new Worker(item));
}
}
void methodB(){/*do something else here*/}
}
Note that instead of getting the data, you can pass the actual class that you need the method to be invoked on:
executor.execute(new Worker(new MyClass()));
In the run method of the Worker class you invoke whatever you need to invoke on MyClass. The executor hands each Worker to a pool thread (creating a new one when none is free) and calls run on it, so the Workers run in separate threads, in parallel.
Thomas has already given the technical details. I am going to try and focus on the logic.
Here is what I can suggest from my understanding of your problem.
Let's say you have a collection of objects of type X (or maybe even a mix of different types). You need to call methods foo and/or bar on these objects based on some specified event. So now, you may have a second collection that stores those events.
So we have two List objects (one for the X objects and other for the events).
Now, we have a function execute that will take X, and the event, and call foo or bar. This execute method can be wrapped in a thread, and executed simultaneously. Each of these threads can take one object from the list, increment the counter, and execute foo/bar. Once done, check the counter, and take the next one from the list. You can have 5 or more of these threads working on the list.
So, as we see, the objects coming from file do not have to be the Thread objects.
You have to be very careful that the List and counter are synchronized. Much better data structures are possible. I am sticking to a crude one for ease of understanding.
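The "several threads share a list and a counter" scheme can be sketched as follows, assuming an AtomicInteger is used as the shared counter so that no two workers ever claim the same index. All names here (SharedCounterWorkers, processAll, timesHandled) are made up for the illustration, and the "event handling" is reduced to counting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedCounterWorkers {
    // Each worker repeatedly claims the next unprocessed index and handles that event.
    // Returns, per event, how many times it was handled.
    static int[] processAll(List<String> events, int threads) {
        AtomicInteger next = new AtomicInteger();
        int[] timesHandled = new int[events.size()];

        Runnable worker = () -> {
            int i;
            // getAndIncrement is atomic, so every index is handed out exactly once.
            while ((i = next.getAndIncrement()) < events.size()) {
                timesHandled[i]++; // only the thread that claimed i touches slot i
            }
        };

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread thread = new Thread(worker);
            workers.add(thread);
            thread.start();
        }
        try {
            for (Thread thread : workers) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timesHandled;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(Arrays.toString(processAll(events, 5)));
    }
}
```

The list itself is only read here. If workers also had to modify it, the list would need its own protection.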
Hope this helps.
The key to threads is to remember that each task that must run concurrently must be in its own thread. Tasks executing in the same thread will execute sequentially. Dividing the concurrent tasks among separate threads will allow you to do your required concurrent processing.