Need advice on synchronization of Java Vector / ConcurrentModificationException - java

In a legacy application I have a Vector that keeps a chronological list of files to process and multiple threads ask it for the next file to process. (Note that I realize that there are likely better collections to use (feel free to suggest), but I don't have time for a change of that magnitude right now.)
At a scheduled interval, another thread checks the working directory to see if any files appear to have been orphaned because something went wrong. The method called by this thread occasionally throws a ConcurrentModificationException if the system is abnormally busy. So I know that at least two threads are trying to use the Vector at once.
Here is the code. I believe the issue is the use of the clone() on the returned Vector.
private synchronized boolean isFileInDataStore( File fileToCheck ){
boolean inFile = false;
for( File wf : (Vector<File>)m_dataStore.getFileList().clone() ){
File zipName = new File( Tools.replaceFileExtension(fileToCheck.getAbsolutePath(), ZIP_EXTENSION) );
if(wf.getAbsolutePath().equals(zipName.getAbsolutePath()) ){
inFile = true;
break;
}
}
return inFile;
}
The getFileList() method is as follows:
public synchronized Vector<File> getFileList() {
synchronized(fileList){
return fileList;
}
}
As a quick fix, would changing the getFileList method to return a copy of the vector as follows suffice?
public synchronized Vector<File> getFileListCopy() {
synchronized(fileList){
return (Vector<File>)fileList.clone();
}
}
I must admit that I am generally confused by the use of synchronized in Java as it pertains to collections, as simply declaring the method as such is not enough. As a bonus question, is declaring the method as synchronized and wrapping the return call with another synchronized block just crazy coding? Looks redundant.
EDIT: Here are the other methods which touch the list.
public synchronized boolean addFile(File aFile) {
boolean added = false;
synchronized(fileList){
if( !fileList.contains(aFile) ){
added = fileList.add(aFile);
}
}
notifyAll();
return added;
}
public synchronized void removeFile( File dirToImport, File aFile ) {
if(aFile!=null){
synchronized(fileList){
fileList.remove(aFile);
}
// Create a dummy list so I can synchronize it.
List<File> zipFiles = new ArrayList<File>();
synchronized(zipFiles){
// Populate with actual list
zipFiles = (List<File>)diodeTable.get(dirToImport);
if(zipFiles!=null){
zipFiles.remove(aFile);
// Repopulate list if the number falls below the number of importer threads.
if( zipFiles.size()<importerThreadCount ){
diodeTable.put(dirToImport, getFileList( dirToImport ));
}
}
}
notifyAll();
}
}

Basically, there are two separate issues here: synchronization and ConcurrentModificationException. Vector, in contrast to e.g. ArrayList, is synchronized internally, so basic operations like add() or get() do not need extra synchronization. But you can get a ConcurrentModificationException even from a single thread if you iterate over a Vector and modify it in the meantime, e.g. by inserting an element. So, if you performed a modifying operation inside your for loop, you could break the iteration even with a single thread. Now, if you return your Vector outside of your class, you don't prevent anyone from modifying it without proper synchronization in their code. Synchronizing on fileList in the original version of getFileList() is pointless, because the lock is released as soon as the reference is returned. Returning a copy instead of the original could help, as could using a collection which allows modification while iterating, like CopyOnWriteArrayList (but do note the additional cost of modifications; it may be a showstopper in some cases).
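To make the single-thread case concrete, here is a minimal sketch (the class and method names are made up for illustration). It adds an element while a for-each loop is running, once over a Vector and once over a CopyOnWriteArrayList:

```java
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

class ModifyWhileIterating {

    /** Returns true if adding an element during a for-each loop throws. */
    static boolean throwsOnModify(List<String> list) {
        list.add("a");
        list.add("b");
        list.add("c");
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.add("x"); // structural change while an iterator is live
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Only one thread is involved in either call.
        System.out.println("Vector throws: " + throwsOnModify(new Vector<>()));
        System.out.println("CopyOnWriteArrayList throws: "
                + throwsOnModify(new CopyOnWriteArrayList<>()));
    }
}
```

The Vector's iterator is fail-fast, so the next step of the loop notices the change. The CopyOnWriteArrayList iterator walks a snapshot of the array taken when the loop started, so it never sees the added element and never throws.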

"I am generally confused by the use of synchronized in Java as it
pertains to collections, as simply declaring the method as such is not
enough"
Correct. synchronized on a method means that only one thread at a time may enter the method. But if the same collection is visible from multiple methods, then this doesn't help much.
To prevent two threads accessing the same collection at the same time, they need to synchronize on the same object - e.g. the collection itself. You have done this in some of your methods, but isFileInDataStore appears to access a collection returned by getFileList without synchronizing on it.
Note that obtaining the collection in a synchronized manner, as you have done in getFileList, isn't enough - it's the accessing that needs synchronizing. Cloning the collection would (probably) fix the issue if you only need read-access.
As well as looking at synchronizing, I suggest you track down which threads are involved - e.g. print out the call stack of the exception and/or use a debugger. It's better to really understand what's going on than to just synchronize and clone until the errors go away!
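As a rough sketch of "synchronize on the same object and only hand out copies", something like the following could work. FileStore and its method names are hypothetical, not the asker's real classes, and it uses an ArrayList guarded by one explicit lock instead of the Vector from the question:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class FileStore {
    // The internal list is never handed out; every access locks on it.
    private final List<File> files = new ArrayList<>();

    boolean addFile(File f) {
        synchronized (files) {
            // The check and the add happen under one lock, so no duplicates slip in.
            return !files.contains(f) && files.add(f);
        }
    }

    boolean removeFile(File f) {
        synchronized (files) {
            return files.remove(f);
        }
    }

    boolean contains(File f) {
        synchronized (files) {
            return files.contains(f);
        }
    }

    /** Copy taken under the same lock; callers may iterate it freely. */
    List<File> snapshot() {
        synchronized (files) {
            return new ArrayList<>(files);
        }
    }
}
```

A caller can then loop over snapshot() while other threads add or remove files. The loop may see slightly stale data, but it cannot throw a ConcurrentModificationException because nobody else holds a reference to the copy.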

Where does the m_dataStore get updated? That's a likely culprit if it's not synchronized.

First, you should move your logic to whatever class is m_dataStore if you haven't.
Once you've done that, make your list final, and synchronize on it ONLY if you are modifying its elements. Threads that only need to read it, don't need synchronized access. They may end up polling an outdated list, but I suppose that is not a problem. This gets you increased performance.
As far as I can tell, you would only need to synchronize when adding and removing, and only need to lock your list.
e.g.
package answer;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Example {
public static void main(String[] args)
{
Example c = new Example();
c.runit();
}
public void runit()
{
Thread.currentThread().setName("Thread-1");
new Thread("Thread-2")
{
@Override
public void run() {
test1(true);
}
}.start();
// Force a scenario where Thread-1 allows Thread-2 to acquire the lock
try {
Thread.sleep(1000);
} catch (InterruptedException ex) {
Logger.getLogger(Example.class.getName()).log(Level.SEVERE, null, ex);
}
// At this point, Thread-2 has acquired the lock, but it has entered its wait() method, releasing the lock
test1(false);
}
public synchronized void test1(boolean wait)
{
System.out.println( Thread.currentThread().getName() + " : Starting...");
try {
if (wait)
{
// Apparently the current thread is supposed to wait for some other thread to do something...
wait();
} else {
// The current thread is supposed to keep running with the lock
doSomeWorkThatRequiresALockLikeRemoveOrAdd();
System.out.println( Thread.currentThread().getName() + " : Our work is done. About to wake up the other thread(s) in 2s...");
Thread.sleep(2000);
// Tell Thread-2 that we have done our work and that it doesn't have to spare the CPU anymore.
// This essentially tells it "hey don't wait anymore, start checking if you can get the lock"
// Try commenting this line and you will see that Thread-2 never wakes up...
notifyAll();
// This should show you that Thread-1 will still have the lock at this point (even after calling notifyAll).
//Thread-2 will not print "after wait/notify" for as long as Thread-1 is running this method. The lock is still owned by Thread-1.
Thread.sleep(1000);
}
System.out.println( Thread.currentThread().getName() + " : after wait/notify");
} catch (InterruptedException ex) {
Logger.getLogger(Example.class.getName()).log(Level.SEVERE, null, ex);
}
}
private void doSomeWorkThatRequiresALockLikeRemoveOrAdd()
{
// Do some work that requires a lock like remove or add
}
}

Related

Threading in Java (practicing for college)

Create a program that simulates training at an athletic stadium,
there is one track in the stadium that can be used by up to 5 people at a time
and the coach does not allow that number to exceed, but when some of the athletes finish their run (2sec)
and free up space then notify other athlete for running.
After 2 seconds, all processes are frozen
My question is, could anyone explain to me why something like this does not work and how to handle this problem?
class JoggingTrack {
public int numOfAthlete;
public JoggingTrack() {
this.numOfAthlete = 0;
}
@Override
public String toString() {
return "\nNumber of Athlete: " + numOfAthlete + "\n";
}
}
class Athlete extends Thread {
private JoggingTrack track;
private boolean running;
public Athlete(JoggingTrack s) {
this.track = s;
this.running = false;
}
public synchronized boolean thereIsSpace() {
if(track.numOfAthlete < 5) {
return true;
}
return false;
}
public synchronized void addAthlete() {
track.numOfAthlete++;
this.running = true;
}
public synchronized void removeAthlete() {
track.numOfAthlete--;
this.running = false;
}
@Override
public void run() {
try {
while(true) {
while(!this.thereIsSpace()) {
wait();
}
while(!this.running) {
addAthlete();
sleep(2000);
}
while(this.running) {
removeAthlete();
notify();
}
}
} catch (Exception e) {
}
}
}
public class Program {
static JoggingTrack track;
static Athlete[] a;
public static void main(String[] args) {
track = new JoggingTrack();
a = new Athlete[10];
for(int i = 0; i < 10; i++) {
a[i] = new Athlete(track);
a[i].start();
}
while(true) {
try {
System.out.println(track);
Thread.sleep(500);
} catch (Exception e) {
}
}
}
}
A lot of issues with this.
Your methods are in the wrong place. The synchronized keyword synchronizes on an instance of the class, not across multiple instances. So your remove and add functions on different athletes would cause race conditions. These functions should be moved to the Track object, because all athletes are using the same track (so should your thereIsSpace function). At the same time, you should not be directly accessing the member variables of Track in Athlete; use a getter instead.
Secondly, your use of wait and notify is wrong. It leaves lots of holes for race conditions, although it may work most of the time. And this isn't really a good place for using them: a counting semaphore in the Track class would be a better solution, as it's exactly what counting semaphores are made for. Look at the Semaphore class for more details. It's basically a lock that will allow N owners of the lock at a time, and block additional requesters until an owner releases it.
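A minimal sketch of the counting-semaphore idea, assuming a hypothetical SemaphoreTrack class (the peak counter is only there so the limit can be checked):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreTrack {
    private final Semaphore slots;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxSeen = new AtomicInteger();

    SemaphoreTrack(int capacity) {
        slots = new Semaphore(capacity);
    }

    /** Blocks until a slot is free, "runs" for the given time, then frees the slot. */
    void run(long millis) throws InterruptedException {
        slots.acquire(); // blocks while the track is full
        try {
            int now = running.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            Thread.sleep(millis);
        } finally {
            running.decrementAndGet();
            slots.release(); // lets one waiting athlete onto the track
        }
    }

    /** Starts the athletes and returns the peak number on the track at once. */
    static int simulate(int athletes, int capacity) {
        SemaphoreTrack track = new SemaphoreTrack(capacity);
        Thread[] threads = new Thread[athletes];
        for (int i = 0; i < athletes; i++) {
            threads[i] = new Thread(() -> {
                try {
                    track.run(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return track.maxSeen.get();
    }
}
```

Calling SemaphoreTrack.simulate(10, 5) should never report a peak above 5, because an athlete can only be counted as running while it holds one of the five permits.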
Your threads are waiting forever, because each one waits on its own Athlete instance, and nobody ever calls notify on that instance.
One way to fix this is to have all athletes synchronize/wait/notify on the same object, for example the JoggingTrack. An athlete then waits on the track with track.wait(), and when an athlete is done running, it calls track.notify(), which wakes up a waiting athlete.
Then there are other issues as noted by Gabe-
Once you fix the first issue, you will find the race conditions- eg. too many threads all start running even though there are some checks (thereIsSpace) in place.
My question is, could anyone explain to me why something like this does not work and how to handle this problem?
Debugging multithreaded programs is hard. A thread-dump might help and println-debugging might also be helpful however they can cause the problem to migrate so it should be used with caution.
In your case, you are confusing your objects. Think about
Athlete.thereIsSpace() and Athlete.addAthlete(...). Does that make any sense? Does an athlete have space? Do you add an athlete to an athlete? Sometimes the object names don't help you make these sorts of evaluations but in this case, they do. It is the JoggingTrack that has space and that an athlete is added to.
When you are dealing with multiple threads, you need to worry about data sharing. If one thread does track.numOfAthlete++;, how will other threads see the update? Without synchronization, one thread's writes are not guaranteed to become visible to other threads. Also ++ is actually 3 operations (read, increment, write) and you need to worry about multiple threads running the ++ at the same moment. You will need to use a synchronized block to ensure memory updates or use other concurrent classes such as AtomicInteger or a Semaphore which take care of the locking and data-sharing for you. Also, more generally, you really should not modify another object's fields in this way.
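To illustrate the AtomicInteger option, here is a small sketch (the class and method names are invented for the example). Several threads increment one shared counter, and because incrementAndGet() performs the read, increment and write as one indivisible step, no update is lost:

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {

    /** Runs the given number of threads, each incrementing a shared counter. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join also makes the workers' updates visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

With a plain int field and ++ instead, the same experiment can come up short, since two threads may read the same old value and both write back old + 1.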
Lastly, you are confused about how wait/notify work. First of all, they only work if they are called inside a synchronized block or method that holds the lock of the object being waited on, so the code you've posted will throw an IllegalMonitorStateException at runtime (which your empty catch block silently swallows). In your case, the thing that the multiple Athletes are contending for is the JoggingTrack, so the track needs to have the synchronized keyword and not the Athlete. The Athlete is waiting for the JoggingTrack to get space. No one is waiting for the athlete. Something like:
public class JoggingTrack {
public synchronized boolean thereIsSpace() {
return (numOfAthletes < 5);
}
public synchronized void addAthlete() {
numOfAthletes++;
}
...
Also, like the ++ case, you need to be really careful about race conditions in your code. No, not jogging races but programming races. For example, what happens if 2 athletes both go to do the following logic at precisely the same time:
while (!track.thereIsSpace()) {
track.wait();
}
addAthlete();
Both athletes might call thereIsSpace() which returns true (because no one has been added yet). Then both go ahead and add themselves to the track. That would increase the number of athletes by 2 and maybe exceed the 5 limit. These sorts of race conditions can happen at any time unless the check and the update are performed inside the same synchronized block.
The JoggingTrack could instead have code like:
public synchronized void addIfSpaceOrWait() throws InterruptedException {
while (numOfAthletes >= 5) {
wait();
}
numOfAthletes++;
}
Then the athletes would do:
track.addIfSpaceOrWait();
addAthlete();
This code has no race condition because only one athlete will get the synchronized lock on the track at one time -- java guarantees it. Both of them can call that at the same time and one will return and the other will wait.
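For completeness, here is a sketch of how the whole monitor might look once leaving the track is included. MonitorTrack, leave() and the peak counter are hypothetical additions for illustration; only addIfSpaceOrWait() comes from the snippet above:

```java
class MonitorTrack {
    private final int capacity;
    private int numOfAthletes = 0;
    private int peak = 0; // bookkeeping only, to check the limit afterwards

    MonitorTrack(int capacity) {
        this.capacity = capacity;
    }

    synchronized void addIfSpaceOrWait() throws InterruptedException {
        while (numOfAthletes >= capacity) {
            wait(); // releases the track's lock while waiting
        }
        numOfAthletes++;
        peak = Math.max(peak, numOfAthletes);
    }

    synchronized void leave() {
        numOfAthletes--;
        notifyAll(); // wake the waiters so they re-check the condition
    }

    synchronized int peak() {
        return peak;
    }

    /** Starts the athletes and returns the peak number on the track at once. */
    static int simulate(int athletes, int capacity) {
        MonitorTrack track = new MonitorTrack(capacity);
        Thread[] threads = new Thread[athletes];
        for (int i = 0; i < athletes; i++) {
            threads[i] = new Thread(() -> {
                try {
                    track.addIfSpaceOrWait();
                    try {
                        Thread.sleep(20); // "running"
                    } finally {
                        track.leave();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return track.peak();
    }
}
```

The wait() sits in a while loop rather than an if, so a woken athlete re-checks the count before stepping onto the track. Both wait() and notifyAll() are called on the track while holding its lock, which is what the original code was missing.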
Couple other random comments:
You should never do a catch (Exception e) {}. Just doing an e.printStackTrace() is bad enough, but not seeing your errors at all is really going to hurt your ability to debug your program. I hope you just did that for your post. :-)
I love the JoggingTrack object name but whenever you reference it, it should be joggingTrack or maybe track. Be careful of JoggingTrack s.
An Athlete should not extend thread. It isn't a thread. It should implement Runnable. This is a FAQ.

Java Thread seemingly skipping conditional statement [duplicate]

For a recent library I'm writing, I wrote a thread which loops indefinitely. In this loop, I start with a conditional statement checking a property on the threaded object. However it seems that whatever initial value the property has, will be what it returns even after being updated.
Unless I do some kind of interruption such as Thread.sleep or a print statement.
I'm not really sure how to ask the question unfortunately. Otherwise I would be looking in the Java documentation. I have boiled down the code to a minimal example that explains the problem in simple terms.
public class App {
public static void main(String[] args) {
App app = new App();
}
class Test implements Runnable {
public boolean flag = false;
public void run() {
while(true) {
// try {
// Thread.sleep(1);
// } catch (InterruptedException e) {}
if (this.flag) {
System.out.println("True");
}
}
}
}
public App() {
Test t = new Test();
Thread thread = new Thread(t);
System.out.println("Starting thread");
thread.start();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {}
t.flag = true;
System.out.println("New flag value: " + t.flag);
}
}
Now, I would presume that after we change the value of the flag property on the running thread, we would immediately see the masses of 'True' spitting out to the terminal. However, we don't..
If I un-comment the Thread.sleep lines inside the thread loop, the program works as expected and we see the many lines of 'True' being printed after we change the value in the App object. As an addition, any print method in place of the Thread.sleep also works, but some simple assignment code does not. I assume this is because it is pulled out as un-used code at compile time.
So, my question is really: Why do I have to use some kind of interruption to get the thread to check conditions correctly?
So, my question is really: Why do I have to use some kind of interruption to get the thread to check conditions correctly?
Well you don't have to. There are at least two ways to implement this particular example without using "interruption".
If you declare flag to be volatile, then it will work.
It will also work if you declare flag to be private, write synchronized getter and setter methods, and use those for all accesses.
public class App {
public static void main(String[] args) {
App app = new App();
}
class Test implements Runnable {
private boolean flag = false;
public synchronized boolean getFlag() {
return this.flag;
}
public synchronized void setFlag(boolean flag) {
this.flag = flag;
}
public void run() {
while(true) {
if (this.getFlag()) { // Must use the getter here too!
System.out.println("True");
}
}
}
}
public App() {
Test t = new Test();
Thread thread = new Thread(t);
System.out.println("Starting thread");
thread.start();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {}
t.setFlag(true);
System.out.println("New flag value: " + t.getFlag());
}
}
But why do you need to do this?
Because unless you use either a volatile or synchronized (and you use synchronized correctly) then one thread is not guaranteed to see memory changes made by another thread.
In your example, the child thread does not see the up-to-date value of flag. (It is not that the conditions themselves are incorrect or "don't work". They are actually getting stale inputs. This is "garbage in, garbage out".)
The Java Language Specification sets out precisely the conditions under which one thread is guaranteed to see (previous) writes made by another thread. This part of the spec is called the Java Memory Model, and it is in JLS 17.4. There is an easier-to-understand explanation in Java Concurrency in Practice by Brian Goetz et al.
Note that the unexpected behavior could be due to the JIT compiler deciding to keep the flag in a register. It could also be that the JIT compiler has decided it does not need to force memory cache write-through, etcetera. (The JIT compiler doesn't want to force write-through on every memory write to every field. That would be a major performance hit on multi-core systems ... which most modern machines are.)
The Java interruption mechanism is yet another way to deal with this. You don't need any explicit synchronization because the interrupt methods take care of that. In addition, interruption will work when the thread you are trying to interrupt is currently waiting or blocked on an interruptible operation; e.g. in an Object::wait call.
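A minimal sketch of the interruption approach, using an invented InterruptDemo class. The worker loops until its interrupt status is set, and the sleep stands in for any interruptible blocking call:

```java
class InterruptDemo {

    /** Returns true if the worker stopped after being interrupted. */
    static boolean stopsWhenInterrupted() {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(10); // stand-in for real, interruptible work
                } catch (InterruptedException e) {
                    // sleep() clears the status when it throws, so set it again
                    // to make the loop condition see it.
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();
        worker.interrupt(); // ask the worker to stop
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Here no shared flag field is needed at all: Thread.interrupt() and isInterrupted() are the communication channel, and a thread blocked in sleep() or wait() is woken immediately instead of at its next poll.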
Because the variable is not modified in that thread, the JVM is free to effectively optimize the check away. To force an actual check, use the volatile keyword:
public volatile boolean flag = false;
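As a self-contained sketch of the volatile fix (VolatileFlagDemo and its members are made-up names), the worker below spins on the flag with no sleep or print in the loop and still notices the update:

```java
class VolatileFlagDemo {

    static class Worker implements Runnable {
        volatile boolean flag = false;    // written by the main thread
        volatile boolean sawFlag = false; // written by the worker

        @Override
        public void run() {
            while (!flag) {
                // busy-wait; every iteration re-reads the volatile field
            }
            sawFlag = true;
        }
    }

    /** Returns true if the worker observed the flag change and finished. */
    static boolean workerSeesUpdate() {
        Worker w = new Worker();
        Thread t = new Thread(w);
        t.setDaemon(true); // don't keep the JVM alive if something goes wrong
        t.start();
        try {
            Thread.sleep(100); // let the worker spin for a while first
            w.flag = true;
            t.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return w.sawFlag;
    }
}
```

If the volatile modifier is removed from flag, the loop is allowed (though not required) to spin forever, which is the behaviour described in the question.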

What happens when few threads trying to call the same synchronized method?

So I have this horse race, and when a horse gets to the finishing line, I invoke an arrival method. Let's say I have 10 threads, one for each horse, and the first horse to arrive invokes 'arrive':
public class FinishingLine {
List arrivals;
public FinishingLine() {
arrivals = new ArrayList<Horse>();
}
public synchronized void arrive(Horse hourse) {
arrivals.add(hourse);
}
}
Of course I made the arrive method synchronized, but I don't completely understand what could happen if it weren't synchronized; the professor just said it wouldn't be safe.
Another thing that I would like to understand better is how it is decided which thread will run after the first one has finished. After the first thread finishes 'arrive' and the method gets unlocked, which thread will run next?
1) It is undefined what the behaviour would be, but you should assume that it is not what you would want it to do in any way that you can rely upon.
If two threads try to add at the same time, you might get both elements added (in either order), only one element added, or maybe even neither.
The pertinent quote from the Javadoc is:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.)
2) This is down to how the OS schedules the threads. There is no guarantee of "fairness" (execution in arrival order) for regular synchronized blocks, although there are certain classes (Semaphore is one) which give you the choice of a fair execution order.
e.g. you can implement a fair execution order by using a Semaphore:
public class FinishingLine {
List arrivals;
final Semaphore semaphore = new Semaphore(1, true);
public FinishingLine() {
arrivals = new ArrayList<Horse>();
}
public void arrive(Horse hourse) throws InterruptedException {
semaphore.acquire();
try {
arrivals.add(hourse);
} finally {
semaphore.release();
}
}
}
However, it would be easier to do this with a fair blocking queue, which handles the concurrent access for you:
public class FinishingLine {
final BlockingQueue queue = new ArrayBlockingQueue(NUM_HORSES, true);
public void arrive(Horse hourse) {
queue.add(hourse);
}
}
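If fairness doesn't matter and you only need the "synchronized externally" part that the ArrayList Javadoc asks for, a wrapper from Collections.synchronizedList is another option. A minimal sketch, with horses represented by plain strings and all names invented for the example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedArrivals {
    // Every call on the wrapper locks on the wrapper itself.
    private final List<String> arrivals =
            Collections.synchronizedList(new ArrayList<>());

    void arrive(String horse) {
        arrivals.add(horse);
    }

    int count() {
        return arrivals.size();
    }

    /** Lets each horse thread arrive several times; returns the recorded total. */
    static int race(int horses, int arrivalsPerHorse) {
        SynchronizedArrivals line = new SynchronizedArrivals();
        Thread[] threads = new Thread[horses];
        for (int i = 0; i < horses; i++) {
            String name = "horse-" + i;
            threads[i] = new Thread(() -> {
                for (int n = 0; n < arrivalsPerHorse; n++) {
                    line.arrive(name);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return line.count();
    }
}
```

With the wrapper, race(10, 1000) always records 10000 arrivals. With a bare ArrayList the same run may lose elements or throw, which is the "not safe" the professor was talking about. Note that iterating over the wrapper still needs a manual synchronized block on the list.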

Can we say that by synchronizing a block of code we are making the contained statements atomic?

I want to clear my understanding that if I surround a block of code with synchronized(this){} statement, does this mean that I am making those statements atomic?
No, it does not ensure your statements are atomic. For example, if you have two statements inside one synchronized block, the first may succeed, but the second may fail. Hence, the result is not "all or nothing". But regarding multiple threads, you ensure that the statements of two threads synchronizing on the same lock are not interleaved. In other words: the synchronized blocks of all threads are strictly serialized, but there is no guarantee that all or none of the statements of a block get executed.
Have a look at how Atomicity is defined.
Here is an example showing that the reader is able to read a corrupted state. Hence the synchronized block was not executed atomically (forgive me the nasty formatting):
public class Example {
public static void sleep() {
try { Thread.sleep(400); } catch (InterruptedException e) {};
}
public static void main(String[] args) {
final Example example = new Example(1);
ExecutorService executor = Executors.newFixedThreadPool(2);
try {
Future<?> reader = executor.submit(new Runnable() { @Override public void run() {
int value; do {
value = example.getSingleElement();
System.out.println("single value is: " + value);
} while (value != 10);
}});
Future<?> writer = executor.submit(new Runnable() { @Override public void run() {
for (int value = 2; value < 10; value++) example.failDoingAtomic(value);
}});
reader.get(); writer.get();
} catch (Exception e) { e.getCause().printStackTrace();
} finally { executor.shutdown(); }
}
private final Set<Integer> singleElementSet;
public Example(int singleIntValue) {
singleElementSet = new HashSet<>(Arrays.asList(singleIntValue));
}
public synchronized void failDoingAtomic(int replacement) {
singleElementSet.clear();
if (new Random().nextBoolean()) sleep();
else throw new RuntimeException("I failed badly before adding the new value :-(");
singleElementSet.add(replacement);
}
public int getSingleElement() {
return singleElementSet.iterator().next();
}
}
No, synchronization and atomicity are two different concepts.
Synchronization means that a code block can be executed by at most one thread at a time, but other threads (that execute some other code that uses the same data) can see intermediate results produced inside the "synchronized" block.
Atomicity means that other threads do not see intermediate results - they see either the initial or the final state of the data affected by the atomic operation.
It's unfortunate that Java uses synchronized as a keyword. A synchronized block in Java is a "mutex" (short for "mutual exclusion"). It's a mechanism that ensures only one thread at a time can enter the block.
Mutexes are just one of many tools that are used to achieve "synchronization" in a multi-threaded program: Broadly speaking, synchronization refers to all of the techniques that are used to ensure that the threads will work in a coordinated fashion to achieve a desired outcome.
Atomicity is what Oleg Estekhin said, above. We usually hear about it in the context of "transactions." Mutual exclusion (i.e., Java's synchronized) guarantees something less than atomicity: Namely, it protects invariants.
An invariant is any assertion about the program's state that is supposed to be "always" true. E.g., in a game where players exchange virtual coins, the total number of coins in the game might be an invariant. But it's often impossible to advance the state of the program without temporarily breaking the invariant. The purpose of mutexes is to ensure that only one thread---the one that is doing the work---can see the temporary "broken" state.
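The coin example can be sketched roughly as follows. CoinBank and its methods are hypothetical; the point is that both the writer and the reader take the same lock, so the reader never observes the moment between the debit and the credit:

```java
import java.util.Arrays;

class CoinBank {
    private final int[] balances;

    CoinBank(int players, int coinsEach) {
        balances = new int[players];
        Arrays.fill(balances, coinsEach);
    }

    synchronized void transfer(int from, int to, int amount) {
        balances[from] -= amount; // invariant is temporarily broken here...
        balances[to] += amount;   // ...and restored before the lock is released
    }

    synchronized int total() {
        int sum = 0;
        for (int b : balances) {
            sum += b;
        }
        return sum;
    }

    /** Moves coins around in one thread while another keeps checking the total. */
    static boolean invariantAlwaysHeld() {
        CoinBank bank = new CoinBank(4, 100);
        Thread mover = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                bank.transfer(i % 4, (i + 1) % 4, 1);
            }
        });
        mover.start();
        boolean ok = true;
        while (mover.isAlive()) {
            if (bank.total() != 400) {
                ok = false; // would mean we saw a half-finished transfer
            }
        }
        try {
            mover.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok && bank.total() == 400;
    }
}
```

If total() were not synchronized, the checking thread could run between the two statements in transfer() and see 399 coins, even though transfer() itself is still "synchronized".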
For code that uses synchronized on that object - yes.
For code that doesn't use the synchronized keyword for that object - no.
Can we say that by synchronizing a block of code we are making the contained statements atomic?
You are taking a very big leap there. Atomicity means that the operation if atomic will complete in one CPU cycle or equivalent to one CPU cycle whereas Synchronizing a block means only one thread can access the critical region. It may take multiple CPU cycles for processing code in the critical region(which will make it non atomic).

Does this method in runnable object needs synchronization?

The following method belongs to an object A that implements Runnable. It's called asynchronously by other method from the object A and by code inside the run method (so, it's called from other thread, with a period of 5 seconds).
Could I end up with file creation exceptions?
If I make the method synchronized... is the lock always acquired on the object A?
The fact that one of the callers is at the run() method confuses me :S
Thanks for your inputs.
private void saveMap(ConcurrentMap<String, String> map) {
ObjectOutputStream obj = null;
try {
obj = new ObjectOutputStream(new FileOutputStream("map.txt"));
obj.writeObject(map);
} catch (IOException ex) {
Logger.getLogger(MessagesFileManager.class.getName()).log(Level.SEVERE, null, ex);
} finally {
try {
obj.close();
} catch (IOException ex) {
Logger.getLogger(MessagesFileManager.class.getName()).log(Level.SEVERE, null, ex);
}
}
notifyActionListeners();
}
Synchronized instance methods use the this object as the lock and prevent simultaneous execution of all synchronized instance methods (even other ones) from different threads.
To answer your question regarding requirements for synchronization, the answer is basically yes because you have multiple threads accessing the same method, so output may collide.
As a design comment, I would make your saveMap method static, because it doesn't access any fields (it's stateless), and it more strongly indicates that output to the file is not dependent on the instance, so it's more obvious that file output may collide with other instances.
Edited:
Here's the code for what I'm suggesting:
private static synchronized void saveMap(Map<String, String> map) {
...
}
FYI, static synchronized methods use the class object (ie MyClass.class), which is a singleton, as the lock object.
It's called asynchronously by other method from the object A and by code inside the run method (so, it's called from other thread, with a period of 5 seconds).
Given that saveMap is called from multiple threads, without synchronization you cannot guarantee that two threads won't try to write to the same file concurrently. This will cause an incorrectly-formatted file when it happens.
The simplest solution is to make the method synchronized.
private synchronized void saveMap(ConcurrentMap<String, String> map) { ... }
If the map is large enough, this may cause unresponsiveness in your program. Another option is to write to a temporary file (a new file each time it's called) and then use synchronization while swapping the new file over map.txt by renaming and deleting.
private void saveMap(ConcurrentMap<String, String> map) {
File file = ... original code to write to a temporary file ...
if (file != null) {
synchronized(this) {
... move file over map.txt ...
}
notifyActionListeners();
}
}
Keep in mind that swapping two files won't be an atomic operation. Any external program or thread from the same program may catch the short time that map.txt doesn't exist. I was unable to find an atomic file-swap method in Java, but maybe with some searching you will.
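For what it's worth, since Java 7 java.nio.file.Files.move accepts StandardCopyOption.ATOMIC_MOVE, which asks the file system to do the rename as one atomic step. Below is a sketch of the write-to-temp-then-move idea under that assumption. AtomicMapSaver, loadMap and demo are invented names, and the Javadoc says it is implementation specific whether an existing target is replaced by an atomic move, so treat this as a starting point rather than a guarantee:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

class AtomicMapSaver {

    static void saveMap(Map<String, String> map, Path target) throws IOException {
        // Temp file in the same directory, so the move stays on one file system.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "map", ".tmp");
        try {
            try (ObjectOutputStream out =
                         new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(new HashMap<>(map));
            }
            try {
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
                        StandardCopyOption.REPLACE_EXISTING);
            } catch (AtomicMoveNotSupportedException e) {
                // Fallback: not atomic, readers may briefly see no file.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    @SuppressWarnings("unchecked")
    static Map<String, String> loadMap(Path target) throws IOException {
        try (ObjectInputStream in =
                     new ObjectInputStream(Files.newInputStream(target))) {
            return (Map<String, String>) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    /** Saves twice to the same target and checks the latest content is read back. */
    static boolean demo() {
        try {
            Path target = Files.createTempDirectory("mapsaver").resolve("map.txt");
            Map<String, String> map = new HashMap<>();
            map.put("a", "1");
            saveMap(map, target);
            map.put("b", "2");
            saveMap(map, target);
            return loadMap(target).equals(map);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each call writes to its own temp file, two threads saving at once cannot interleave their bytes; the last move simply wins. Whether you still want a synchronized block around the move depends on whether "last writer wins" is acceptable for your data.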
