ArrayList and Multithreading in Java

Under what circumstances would an unsynchronized collection, say an ArrayList, cause a problem? I can't think of any. Can someone please give me an example where an ArrayList causes a problem and a Vector solves it? I wrote a program that has 2 threads, both modifying an ArrayList that has one element. One thread puts "bbb" into the ArrayList while the other puts "aaa" into it. I don't really see an instance where the string is half modified. Am I on the right track here?
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for some time and another thread runs after that (on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Many thanks in advance.

There are three aspects of what might go wrong if you use an ArrayList (for example) without adequate synchronization.
The first scenario is that if two threads happen to update the ArrayList at the same time, then it may get corrupted. For instance, the logic of appending to a list goes something like this:
public void add(T element) {
    if (!haveSpace(size + 1)) {
        expand(size + 1);
    }
    elements[size] = element;
    // HERE
    size++;
}
Now suppose that we have one processor / core and two threads executing this code on the same list at the "same time". Suppose that the first thread gets to the point labeled HERE and is preempted. The second thread comes along, and overwrites the slot in elements that the first thread just updated with its own element, and then increments size. When the first thread finally gets control, it updates size. The end result is that we've added the second thread's element and not the first thread's element, and most likely also added a null to the list. (This is just illustrative. In reality, the native code compiler may have reordered the code, and so on. But the point is that bad things can happen if updates happen simultaneously.)
The second scenario arises due to the caching of main memory contents in the CPU's cache memory. Suppose that we have two threads, one adding elements to the list and the second one reading the list's size. When one thread adds an element, it will update the list's size attribute. However, since size is not volatile, the new value of size may not immediately be written out to main memory. Instead, it could sit in the cache until a synchronization point where the Java memory model requires that cached writes get flushed. In the meantime, the second thread could call size() on the list and get a stale value of size. In the worst case, the second thread (calling get(int), for example) might see inconsistent values of size and the elements array, resulting in unexpected exceptions. (Note that this kind of problem can happen even when there is only one core and no memory caching. The JIT compiler is free to use CPU registers to cache memory contents, and those registers don't get flushed / refreshed with respect to their memory locations when a thread context switch occurs.)
The third scenario arises when you synchronize operations on the ArrayList; e.g. by wrapping it as a SynchronizedList.
List list = Collections.synchronizedList(new ArrayList());
// Thread 1
List list2 = ...
for (Object element : list2) {
list.add(element);
}
// Thread 2
List list3 = ...
for (Object element : list) {
list3.add(element);
}
If the underlying list is an ArrayList or LinkedList and the two threads run simultaneously, thread 2 is likely to fail with a ConcurrentModificationException (the fail-fast iterators detect this on a best-effort basis). If it is some other (home-brew) list, then the results are unpredictable. The problem is that making list a synchronized list is NOT SUFFICIENT to make a sequence of list operations performed by different threads thread-safe: each individual call is synchronized, but the iteration as a whole is not. To get that, the application would typically need to synchronize at a higher level / coarser grain.
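Here is a minimal sketch of that coarser-grained locking. It assumes the list is a Collections.synchronizedList, whose javadoc asks callers to synchronize on the returned list while iterating it. The class SyncListIteration and its methods copyOf and demo are hypothetical names for illustration. The reader holds the list's own monitor for the whole loop. The writer's add() calls lock the same object, so they cannot interleave with the iteration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class SyncListIteration {
    // Iterating a synchronizedList is a compound action: hold the list's own monitor
    // (the same lock the wrapper uses for add/get/size) for the entire loop.
    static <T> List<T> copyOf(List<T> syncList) {
        List<T> copy = new ArrayList<>();
        synchronized (syncList) {
            for (T element : syncList) {
                copy.add(element);
            }
        }
        return copy;
    }

    // Runs one writer and one reader concurrently and returns the final size of the shared list.
    static int demo(int items) throws InterruptedException {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        AtomicReference<Throwable> readerFailure = new AtomicReference<>();

        Thread writer = new Thread(() -> {
            for (int i = 0; i < items; i++) {
                shared.add(i); // each single add() is already synchronized by the wrapper
            }
        });
        Thread reader = new Thread(() -> {
            try {
                for (int round = 0; round < 200; round++) {
                    copyOf(shared); // without the synchronized block this risks ConcurrentModificationException
                }
            } catch (Throwable t) {
                readerFailure.set(t);
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        if (readerFailure.get() != null) {
            throw new IllegalStateException("reader failed", readerFailure.get());
        }
        return shared.size();
    }
}
```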
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for some time and another thread runs after that (on computers with a single CPU).
Correct. If there is only one core available to run the application, obviously only one thread gets to run at a time. This makes some of the hazards impossible and others much less likely to occur. However, it is possible for the OS to switch from one thread to another at any point in the code, and at any time.
If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yup. That's possible. The probability of it happening is very small[1], but that just makes this kind of problem more insidious.
[1] This is because thread time-slicing events are extremely infrequent when measured on the timescale of hardware clock cycles.

A practical example. At the end, the list should contain 40 items, but for me it usually shows between 30 and 35. Guess why?
static class ListTester implements Runnable {
    private List<Integer> a;
    public ListTester(List<Integer> a) {
        this.a = a;
    }
    public void run() {
        try {
            for (int i = 0; i < 20; ++i) {
                a.add(i);
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
        }
    }
}
public static void main(String[] args) throws Exception {
    ArrayList<Integer> a = new ArrayList<Integer>();
    Thread t1 = new Thread(new ListTester(a));
    Thread t2 = new Thread(new ListTester(a));
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println(a.size());
    for (int i = 0; i < a.size(); ++i) {
        System.out.println(i + " " + a.get(i));
    }
}
Edit:
There are more comprehensive explanations around (for example, Stephen C's post), but I'll add a short comment since mfukar asked (I should have done it right away, when posting the answer).
This is the famous problem of incrementing an integer from two different threads. There's a nice explanation in Sun's Java tutorial on concurrency. Only in that example they have --i and ++i, and we have ++size twice. (++size is part of the ArrayList#add implementation.)
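One way to make the count reliable is to keep the same two-writer setup as the ListTester example but share a Collections.synchronizedList instead of a bare ArrayList. This sketch uses hypothetical names (SafeListTester, fill) and drops the Thread.sleep so the threads hit the list harder. Every add() on the wrapper locks the same monitor, so the read-modify-write of size can no longer interleave.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SafeListTester {
    // Two threads each append perThread values to the same list; returns the final size.
    static int fill(List<Integer> list, int perThread) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                list.add(i);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join(); // join() makes all of the workers' writes visible to this thread
        return list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println(fill(safe, 20)); // prints 40
    }
}
```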

I don't really see an instance where the string is half modified, I am on the right track here?
That won't happen. However, what could happen is that only one of the strings gets added. Or that an exception occurs during the call to add.
can someone please give me an example where an ArrayList causes a problem and a Vector solves it?
If you want to access a collection from multiple threads, you need to synchronize this access. However, just using a Vector does not really solve the problem. You will not get the issues described above, but the following pattern will still not work:
// broken, even though vector is "thread-safe"
if (vector.isEmpty())
    vector.add(1);
The Vector itself will not get corrupted, but that does not mean that it cannot get into states that your business logic would not want to have.
You need to synchronize in your application code (and then there is no need to use Vector).
synchronized (list) {
    if (list.isEmpty())
        list.add(1);
}
The concurrency utility packages (java.util.concurrent) also have a number of collections that provide the atomic operations necessary for thread-safe queues and such.
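Here is one such atomic operation. ClaimOnce, countWinners and the key "first-entry" are made up for this sketch. ConcurrentMap.putIfAbsent performs the "check if absent, then insert" step of the Vector snippet as a single atomic call, with no external lock. Exactly one caller gets null back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimOnce {
    // Starts threadCount threads that all try to claim the same key; returns how many succeeded.
    static int countWinners(int threadCount) throws InterruptedException {
        ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            String me = "thread-" + i;
            Thread t = new Thread(() -> {
                // check-then-act as one atomic step: null means this thread inserted the value
                if (claims.putIfAbsent("first-entry", me) == null) {
                    winners.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return winners.get();
    }
}
```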

The first part of your question has already been answered. I will try to answer the second part:
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for some time and another thread runs after that (on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
In the wait/notify framework, a thread that has acquired the lock on an object releases it while it waits on some condition, so another thread can acquire the lock in the meantime. A great example is the producer-consumer problem.
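Here is a minimal producer-consumer sketch using wait()/notifyAll(). BoundedBuffer is a hypothetical class; in real code, java.util.concurrent.ArrayBlockingQueue does the same job. Each wait() call releases the buffer's monitor, which is what lets the other thread get in and change the condition being waited on.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BoundedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    synchronized void put(T item) throws InterruptedException {
        while (items.size() == capacity) {
            wait(); // releases this object's monitor until another thread calls notifyAll()
        }
        items.addLast(item);
        notifyAll(); // wake any consumer waiting for an item
    }

    synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait(); // releases the monitor so a producer can enter put()
        }
        T item = items.removeFirst();
        notifyAll(); // wake any producer waiting for free space
        return item;
    }
}
```

The conditions are re-checked in a while loop because a woken thread must verify that the condition still holds once it reacquires the monitor.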

When will it cause trouble?
Any time one thread is reading the ArrayList while another is writing to it, or when they are both writing. Here's a well-known example.
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for some time and another thread runs after that (on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yes. Single-core CPUs can execute only one instruction at a time (not really: pipelining has been around for a while, but as a professor once said, that's "free" parallelism). Even so, each process running on your computer is only executed for a period of time before it goes into an idle state. At that moment, another process may start or continue its execution, and then go into an idle state or finish. Process execution is interleaved.
With threads the same thing happens, only they are contained inside a process. How they execute depends on the operating system, but the concept remains the same: they change from active to idle constantly throughout their lifetime.

You cannot control when one thread will be stopped and another will start. Thread 1 will not wait until it has completely finished adding data, so it is always possible to corrupt the data.

Related

Java prioritizing threads

My main thread has a private LinkedList which contains task objects for the players in my game. I then have a separate thread that runs every hour, accesses and clears that LinkedList, and runs my algorithm, which randomly adds new uncompleted tasks to every player's LinkedList. Right now I made a getter method that is synchronized so that I don't run into any concurrency issues. This works fine, but the synchronized keyword has a lot of overhead, especially since it's accessed a ton from the main thread while only accessed hourly from my second thread.
I am wondering if there is a way to prioritize the main thread. For example, on that 2nd thread I could loop through the players, make a new LinkedList, run my algorithm and add all the tasks to that LinkedList, then quickly assign the old LinkedList reference to the new one. This would slightly increase memory usage while improving main thread speed.
Basically I am trying to avoid making my main thread use synchronization when it will only be needed once an hour at most, and I am willing to greatly degrade the performance of the 2nd thread to keep the main thread's speed. Is there a way I can use the 2nd thread to notify the 1st that it will be locking a method, instead of having the 1st thread physically go through all of the synchronization overhead steps? I feel like this would be possible since, if the 2nd thread shares a cache with the main thread, it could change a boolean denoting that the main thread has to wait until that variable is changed back. The main thread would have to check that boolean every time it tries to run that method, and if the 2nd thread is telling it to wait, the main thread would then freeze until the boolean is changed.
Of course, the 2nd thread would have to specify which object and method has the lock, along with a binary 0 or 1 denoting whether it's locked or not. Then the main thread would just need to check its shared cache for the object and the binary boolean value once it reaches that method, which seems way faster than normal synchronization. Anyway, this would result in the main thread running at normal speed while the 2nd thread handles a bunch of work behind the scenes without degrading main thread performance. Does this exist? If so, how can I do it, and if it does not exist, how hard would it actually be to implement?
Premature optimization
It sounds like you are overly worried about the cost of synchronization. Doing a dozen, or a hundred, or even a thousand synchronizations once an hour is not going to impact the performance of your app by any significant amount.
If your concern has not yet been validated by careful study with a profiling tool, you’ve fallen into the common trap of premature optimization.
AtomicReference
Nevertheless, I can suggest an alternative approach.
You want to replace a list once an hour. If you do not mind letting threads continue using the list they have already retrieved while you swap in a new list, then use AtomicReference. An object of this class holds the reference to another object of a specified type.
I generally like the Atomic… classes for thread-safety work because they scream out to the reader that a concurrency problem is at hand.
AtomicReference < List < Task > > listRef = new AtomicReference<>( originalList ) ;
A different thread is able to replace that reference to the old list with a reference to the new list.
listRef.set( newList ) ;
Access by the other thread:
List< Task > list = listRef.get() ;
Note that this approach does not make thread-safe the payload, the list itself. But you claim that only a single thread will ever be manipulating the content of the list. You claim a different thread will only replace the entire list. So this AtomicReference serves the purpose of replacing the list in a thread-safe manner while making the issue of concurrency quite obvious.
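Here is a minimal sketch of that swap. Task and TaskBoard are hypothetical placeholders for the asker's types. The hourly thread builds the new list privately and publishes it with a single set(). The main thread only ever calls get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class TaskBoard {
    // Hypothetical stand-in for the asker's task type
    static final class Task {
        final String name;
        Task(String name) { this.name = name; }
    }

    private final AtomicReference<List<Task>> listRef = new AtomicReference<>(new ArrayList<>());

    // Main thread: fetch the current list; only this thread mutates the list's contents.
    List<Task> current() {
        return listRef.get();
    }

    // Hourly thread: build a complete new list privately, then publish it in one step.
    void replaceAll(List<String> taskNames) {
        List<Task> fresh = new ArrayList<>();
        for (String name : taskNames) {
            fresh.add(new Task(name));
        }
        // set() has volatile-write semantics: a get() that sees this list also sees its contents
        listRef.set(fresh);
    }
}
```

Any thread still holding the old list keeps using it until its next get(), which is the trade-off described above.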
volatile
Using AtomicReference accomplishes the same goal as volatile. I’m wary of volatile because (a) its use may go unnoticed by the reader, and (b) I suspect many Java programmers do not understand volatile, especially since its meaning was redefined.
For more info about why plain reference assignment is not thread-safe, see this Question.

Java: two threads executing until the boolean flag is false: the second thread's first run stops the first thread

I have this threaded program that has three threads: main, lessonThread, questionThread.
It works like this:
- "Lesson continues" keeps getting printed while the finished variable is false;
- every 5 seconds, questionThread asks "Finish a lesson?", and if the answer is y, it sets finished to true.
The problem is that "Lesson continues" never gets printed after the question gets asked for the first time.
Also, as seen in the picture, sometimes lessonThread sneaks in with its "Lesson continues" before the user can enter the answer to questionThread's question.
import java.time.Instant;
import java.util.Scanner;
public class Lesson {
    private boolean finished;
    private boolean waitingForAnswer;
    private Scanner scanner = new Scanner(System.in);
    private Thread lessonThread;
    private Thread questionThread;
    public static void main(String[] args) {
        Lesson lesson = new Lesson();
        lesson.lessonThread = lesson.new LessonThread();
        lesson.questionThread = lesson.new QuestionThread();
        lesson.lessonThread.start();
        lesson.questionThread.start();
    }
    class LessonThread extends Thread {
        @Override
        public void run() {
            while (!finished && !waitingForAnswer) {
                System.out.println("Lesson continues");
            }
        }
    }
    class QuestionThread extends Thread {
        private Instant sleepStart = Instant.now();
        private boolean isAsleep = true;
        @Override
        public void run() {
            while (!finished) {
                if (isAsleep && Instant.now().getEpochSecond() - sleepStart.getEpochSecond() >= 5) {
                    System.out.print("Finish a lesson? y/n");
                    waitingForAnswer = true;
                    String reply = scanner.nextLine().substring(0, 1);
                    switch (reply.toLowerCase()) {
                        case "y":
                            finished = true;
                    }
                    waitingForAnswer = false;
                    isAsleep = true;
                    sleepStart = Instant.now();
                }
            }
        }
    }
}
I think the waitingForAnswer = true might be at fault here, but then, the lessonThread has 5 seconds until the questionThread asks the question again, during which the waitingForAnswer is false.
Any help is greatly appreciated.
EDIT: I found a bug in the loop in lessonThread and changed it to:
@Override
public void run() {
    while (!finished) {
        if (!waitingForAnswer) {
            System.out.println("Lesson continues");
        }
    }
}
However, I get the same result.
EDIT: I can get it working when running inside a debugger.
This just isn't how you're supposed to work with threads. You have 2 major problems here:
The Java memory model.
Imagine that one thread writes to some variable, and a fraction of a second later, another thread reads it. If that were guaranteed to work the way you want it to, the write would have to propagate all the way to every place that could ever see it before code could continue. And because you have absolutely no idea which fields are read by some thread until a thread actually reads them (Java is not in the business of attempting to look ahead and predict what the code will be doing later), every single write to any variable would need a full propagating sync across all threads that can see it... which is all of them! Modern CPUs have multiple cores and each core has its own cache, and if we applied that rule (all changes must be visible immediately everywhere), you might as well take all that cache and chuck it in the garbage, because you wouldn't be able to use it.
If it worked like that, Java would be slower than molasses.
So Java does not work like that. Any thread is free to make a copy of any field or not, at its discretion. If thread A writes 'true' to some instance's variable, and thread B reads that boolean from the exact same instance many seconds later, Java is entirely free to act as if the value is 'false'... even though, when code in thread A looks at it, it sees 'true'. The values may sync up at some arbitrary later point, or never; no guarantees are available to you.
So how do you work with threads in java?
The JMM (Java Memory Model) works by describing so-called comes-before/comes-after relationships (the Java Language Specification calls them happens-before relationships). Only if code is written to clearly indicate that you intend for some event in thread A to come before some other event in thread B will Java guarantee that any effects performed in thread A, and visible there, will also be visible in thread B once B's event (the one that 'came after') has finished.
For example, if thread A does:
synchronized (someRef) {
    someRef.intVal1 = 1;
    someRef.intVal2 = 2;
}
and thread B does:
synchronized (someRef) {
    System.out.println(someRef.intVal1 + someRef.intVal2);
}
then you are guaranteed to witness in B either 0 (the case where B 'won' the fight and got to the synchronized statement first, assuming both fields start at 0) or 3, which is always printed if B got there last. That synchronized block establishes a CBCA relationship: the 'winning' thread's closing } 'comes before' the losing thread's opening one, as far as execution is concerned, so any writes done by thread A will be visible to thread B by the time it enters its sync block.
Your code does not establish any such relationships, therefore, you have no guarantees.
You establish them with writes/reads from volatile fields, with synchronized(), and with any code that itself uses these, which is a lot of code: Most classes in the java.util.concurrent package, starting threads, and many other things do some sync/volatile access internally.
The flying laptop issue.
It's not the 1980s anymore. Your CPU is capable of doing enough calculations at any given moment to draw enough power to heat a small house comfortably. The reason your laptop or desktop or phone isn't a burning ball of lava is because the CPU is almost always doing entirely nothing whatsoever, and thus not drawing any current and heating up. In fact, once a CPU gets going, it will very very quickly overheat itself and throttle down and run slower. That's because 95%+ of common PC workloads involve a 'burst' of calculations to be done, which the CPU can do in a fraction of a second at full turboboosted power, and then it can go back to idling again whilst the fans and the cooling paste and the heat fins dissipate the heat that this burst of power caused. That's why if you try to do something that causes the CPU to be engaged for a long time, such as encoding video, it seems to go a little faster at first before it slows down to a stable level.. whilst your battery is almost visibly draining down and your fans sound like the laptop is about to take off for higher orbit and follow Doug and Bob to the ISS - because that stable level is 'as fast as the fans and heat sinks can draw the heat away from the CPU so that it doesn't explode'. Which is not as fast as when it was still colder, but still pretty fast. Especially if you have powerful fans.
The upshot of all this?
You must idle that CPU.
something like:
while (true) {}
is a so-called 'busy loop': it does nothing, looping forever, whilst keeping the CPU occupied, burning a hole into the laptop and causing the fans to go ape. This is not a good thing. If you want execution to wait for some event before continuing, then wait for it. Keyword: wait. If you just want to wait for 5 seconds, Thread.sleep(5000) is what you want, not a busy loop. If you want to wait until some other thread has performed a job, use the core wait/notifyAll system (these are methods on java.lang.Object and interact with the synchronized keyword), or better yet, use a latch or a lock object from java.util.concurrent; those classes are fantastic. If you just want to ensure that 2 threads don't conflict while they touch the same data, use synchronized. All these features will let the CPU idle down. Endlessly spinning away in a while loop, checking an if clause, is a bad idea.
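Here is a sketch of the latch approach. WaitInsteadOfSpin and runJob are hypothetical names, and the worker's Thread.sleep stands in for real work. The waiting thread blocks in CountDownLatch.await instead of spinning. The latch also provides the visibility guarantee: actions before countDown() happen-before a successful return from await().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class WaitInsteadOfSpin {
    // Replaces "while (!done) {}" with a blocking wait, so the CPU can idle meanwhile.
    static String runJob() throws InterruptedException {
        CountDownLatch jobDone = new CountDownLatch(1);
        String[] result = new String[1];

        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100); // placeholder for real work; sleeping, not spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            result[0] = "finished";
            jobDone.countDown(); // writes made before countDown() are visible after await() returns
        });
        worker.start();

        if (!jobDone.await(5, TimeUnit.SECONDS)) { // blocks without burning CPU
            throw new IllegalStateException("worker took too long");
        }
        return result[0];
    }
}
```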
And you get CBCA relationships to boot, which is what is required for any 2 threads to communicate with each other.
And because you're overloading the CPU with work, the sync points where your '= false' writes get synced back over to the other thread probably aren't happening. Normally it's relatively hard to observe JMM issues, which is what makes multithreaded programming so tricky: it is complex, you will mess up, it's hard to test for errors, and it's plausible you'll never personally run into this problem today. But tomorrow, with another song on the winamp, on another system, it happens all the time. Your busy loop is a fine way to observe it a lot.
I managed to make it work by making waitingForAnswer volatile:
private volatile boolean waitingForAnswer;
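The comment above only makes waitingForAnswer volatile. However, finished is also written by one thread and read by another. A sketch of the lesson loop would therefore mark both fields volatile and sleep between lines instead of busy-looping, as the answer recommends. LessonLoop and its methods are hypothetical names, not the asker's code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LessonLoop {
    // volatile: a write by the question thread becomes visible to the lesson thread's next read
    private volatile boolean finished;
    private volatile boolean waitingForAnswer;
    private final AtomicInteger linesPrinted = new AtomicInteger();

    Thread startLesson() {
        Thread t = new Thread(() -> {
            while (!finished) {
                if (!waitingForAnswer) {
                    System.out.println("Lesson continues");
                    linesPrinted.incrementAndGet();
                }
                try {
                    Thread.sleep(200); // let the CPU idle between lines instead of spinning
                } catch (InterruptedException e) {
                    return; // treat interruption as a request to stop
                }
            }
        });
        t.start();
        return t;
    }

    void setWaitingForAnswer(boolean waiting) { waitingForAnswer = waiting; }
    void finish() { finished = true; }
    int linesPrinted() { return linesPrinted.get(); }
}
```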

How can I be notified when a thread (that I didn't start) ends?

I have a library in a Jar file that needs to keep track of how many threads use my library. When a new thread comes in, there's no problem: I add it to a list. But I need to remove the thread from the list when it dies.
This is in a Jar file so I have no control over when or how many threads come through. Since I didn't start the thread, I cannot force the app (that uses my Jar) to call a method in my Jar that says, "this thread is ending, remove it from your list". I'd REALLY rather not have to constantly run through all the threads in the list with Thread.isAlive().
By the way: this is a port of some C++ code which resides in a DLL and easily handles the DLL_THREAD_DETACH message. I'd like something similar in Java.
Edit:
The reason for keeping a list of threads is: we need to limit the number of threads that use our library - for business reasons. When a thread enters our library we check to see if it's in the list. If not, it's added. If it is in the list, we retrieve some thread-specific data. When the thread dies, we need to remove it from the list. Ideally, I'd like to be notified when it dies so I can remove it from the list. I can store the data in ThreadLocal, but that still doesn't help me get notification of when the thread dies.
Edit2:
Original first sentence was: "I have a library in a Jar file that needs to keep track of threads that use objects in the library."
Normally you would let the GC clean up resources. You can add a component to the thread which will be cleaned up when it is no longer accessible.
If you use a custom ThreadGroup, it will be notified when a thread is removed from the group. If you start the JAR using a thread in the group, it will also be part of the group. You can also change a thread's group via reflection so that it will be notified.
However, polling the threads every few seconds is likely to be simpler.
You can use a combination of ThreadLocal and WeakReference. Create some sort of "ticket" object, and when a thread enters the library, create a new ticket and put it in the ThreadLocal. Also, create a WeakReference (with a ReferenceQueue) to the ticket instance and put it in a list inside your library. When the thread exits, the ticket will be garbage collected and your WeakReference will be queued. By polling the ReferenceQueue, you can essentially get "events" indicating when a thread exits.
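Here is a minimal sketch of that ThreadLocal + WeakReference + ReferenceQueue idea. ThreadExitTracker, Ticket, enter and pollExitedThread are all hypothetical names. The approach relies on the ticket being strongly reachable only from the owning thread's ThreadLocal value. As a result, the "event" arrives only after the garbage collector actually collects the ticket, which may be much later than the thread's death. Pooled threads that never exit are never reported.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ThreadExitTracker {
    // The "ticket": only the owning thread's ThreadLocal map refers to it strongly.
    private static final class Ticket {}

    // Weak reference that remembers which thread the ticket belonged to.
    private static final class TicketRef extends WeakReference<Ticket> {
        final String threadName;
        TicketRef(Ticket ticket, ReferenceQueue<Ticket> queue, String threadName) {
            super(ticket, queue);
            this.threadName = threadName;
        }
    }

    private final ReferenceQueue<Ticket> queue = new ReferenceQueue<>();
    // Keeps the TicketRef objects themselves reachable until they have been dequeued.
    private final Set<TicketRef> liveRefs = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<Ticket> ticket = ThreadLocal.withInitial(this::register);

    private Ticket register() {
        Ticket t = new Ticket();
        liveRefs.add(new TicketRef(t, queue, Thread.currentThread().getName()));
        return t;
    }

    // Call on every entry into the library; creates a ticket on a thread's first visit.
    void enter() {
        ticket.get();
    }

    int trackedThreads() {
        return liveRefs.size();
    }

    // Non-blocking: the name of one thread whose ticket was collected, or null if none yet.
    String pollExitedThread() {
        Reference<? extends Ticket> ref = queue.poll();
        if (ref == null) {
            return null;
        }
        TicketRef tr = (TicketRef) ref;
        liveRefs.remove(tr);
        return tr.threadName;
    }
}
```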
Based on your edits, your real problem is not tracking when a thread dies, but instead limiting access to your library. Which is good, because there's no portable way to track when a thread dies (and certainly no way within the Java API).
I would approach this using a passive technique, rather than an active technique of trying to generate and respond to an event. You say that you're already creating thread-local data on entry to your library, which means that you already have the cutpoint to perform a passive check. I would implement a ThreadManager class that looks like the following (you could as easily make the methods/variables static):
import java.util.concurrent.ConcurrentHashMap;
public class MyThreadLocalData {
    // ...
}
public class TooManyThreadsException
        extends RuntimeException {
    // ...
}
public class ThreadManager
{
    private final static int MAX_SIZE = 10;
    // MyThreadLocalData is a top-level class, so the diamond is used instead of ThreadManager.MyThreadLocalData
    private final ConcurrentHashMap<Thread, MyThreadLocalData> threadTable = new ConcurrentHashMap<>();
    private final Object tableLock = new Object();
    public MyThreadLocalData getThreadLocalData() {
        MyThreadLocalData data = threadTable.get(Thread.currentThread());
        if (data != null) return data;
        synchronized (tableLock) {
            if (threadTable.size() >= MAX_SIZE) {
                doCleanup();
            }
            if (threadTable.size() >= MAX_SIZE) {
                throw new TooManyThreadsException();
            }
            data = createThreadLocalData();
            threadTable.put(Thread.currentThread(), data);
            return data;
        }
    }
The thread-local data is maintained in threadTable. This is a ConcurrentHashMap, which means that it provides fast concurrent reads, as well as concurrent iteration (that will be important below). In the happy case, the thread has already been here, so we just return its thread-local data.
In the case where a new thread has called into the library, we need to create its thread-local data. If we have fewer threads than the limit, this proceeds quickly: we create the data, store it in the map, and return it (createThreadLocalData() could be replaced with a new, but I tend to like factory methods in code like this).
The sad case is where the table is already at its maximum size when a new thread enters. Because we have no way to know when a thread is done, I chose to simply leave the dead threads in the table until we need space -- just like the JVM and memory management. If we need space, we execute doCleanup() to purge the dead threads (garbage). If there still isn't enough space once we've cleared dead threads, we throw (we could also implement waiting, but that would increase complexity and is generally a bad idea for a library).
Synchronization is important. If we have two new threads come through at the same time, we need to block one while the other tries to get added to the table. The critical section must include the entirety of checking, optionally cleaning up, and adding the new item. If you don't make that entire operation atomic, you risk exceeding your limit. Note, however, that the initial get() does not need to be in the atomic section, so we don't need to synchronize the entire method.
OK, on to doCleanup(): this simply iterates the map and looks for threads that are no longer alive. If it finds one, it calls the destructor ("anti-factory") for its thread-local data:
private void doCleanup() {
    for (Thread thread : threadTable.keySet()) {
        if (!thread.isAlive()) {
            MyThreadLocalData data = threadTable.remove(thread);
            if (data != null) {
                destroyThreadLocalData(data);
            }
        }
    }
}
Even though this function is called from within a synchronized block, it's written as if it could be called concurrently. One of the nice features of ConcurrentHashMap is that its iterators can be used while the map is being modified concurrently. They are weakly consistent: they never throw ConcurrentModificationException and they reflect the state of the map at some point at or since the iterator was created. However, that means that two threads might check the same map entry, and we don't want to call the destructor twice. So we use remove() to get the entry, and if it returns null, we know that it has already been (or is being) cleaned up by another thread.
As it turns out, you might want to call the method concurrently. Personally, I think the "clean up when necessary" approach is simplest, but your thread-local data might be expensive to hold if it's not going to be used. If that's the case, create a Timer that will repeatedly call doCleanup():
public Timer scheduleCleanup(long interval) {
    TimerTask task = new TimerTask() {
        @Override
        public void run() {
            doCleanup();
        }
    };
    // daemon Timer thread, so it won't keep the JVM alive on its own
    Timer timer = new Timer(getClass().getName(), true);
    timer.scheduleAtFixedRate(task, 0L, interval);
    return timer;
}

How to get threads with loops running concurrently to work with Thread.yield()?

I have the following situation. I have an application that runs mostly on one thread. It has grown large, so I would like to run a watchdog thread that gets called whenever the main thread changes into a different block of code / method / class, so I can see there is "movement" in the code. If the watchdog gets called from the same area for more than a second or a few, it shall set a volatile boolean that the main thread reads at its next checkpoint, so that it can terminate / restart.
Now the problem is getting the two threads to run at roughly the same time. As soon as the main thread is running, it will not let the watchdog timer count properly. I was therefore thinking of yielding every time the main thread calls the watchdog (so the watchdog could calculate the time passed and set the value), but to no avail. Using Thread.sleep(1) instead of Thread.yield() works. But I don't want to have several areas of code just wasting calculation time; I am sure I am not doing it the way it is meant to be used.
Here is a very simple example of how I would use Thread.yield(). I do not understand why the threads here will not switch (they do, after a "long" and largely unpredictable time). Please give me some advice on how to make this simple example output ONE and TWO after each other. As written before, if I replace yield() with sleep(1), it works just as I need it to (in spite of waiting senselessly).
Runnable run1 = new Runnable() {
    public void run() {
        while (true) {
            System.out.println("ONE");
            Thread.yield();
        }
    }
};
Runnable run2 = new Runnable() {
    public void run() {
        while (true) {
            System.out.println("TWO");
            Thread.yield();
        }
    }
};
Thread tr1 = new Thread(run1);
Thread tr2 = new Thread(run2);
tr1.start();
tr2.start();
Thread.yield()
This static method is essentially used to notify the system that the current thread is willing to "give up the CPU" for a while. The general idea is that the thread scheduler will select a different thread to run instead of the current one.
However, the details of how yielding is implemented by the thread scheduler differ from platform to platform. In general, you shouldn't rely on it behaving in a particular way. Things that differ include:
- when, after yielding, the thread will get an opportunity to run again;
- whether or not the thread forgoes its remaining quantum.
The takeaway is that this behavior is pretty much optional and not guaranteed to do anything deterministically.
What you are trying to do is serialize the output of two threads in your example, and synchronize the output in your stated problem (which is a different problem). That will require some sort of lock or mutex to block the second thread until the first thread is done, which kind of defeats the point of concurrency, usually the reason threads are used in the first place.
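If strict alternation really is the goal, two java.util.concurrent.Semaphore objects can serve as the lock mentioned above. In this sketch (PingPong, alternate and speak are hypothetical names), each thread blocks until the other hands over its turn. The output therefore no longer depends on what yield() happens to do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

class PingPong {
    // Runs the ONE/TWO threads for the given number of rounds and returns the words in print order.
    static List<String> alternate(int rounds) throws InterruptedException {
        Semaphore oneTurn = new Semaphore(1); // ONE starts with the turn
        Semaphore twoTurn = new Semaphore(0);
        List<String> out = Collections.synchronizedList(new ArrayList<>());

        Thread t1 = new Thread(() -> speak("ONE", rounds, oneTurn, twoTurn, out));
        Thread t2 = new Thread(() -> speak("TWO", rounds, twoTurn, oneTurn, out));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return out;
    }

    private static void speak(String word, int rounds, Semaphore mine, Semaphore theirs, List<String> out) {
        try {
            for (int i = 0; i < rounds; i++) {
                mine.acquire();   // block (no spinning) until it is this thread's turn
                System.out.println(word);
                out.add(word);
                theirs.release(); // hand the turn to the other thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

As the answer notes, forcing this lockstep removes any benefit of running the two loops concurrently.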
Solution
What you really want is a shared piece of data for a flag status that the second thread can react to the first thread changing. Preferably and event driven message passing pattern would be even easier to implement in a concurrently safe manner.
The second thread would be spawned by the first thread, and a method would be called on it to increment the counter for whichever block it is in. You would just use pure message passing and pass in a state-flag Enum or some other notification of a state change.
What you don't want to do is any kind of polling. Make it event-driven: keep the second thread running and have it act on the state of its instance variable whenever the parent thread sets it.
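One minimal way to get the "shared flag, no polling" behavior in plain Java is a boolean turn flag guarded by a monitor, with wait()/notifyAll() so that the thread whose turn it isn't blocks instead of spinning. This is a sketch, not the answerer's code: the class PingPong, the method take, and the fixed round count (added so the program terminates) are hypothetical. As the answer notes, strict alternation serializes the two threads, so there is no concurrency gain here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: two threads take turns via a shared flag guarded by 'lock'.
public class PingPong {
    private final Object lock = new Object();
    private boolean oneTurn = true; // true: "ONE" may go next; guarded by lock

    // Appends 'word' to 'out' 'rounds' times, waiting for its turn each time.
    private void take(boolean isOne, String word, int rounds, List<String> out)
            throws InterruptedException {
        for (int i = 0; i < rounds; i++) {
            synchronized (lock) {
                while (oneTurn != isOne) {
                    lock.wait(); // blocks until notified; loop guards against spurious wakeups
                }
                out.add(word);     // only modified while holding lock
                oneTurn = !isOne;  // hand the turn to the other thread
                lock.notifyAll();
            }
        }
    }

    public static List<String> run(int rounds) throws InterruptedException {
        PingPong p = new PingPong();
        List<String> out = new ArrayList<>();
        Thread t1 = new Thread(() -> {
            try { p.take(true, "ONE", rounds, out); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread t2 = new Thread(() -> {
            try { p.take(false, "TWO", rounds, out); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t1.start();
        t2.start();
        t1.join(); // join() also makes the list contents visible to this thread
        t2.join();
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String s : run(3)) {
            System.out.println(s); // ONE, TWO, ONE, TWO, ONE, TWO
        }
    }
}
```

Printing directly inside the synchronized block instead of collecting into a list would give the same alternating console output.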
I think this is more about the difference between ~1000 println calls per second (when you use sleep(1)) and many, many more without the sleep. I think the thread is actually yielding, but if it is running on a multiprocessor box the yield may effectively be a no-op.
So what you are seeing is purely a race between two high-volume streams of output to System.out. If you ran this for a minute with the output going to a file, I think you'd see similar numbers of "ONE" and "TWO" messages. You would see this behavior even if you removed the yield().
I just ran a quick trial with your code, sending the output to /tmp/x. The program with yield() ran for 5 seconds and generated 1.9 MB (483k lines); sort | uniq -c on the output gave:
243152 ONE
240409 TWO
That means each thread is generating upwards of 40,000 lines per second. Then I removed the yield() statements and got roughly the same results: different line counts, as you'd expect with the race, but the same order of magnitude.

When does a thread go out of scope?

I've written a program that counts lines, words, and characters in a text, and it does this with threads. It works great sometimes, but not other times: the variables holding the number of words and characters counted sometimes come up short and sometimes don't.
It seems to me that the threads are sometimes ending before they can count all the words or characters that they want to. Is it because these threads go out of scope when the while (true) loop breaks?
I've included the code from the threaded part of my program below:
private void countText() {
    try {
        reader = new BufferedReader(new FileReader("this.txt"));
        while (true) {
            final String line = reader.readLine();
            if (line == null) { break; }
            lines++;
            new Thread(new Runnable() { public void run() { chars += characterCounter(line); } }).start();
            new Thread(new Runnable() { public void run() { words += wordCounter(line); } }).start();
            println(line);
        }
    } catch (IOException ex) { return; }
}
(Sub-question: This is the first time I've asked about something and posted code. I don't want to use Stack Overflow in place of Google and Wikipedia, and I'm worried that this isn't an appropriate question. I tried to make the question more general so that I'm not just asking for help with my code... but is there another website where this kind of question might be more appropriate?)
A different threaded design would make it easier to find and fix this kind of problem, and would be more efficient into the bargain. This is a longish response, but the summary is: "if you're doing threads in Java, check out java.util.concurrent as soon as humanly possible".
I guess you're multithreading this code to learn threads rather than to speed up counting words, but that's a very inefficient way to use threads. You're creating two threads per line: two thousand threads for a thousand-line file. Creating a thread (in modern JVMs) uses operating system resources and is generally fairly expensive. When two threads, let alone two thousand, have to access a shared resource (such as your chars and words counters), the resulting memory contention also hurts performance.
Making the counter variables synchronized as Chris Kimpton suggests or Atomic as WMR suggests will probably fix the code, but it will also make the effect of contention much worse. I'm pretty sure it will go slower than a single-threaded algorithm.
I suggest having just one long-lived thread which looks after chars, and one for words, each with a work queue to which you submit jobs each time you want to add a new number. This way only one thread is writing to each variable, and if you make changes to the design it'll be more obvious who's responsible for what. It'll also be faster because there's no memory contention and you're not creating hundreds of threads in a tight loop.
It's also important, once you've read all the lines in the file, to wait for all the threads to finish before you actually print out the values of the counters, otherwise you lose the updates from threads that haven't finished yet. With your current design you'd have to build up a big list of threads you created, and run through it at the end checking that they're all dead. With a queue-and-worker-thread design you can just tell each thread to drain its queue and then wait until it's done.
Java (from 1.5 and up) makes this kind of design very easy to implement: check out java.util.concurrent.Executors.newSingleThreadExecutor. It also makes it easy to add more concurrency later on (assuming proper locking, etc.), as you can just switch to a thread pool rather than a single thread.
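A minimal sketch of the "one long-lived thread per counter, fed by a work queue" design, built on Executors.newSingleThreadExecutor as suggested above. Assumptions: the names QueueCounter, ConfinedCounter, and finish are hypothetical; characterCounter and wordCounter are simple stand-ins for the asker's methods; and lines come from a List rather than a file so the example is self-contained. Each counter field is only ever touched by its own worker thread, and the final value is read through Future.get(), which makes it safely visible to the caller.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

public class QueueCounter {
    // Hypothetical stand-ins for the asker's characterCounter/wordCounter.
    static int characterCounter(String line) { return line.length(); }

    static int wordCounter(String line) {
        String t = line.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    // One long-lived worker thread owns 'total'; other threads only submit jobs to its queue.
    static final class ConfinedCounter {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private final ToIntFunction<String> counter;
        private long total; // read and written only on the worker thread

        ConfinedCounter(ToIntFunction<String> counter) { this.counter = counter; }

        void submit(String line) {
            worker.execute(() -> total += counter.applyAsInt(line));
        }

        // Queued behind every earlier job, so it sees all their updates;
        // Future.get() then publishes the value safely to the calling thread.
        long finish() throws InterruptedException, ExecutionException {
            Future<Long> result = worker.submit(() -> total);
            worker.shutdown(); // already-queued tasks still run before the thread exits
            return result.get();
        }
    }

    static long[] count(List<String> lines) throws InterruptedException, ExecutionException {
        ConfinedCounter chars = new ConfinedCounter(QueueCounter::characterCounter);
        ConfinedCounter words = new ConfinedCounter(QueueCounter::wordCounter);
        for (String line : lines) {
            chars.submit(line);
            words.submit(line);
        }
        return new long[] { chars.finish(), words.finish() };
    }

    public static void main(String[] args) throws Exception {
        long[] r = count(Arrays.asList("hello world", "foo"));
        System.out.println("chars=" + r[0] + " words=" + r[1]); // chars=14 words=3
    }
}
```

Draining the queue and waiting for the result is what the answer means by telling each thread to finish before printing the counters; swapping newSingleThreadExecutor for a pool would need the shared total to be protected again.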
As Chris Kimpton already pointed out correctly, you have a problem with the updating of chars and words in different threads. Synchronizing on this won't work either, because inside each anonymous Runnable, this refers to that Runnable instance, which means different threads will synchronize on different objects. You could use an extra "lock object" to synchronize on, but the easiest way to fix this would probably be to use AtomicIntegers for the two counters:
AtomicInteger chars = new AtomicInteger();
...
new Thread(new Runnable() {public void run() { chars.addAndGet(characterCounter(line));}}).start();
...
While this will probably fix your problem, Sam Stoke's more detailed answer is completely right: the original design is very inefficient.
To answer your question about when a thread "goes out of scope": you are starting two new threads for every line in your file, and all of them will run until they reach the end of their run() method. That is, unless you make them daemon threads, in which case they'll exit as soon as daemon threads are the only ones still running in the JVM.
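A sketch that keeps the question's two-threads-per-line structure but applies the two fixes discussed: AtomicInteger counters, and keeping a list of the started threads so they can all be join()ed before the totals are read. Assumptions: JoinCounter and count are hypothetical names, the counting methods are simple stand-ins, and the input is a List instead of a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class JoinCounter {
    // Hypothetical stand-ins for the asker's counting methods.
    static int characterCounter(String line) { return line.length(); }

    static int wordCounter(String line) {
        String t = line.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    static int[] count(List<String> lines) throws InterruptedException {
        AtomicInteger chars = new AtomicInteger();
        AtomicInteger words = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (String line : lines) {
            // Still two threads per line, as in the question; only the updates are made atomic.
            Thread c = new Thread(() -> chars.addAndGet(characterCounter(line)));
            Thread w = new Thread(() -> words.addAndGet(wordCounter(line)));
            threads.add(c);
            threads.add(w);
            c.start();
            w.start();
        }
        // Each thread stays alive until its run() returns; join() waits for exactly that.
        for (Thread t : threads) {
            t.join();
        }
        return new int[] { chars.get(), words.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = count(Arrays.asList("the quick brown fox", "jumps over"));
        System.out.println("chars=" + r[0] + " words=" + r[1]); // chars=29 words=6
    }
}
```

Without the join loop, the totals could be read while some threads are still running, which is one way the counts come up short.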
Sounds like a good question to me... I think the problem might be related to the atomicity of chars += and words +=: several threads could be executing those at the same time. Do you do anything to ensure that there is no interleaving?
That is:
Thread 1, has chars = 10, wants to add 5
Thread 2, has chars = 10, wants to add 3
Thread 1 works out new total, 15
Thread 2 works out new total, 13
Thread 1 sets chars to 15
Thread 2 sets chars to 13.
This is possible unless you use synchronized when updating those variables.
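To rule out the interleaving above with synchronized, every thread has to lock the same object; a dedicated lock object, as mentioned in another answer, does that. A minimal sketch under these assumptions: LockedCounter and run are hypothetical names, and each thread adds 1 repeatedly instead of counting real text. Because the read-add-write of += happens while holding one shared monitor, no update is lost, so the total is exact.

```java
import java.util.ArrayList;
import java.util.List;

public class LockedCounter {
    private final Object lock = new Object(); // one monitor shared by all threads
    private int chars;                        // guarded by lock

    void add(int n) {
        synchronized (lock) {
            chars += n; // read, add, and write happen without another thread interleaving
        }
    }

    int get() {
        synchronized (lock) {
            return chars;
        }
    }

    // Starts 'threads' threads that each call add(1) 'perThread' times, then waits for them.
    static int run(int threads, int perThread) throws InterruptedException {
        LockedCounter counter = new LockedCounter();
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            started.add(t);
            t.start();
        }
        for (Thread t : started) {
            t.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

Removing the synchronized block from add() turns this back into the race in the step list above, and the printed total can then come out below 400000.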
