When does a thread go out of scope? - Java

I've written a program that counts lines, words, and characters in a text: it does this with threads. It works great sometimes, but not so great other times. What ends up happening is the variables pointing to the number of words and characters counted sometimes come up short and sometimes don't.
It seems to me that the threads are sometimes ending before they can count all the words or characters that they want to. Is it because these threads go out of scope when the while (true) loop breaks?
I've included the code from the thready part of my problem below:
private void countText() {
try {
reader = new BufferedReader(new FileReader("this.txt"));
while (true) {
final String line = reader.readLine();
if(line == null) {break;}
lines++;
new Thread(new Runnable() {public void run() {chars += characterCounter(line);}}).start();
new Thread(new Runnable() {public void run() {words += wordCounter(line);}}).start();
println(line);
}
} catch(IOException ex) {return;}
}
(Sub Question: This is the first time I've asked about something and posted code. I don't want to use StackOverflow in place of Google and Wikipedia, and am worried that this isn't an appropriate question. I tried to make the question more general so that I'm not just asking for help with my code... but is there another website where this kind of question might be more appropriate?)

A different threaded design would make it easier to find and fix this kind of problem, and be more efficient into the bargain. This is a longish response, but the summary is "if you're doing threads in Java, check out java.util.concurrent as soon as humanly possible".
I guess you're multithreading this code to learn threads rather than to speed up counting words, but that's a very inefficient way to use threads. You're creating two threads per line: two thousand threads for a thousand-line file. Creating a thread (in modern JVMs) uses operating system resources and is generally fairly expensive. When two (let alone two thousand) threads have to access a shared resource such as your chars and words counters, the resulting memory contention also hurts performance.
Synchronizing access to the counter variables, as Chris Kimpton suggests, or making them Atomic, as WMR suggests, will probably fix the code, but it will also make the effect of contention much worse. I'm pretty sure it will go slower than a single-threaded algorithm.
I suggest having just one long-lived thread which looks after chars, and one for words, each with a work queue to which you submit jobs each time you want to add a new number. This way only one thread is writing to each variable, and if you make changes to the design it'll be more obvious who's responsible for what. It'll also be faster because there's no memory contention and you're not creating hundreds of threads in a tight loop.
It's also important, once you've read all the lines in the file, to wait for all the threads to finish before you actually print out the values of the counters, otherwise you lose the updates from threads that haven't finished yet. With your current design you'd have to build up a big list of threads you created, and run through it at the end checking that they're all dead. With a queue-and-worker-thread design you can just tell each thread to drain its queue and then wait until it's done.
Java (from 1.5 and up) makes this kind of design very easy to implement: check out java.util.concurrent.Executors.newSingleThreadExecutor. It also makes it easy to add more concurrency later on (assuming proper locking etc), as you can just switch to a thread pool rather than a single thread.
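A minimal sketch of that queue-and-worker design, assuming the lines are already in memory. The class name QueueCounter and the simple characterCounter/wordCounter bodies (line length, whitespace-separated tokens) are hypothetical stand-ins for the question's methods. Each counter is owned by one Executors.newSingleThreadExecutor() worker, and a final Callable queued behind all the updates reads the total; Future.get() waits for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class QueueCounter {
    // Each total is only ever touched by its own single worker thread.
    private long chars;
    private long words;

    // Hypothetical stand-ins for the question's characterCounter and wordCounter.
    static int characterCounter(String line) {
        return line.length();
    }

    static int wordCounter(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns {lines, chars, words}. */
    static long[] count(List<String> lines) {
        QueueCounter totals = new QueueCounter();
        ExecutorService charWorker = Executors.newSingleThreadExecutor();
        ExecutorService wordWorker = Executors.newSingleThreadExecutor();
        try {
            long lineCount = 0;
            for (String line : lines) {
                lineCount++;
                // submit() only enqueues a job; one long-lived thread runs every job in order.
                charWorker.submit(() -> { totals.chars += characterCounter(line); });
                wordWorker.submit(() -> { totals.words += wordCounter(line); });
            }
            // Queued last, so each of these runs after every update above on the same worker.
            Future<Long> charTotal = charWorker.submit(() -> totals.chars);
            Future<Long> wordTotal = wordWorker.submit(() -> totals.words);
            // Future.get() waits for the job and makes the worker's writes visible here.
            return new long[] {lineCount, charTotal.get(), wordTotal.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the counts", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            charWorker.shutdown();
            wordWorker.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] r = count(Arrays.asList("hello world", "a b c"));
        System.out.println(r[0] + " lines, " + r[1] + " chars, " + r[2] + " words");
    }
}
```

Because each single-thread executor runs its jobs in submission order, the final Callable acts as the "drain the queue, then wait" step; moving to a thread pool later would need proper locking, as noted above.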

As Chris Kimpton already correctly pointed out, you have a problem with updating chars and words from different threads. Synchronizing on this won't work either: inside run(), this refers to the anonymous Runnable, and every thread gets its own Runnable instance, so different threads would synchronize on different objects. You could use an extra "lock object" to synchronize on, but the easiest way to fix this would probably be to use AtomicIntegers for the 2 counters:
AtomicInteger chars = new AtomicInteger();
...
new Thread(new Runnable() {public void run() { chars.addAndGet(characterCounter(line));}}).start();
...
While this will probably fix your problem, Sam Stoke's more detailed answer is completely right: the original design is very inefficient.
To answer your question about when a thread "goes out of scope": you are starting two new threads for every line in your file, and all of them will run until they reach the end of their run() method. The exception is if you make them daemon threads; in that case they'll exit as soon as daemon threads are the only ones still running in this JVM.
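For illustration only, a sketch that keeps the question's one-thread-per-line structure but applies the AtomicInteger fix and joins every thread before reading the total. AtomicCount and its characterCounter body (line length) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicCount {
    // Hypothetical stand-in for the question's characterCounter.
    static int characterCounter(String line) {
        return line.length();
    }

    static int countChars(List<String> lines) {
        AtomicInteger chars = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (String line : lines) {
            // addAndGet is one atomic read-modify-write, so concurrent updates are not lost.
            Thread t = new Thread(() -> chars.addAndGet(characterCounter(line)));
            threads.add(t);
            t.start();
        }
        // Wait until every thread has finished before reading the total.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return chars.get();
    }

    public static void main(String[] args) {
        System.out.println(countChars(Arrays.asList("one", "two", "three")));
    }
}
```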

Sounds like a good question to me... I think the problem might be related to the lack of atomicity of chars += and words +=: several threads could be executing them at the same time. Do you do anything to ensure that there is no interleaving?
That is:
Thread 1, has chars = 10, wants to add 5
Thread 2, has chars = 10, wants to add 3
Thread 1 works out new total, 15
Thread 2 works out new total, 13
Thread 1 sets chars to 15
Thread 2 sets chars to 13.
This is possible unless you use synchronized when updating those variables.
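One way to rule out that interleaving, sketched here with hypothetical names (LockedCounter, add, run), is to do the read, the addition and the write under a single lock object shared by all threads:

```java
final class LockedCounter {
    // One lock object shared by every thread that touches chars.
    private final Object lock = new Object();
    private int chars;

    void add(int n) {
        synchronized (lock) {
            // Read, add and write happen as one unit, so no other thread can slip in between.
            chars += n;
        }
    }

    int get() {
        synchronized (lock) {
            return chars;
        }
    }

    /** Starts `threads` threads that each add 1, `times` times; returns the final count. */
    static int run(int threads, int times) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < times; j++) {
                    counter.add(1);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```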

Related

Java: two threads executing until the boolean flag is false: the second thread's first run stops the first thread

I have this threaded program that has three threads: main, lessonThread, questionThread.
It works like this:
"Lesson continues" keeps getting printed while the finished variable is false;
every 5 seconds the questionThread asks "Finish a lesson? y/n", and if the answer is y, it sets finished to true.
The problem is that "Lesson continues" never gets printed after the question gets asked the first time.
Also, as seen in the picture, sometimes lessonThread sneaks in with its "Lesson continues" before the user can enter the answer to the questionThread's question.
import java.time.Instant;
import java.util.Scanner;

public class Lesson {
    private boolean finished;
    private boolean waitingForAnswer;
    private Scanner scanner = new Scanner(System.in);
    private Thread lessonThread;
    private Thread questionThread;

    public static void main(String[] args) {
        Lesson lesson = new Lesson();
        lesson.lessonThread = lesson.new LessonThread();
        lesson.questionThread = lesson.new QuestionThread();
        lesson.lessonThread.start();
        lesson.questionThread.start();
    }

    class LessonThread extends Thread {
        @Override
        public void run() {
            while (!finished && !waitingForAnswer) {
                System.out.println("Lesson continues");
            }
        }
    }

    class QuestionThread extends Thread {
        private Instant sleepStart = Instant.now();
        private boolean isAsleep = true;

        @Override
        public void run() {
            while (!finished) {
                if (isAsleep && Instant.now().getEpochSecond() - sleepStart.getEpochSecond() >= 5) {
                    System.out.print("Finish a lesson? y/n");
                    waitingForAnswer = true;
                    String reply = scanner.nextLine().substring(0, 1);
                    switch (reply.toLowerCase()) {
                        case "y":
                            finished = true;
                    }
                    waitingForAnswer = false;
                    isAsleep = true;
                    sleepStart = Instant.now();
                }
            }
        }
    }
}
I think the waitingForAnswer = true might be at fault here, but then, the lessonThread has 5 seconds until the questionThread asks the question again, during which the waitingForAnswer is false.
Any help is greatly appreciated.
EDIT: I found a bug in the loop in the lessonThread and changed it to:
@Override
public void run() {
    while (!finished) {
        if (!waitingForAnswer) {
            System.out.println("Lesson continues");
        }
    }
}
However, I get the same result.
EDIT: I can get it working when running inside a debugger.
This just isn't how you're supposed to work with threads. You have 2 major problems here:
The Java memory model.
Imagine that one thread writes to some variable, and a fraction of a second later, another thread reads it. If that were guaranteed to work the way you want it to, the write would have to propagate all the way to every place that could ever see it before code could continue. And because you have absolutely no idea which fields are read by some thread until a thread actually reads them (Java is not in the business of attempting to look ahead and predict what the code will be doing later), every single write to any variable would need a full propagating sync across all threads that can see it... which is all of them! Modern CPUs have multiple cores and each core has its own cache, and if we applied that rule (all changes must be visible immediately everywhere), you might as well take all that cache and chuck it in the garbage, because you wouldn't be able to use it.
If it worked like that, Java would be slower than molasses.
So Java does not work like that. Any thread is free to make a copy of any field or not, at its discretion. If thread A writes 'true' to some instance's variable, and thread B reads that boolean from the exact same instance many seconds later, Java is entirely free to act as if the value is 'false'... even if, when code in thread A looks at it, it sees 'true'. The values may sync up at some arbitrary later point, but no guarantees are available to you: it may take a long time, or it may never happen at all.
So how do you work with threads in java?
The JMM (Java Memory Model) works by describing so-called comes-before/comes-after relationships (the Java Language Specification calls this happens-before): only if code is written to clearly indicate that you intend some event in thread A to come before some other event in thread B will Java guarantee that any effects performed in thread A, and visible there, are also visible in thread B once B's event (the one that 'came after') has finished.
For example, if thread A does:
synchronized (someRef) {
someRef.intVal1 = 1;
someRef.intVal2 = 2;
}
and thread B does:
synchronized(someRef) {
System.out.println(someRef.intVal1 + someRef.intVal2);
}
then you are guaranteed to witness in B either 0 (the case where B 'won' the fight and got to the synchronized statement first) or 3, which is always printed if B got there last. That synchronized block establishes a CBCA relationship: the 'winning' thread's closing } 'comes before' the losing thread's opening one, as far as execution is concerned, so when A wins, any writes done by thread A will be visible to thread B by the time B enters its sync block.
Your code does not establish any such relationships, therefore, you have no guarantees.
You establish them with writes/reads from volatile fields, with synchronized(), and with any code that itself uses these, which is a lot of code: Most classes in the java.util.concurrent package, starting threads, and many other things do some sync/volatile access internally.
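A minimal sketch of the volatile route, loosely modelled on the question's finished flag (the class StopFlag is hypothetical): the volatile write in stop() and the volatile read in the loop condition give the comes-before relationship described above. The Thread.sleep(1) only stands in for a unit of work so the loop does not spin at full speed.

```java
final class StopFlag {
    // Without volatile, the worker may keep using a stale copy of this field indefinitely.
    private volatile boolean finished;

    void stop() {
        finished = true; // volatile write: visible to every later volatile read of finished
    }

    Thread startWorker() {
        Thread worker = new Thread(() -> {
            while (!finished) { // volatile read on every iteration
                try {
                    Thread.sleep(1); // hypothetical stand-in for one unit of lesson work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.start();
        return worker;
    }

    /** Starts a worker, stops it through the flag, and reports whether it actually exited. */
    static boolean demo() {
        StopFlag flag = new StopFlag();
        Thread worker = flag.startWorker();
        try {
            Thread.sleep(50);
            flag.stop();
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```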
The flying laptop issue.
It's not the 1980s anymore. Your CPU is capable of doing enough calculations at any given moment to draw enough power to heat a small house comfortably. The reason your laptop or desktop or phone isn't a burning ball of lava is that the CPU is almost always doing entirely nothing whatsoever, and thus not drawing any current and heating up. In fact, once a CPU gets going, it will very quickly overheat itself and throttle down and run slower. That's because 95%+ of common PC workloads involve a 'burst' of calculations, which the CPU can do in a fraction of a second at full turboboosted power, and then it can go back to idling again whilst the fans, the cooling paste and the heat fins dissipate the heat that this burst of power caused. That's why, if you try to do something that keeps the CPU engaged for a long time, such as encoding video, it seems to go a little faster at first before it slows down to a stable level, whilst your battery is almost visibly draining and your fans sound like the laptop is about to take off for higher orbit and follow Doug and Bob to the ISS. That stable level is 'as fast as the fans and heat sinks can draw the heat away from the CPU so that it doesn't explode', which is not as fast as when it was still colder, but still pretty fast, especially if you have powerful fans.
The upshot of all this?
You must idle that CPU.
Something like:
while (true) {}
is a so-called 'busy loop': it does nothing, looping forever, whilst keeping the CPU occupied, burning a hole into the laptop and causing the fans to go ape. This is not a good thing. If you want execution to wait for some event before continuing, then wait for it. Keyword: wait. If you just want to wait for 5 seconds, Thread.sleep(5000) is what you want, not a busy loop. If you want to wait until some other thread has performed a job, use the core wait/notifyAll system (these are methods on java.lang.Object and interact with the synchronized keyword), or better yet, use a latch or a lock object from java.util.concurrent; those classes are fantastic. If you just want to ensure that 2 threads don't conflict while they touch the same data, use synchronized. All these features let the CPU idle down. Endlessly spinning away in a while loop, checking an if clause, is a bad idea.
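A sketch of waiting instead of spinning, using CountDownLatch from java.util.concurrent. LatchDemo, waitForAnswer and the 100 ms pause (standing in for the user typing an answer) are hypothetical; for the question's fixed 5-second delay, Thread.sleep(5000) in the question thread would play the same role.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class LatchDemo {
    /** Returns what the waiting "lesson" thread saw once the answer arrived. */
    static String waitForAnswer() {
        CountDownLatch answered = new CountDownLatch(1);
        AtomicReference<String> reply = new AtomicReference<>();
        AtomicReference<String> seen = new AtomicReference<>();

        Thread lesson = new Thread(() -> {
            try {
                // Blocks without using CPU, instead of spinning in while (!answered) {}.
                answered.await();
                seen.set("lesson saw: " + reply.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        lesson.start();

        try {
            Thread.sleep(100); // hypothetical stand-in for the user taking time to answer
            reply.set("y");
            answered.countDown(); // wakes the waiting thread
            lesson.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(waitForAnswer());
    }
}
```

countDown() also establishes the comes-before relationship discussed above: whatever the answering thread wrote before calling it is visible to the thread returning from await().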
And you get CBCA relationships to boot, which is what is required for any 2 threads to communicate with each other.
And because you're overloading the CPU with work, the sync points where your '= false' writes get synced back over to the other thread probably aren't happening. Normally it's relatively hard to observe JMM issues, which is what makes multithreaded programming so tricky: it is complex, you will mess up, it's hard to test for errors, and it's plausible you'll never personally run into this problem today. But tomorrow, with another song playing in Winamp, or on another system, it happens all the time. A busy loop like yours is a fine way to observe it a lot.
I managed to make it work by making waitingForAnswer volatile:
private volatile boolean waitingForAnswer;

Java threads without affecting performance

Long story short: I've written a program that contains an infinite loop, in which a function is run continuously and must run as quickly as possible.
However, whilst this function completes in a microsecond time scale, I need to spawn another thread that will take considerably longer to run, but it must not affect the previous thread.
Hopefully this example will help explain things:
while (updateGUI) { // So, forever until terminated
    final String tableContents = parser.readTable(location, header);
    if (tableContents.length() == 0) {
        // No table there, nothing to do
    } else {
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
                // updateTable updates a JTable
                updateTable(tableContents, TableModel);
                TableColumnModel tcm = guiTable.getColumnModel();
            }
        });
    }
    // *** New thread is needed here!
}
So what I need is for the readTable function to run an infinite number of times. I then need to start a second thread that will also run an infinite number of times, but each run will take milliseconds or even seconds to complete, as it has to perform some file I/O.
I've played around with extending the Thread class, and with using Executors.newCachedThreadPool to try spawning a new thread. However, anything I do causes the readTable function to slow down, and results in the table not being updated correctly, as it cannot read the data fast enough.
Chances are I need to redesign the way this loop runs, or possibly just start two new threads and put the infinite looping within them instead.
The reason for this design is that once the readTable function runs, it returns a string that is used to update a JTable, which (as far as I know) must be done on Swing's Event Dispatch Thread, as that is where the GUI's table was created.
If anyone has any suggestions I'd greatly appreciate them.
Thanks
As you are updating a JTable, SwingWorker will be convenient. In this case, one worker can coexist with another, as suggested here.
You have to be very careful to avoid overloading your machine. Your long-running task needs to be made independent of your thread, which must be fast. You also need to put a cap on how many of these are running at once. I would put a cap of one to start with.
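A sketch of capping the slow work at one task at a time, assuming the slow file I/O can be wrapped in a Runnable. SlowTaskGate and trySubmit are hypothetical names: the fast loop calls trySubmit, which never blocks and simply skips the round if the previous slow task has not finished yet.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

final class SlowTaskGate {
    // A single background thread for the slow work, separate from the fast loop.
    private final ExecutorService slowWorker = Executors.newSingleThreadExecutor();
    // true while a slow task is queued or running: this is the cap of one.
    private final AtomicBoolean busy = new AtomicBoolean(false);

    /** Called from the fast loop; never blocks. Returns false if a slow task is still in flight. */
    boolean trySubmit(Runnable slowTask) {
        if (!busy.compareAndSet(false, true)) {
            return false; // skip this round instead of piling up work
        }
        slowWorker.execute(() -> {
            try {
                slowTask.run();
            } finally {
                busy.set(false); // allow the next submission
            }
        });
        return true;
    }

    void shutdown() {
        slowWorker.shutdown();
    }
}
```

Skipping rounds rather than queueing them keeps the fast loop's cost constant, at the price of running the slow task less often when it cannot keep up.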
Also, your screen can only update so fast, and you can only see the screen updating so fast. I would limit the number of updates per second to 20 to start with.
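A sketch of the 20-updates-per-second cap, assuming only the fast loop calls it (the class is not thread-safe). UpdateThrottle is a hypothetical name; the loop would pass System.nanoTime() and only call SwingUtilities.invokeLater when shouldUpdate returns true.

```java
final class UpdateThrottle {
    private final long minIntervalNanos;
    private long lastUpdateNanos;
    private boolean hasUpdated;

    UpdateThrottle(int maxUpdatesPerSecond) {
        if (maxUpdatesPerSecond <= 0) {
            throw new IllegalArgumentException("maxUpdatesPerSecond must be positive");
        }
        this.minIntervalNanos = 1_000_000_000L / maxUpdatesPerSecond;
    }

    /** Returns true if enough time has passed since the last accepted update. */
    boolean shouldUpdate(long nowNanos) {
        if (hasUpdated && nowNanos - lastUpdateNanos < minIntervalNanos) {
            return false; // too soon: drop this refresh, a newer one will follow
        }
        hasUpdated = true;
        lastUpdateNanos = nowNanos;
        return true;
    }
}
```

Taking the time as a parameter, rather than calling System.nanoTime() inside, keeps the class easy to test with fixed values.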
BTW, setting the priority only helps if your machine is overloaded. Your goal should be to ensure it is not overloaded in the first place, and then the priority shouldn't matter.
It's very hard to guess what's going on here, but you said "results in the table not being updated correctly, as it cannot read the data fast enough". If you really mean the correctness of the code is affected by the timing not being fast enough, then your code is not thread safe and you need to use proper synchronization.
Correctness must not depend on timing, as timing of thread execution is not deterministic on standard JVMs.
Also, do not fiddle with thread priorities. Unless you are a concurrency guru trying to do something very unusual, you don't need to do this and it may make things confusing and/or break.
So if you want your "infinite" looping thread to have max priority, why are you setting the priority to MAX for the EDT instead of for your "most precious" one?
Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
//updateTable updates a JTable
updateTable(tableContents, TableModel);
TableColumnModel tcm = guiTable.getColumnModel();
In this piece of code, the current thread will be the EDT. Why not move that line to just before entering the while loop?

How to get threads with loops running concurrently to work with Thread.yield()?

I have the following situation. I have an application that runs mostly on one thread. It has grown large, so I would like to run a watchdog thread that gets called whenever the main thread changes into a different block of code / method / class, so I can see there is "movement" in the code. If the watchdog keeps getting called from the same area for more than a second or a few, it shall set a volatile boolean that the main thread reads at the next checkpoint, and the main thread will then terminate / restart.
Now the problem is getting both threads to run at roughly the same time. As soon as the main thread is running, it will not let the watchdog timer count properly. I was therefore thinking of yielding every time it calls the watchdog (so the watchdog could calculate the time passed and set the value), but to no avail. Using Thread.sleep(1) instead of Thread.yield() works. But I don't want to have several areas of code just wasting calculation time; I am sure I am not doing it the way it is meant to be used.
Here is a very simple example of how I would use Thread.yield(). I do not understand why the threads here will not switch (they do, after a "long" and largely unpredictable time). Please give me advice on how to make this simple example output ONE and TWO after each other. As written before, if I replace yield() with sleep(1), it works just like I need it to (in spite of waiting senselessly).
Runnable run1 = new Runnable(){
public void run(){
while(true){
System.out.println("ONE");
Thread.yield();
}
}
};
Runnable run2 = new Runnable(){
public void run(){
while(true){
System.out.println("TWO");
Thread.yield();
}
}
};
Thread tr1 = new Thread(run1);
Thread tr2 = new Thread(run2);
tr1.start();
tr2.start();
Thread.yield()
This static method is essentially used to notify the system that the current thread is willing to "give up the CPU" for a while. The general idea is that the thread scheduler will select a different thread to run instead of the current one.
However, the details of how yielding is implemented by the thread scheduler differ from platform to platform. In general, you shouldn't rely on it behaving in a particular way. Things that differ include:
when, after yielding, the thread will get an opportunity to run again;
whether or not the thread foregoes its remaining quantum.
The takeaway is that this behavior is pretty much optional and not guaranteed to actually do anything deterministically.
What you are trying to do is serialize the output of the two threads in your example, and synchronize the output in your stated problem (which is a different problem). That will require some sort of lock or mutex to block the second thread until the first thread is done, which kind of defeats the point of concurrency, usually the reason threads are used in the first place.
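If strict ONE/TWO alternation really is the goal, one way (swapped in here, not taken from this answer) is a pair of java.util.concurrent.Semaphore objects that pass a single "turn" back and forth. PingPong and takeTurns are hypothetical names. Note that this fully serializes the two threads, exactly as the answer warns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

final class PingPong {
    /** Two threads that strictly alternate, each printing its word `rounds` times. */
    static List<String> run(int rounds) {
        // Only the thread holding the turn touches `out`; the semaphore hand-off orders
        // and publishes the writes, and join() publishes them to the caller.
        List<String> out = new ArrayList<>();
        Semaphore oneTurn = new Semaphore(1); // ONE goes first
        Semaphore twoTurn = new Semaphore(0);
        Thread t1 = new Thread(() -> takeTurns(oneTurn, twoTurn, "ONE", rounds, out));
        Thread t2 = new Thread(() -> takeTurns(twoTurn, oneTurn, "TWO", rounds, out));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    private static void takeTurns(Semaphore mine, Semaphore theirs, String word,
                                  int rounds, List<String> out) {
        try {
            for (int i = 0; i < rounds; i++) {
                mine.acquire();   // block (not spin) until it is this thread's turn
                System.out.println(word);
                out.add(word);
                theirs.release(); // hand the turn to the other thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```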
Solution
What you really want is a shared piece of data holding a status flag, so that the second thread can react to the first thread changing it. Preferably, an event-driven message-passing pattern would be even easier to implement in a concurrently safe manner.
The second thread would be spawned by the first thread. Rather than calling a method on it to increment a counter for whichever block it is in, you would just use pure message passing and pass in a state flag Enum or some other notification of a state change.
What you don't want to do is do any kind of polling. Make it event driven and just have the second thread running always and checking the state of its instance variable that gets set by the parent thread.
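A sketch of the event-driven version, assuming checkpoint names are plain strings (an Enum would work the same way). Watchdog is a hypothetical class: the main thread calls checkpoint() on each state change, and the watchdog thread sleeps inside BlockingQueue.poll(timeout) rather than polling, setting a volatile stalled flag if no message arrives in time.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class Watchdog implements Runnable {
    private final BlockingQueue<String> checkpoints = new LinkedBlockingQueue<>();
    private final long timeoutMillis;
    // Read by the main thread at its next checkpoint.
    private volatile boolean stalled;
    private volatile String lastCheckpoint;

    Watchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by the main thread whenever it enters a new block of code; never blocks. */
    void checkpoint(String where) {
        checkpoints.offer(where);
    }

    boolean isStalled() {
        return stalled;
    }

    String lastCheckpoint() {
        return lastCheckpoint;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Sleeps inside poll() until a message arrives or the timeout expires.
                String where = checkpoints.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (where == null) {
                    stalled = true; // no movement for timeoutMillis
                    return;
                }
                lastCheckpoint = where;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

At each checkpoint the main thread would also check isStalled() and terminate or restart if it returns true.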
I do not understand why the Threads here will not switch (they do, after a "long" and largely unpredictable time). Please give me an advice on how to make this simple example output ONE and TWO after each other. Like written before, if I switch yield() with sleep(1), it will work just like I'd need it to (in spite of waiting senselessly).
I think this is more about the difference between ~1000 println calls in a second (when you use sleep(1)) and many, many more without the sleep. I think the Thread is actually yielding but it may be that it is on a multiple processor box so the yield is effectively a no-op.
So what you are seeing is purely the result of a high-volume race to System.out. If you ran this for a minute with the results going to a file, I think you'd see a similar number of "ONE" and "TWO" messages in the output. Even if you removed the yield(), you would see this behavior.
I just ran a quick trial with your code, sending the output to /tmp/x. The program with yield() ran for 5 seconds and generated 1.9 MB / 483k lines, with the output of sort | uniq -c being:
243152 ONE
240409 TWO
This means that each thread is generating upwards of 40,000 lines/second. Then I removed the yield() statements and I got just about the same results with different counts of lines like you'd expect with the race conditions -- but the same order of magnitude.

Java multithreading, getting threads to work in parallel

Suppose you need to deal with 2 threads, a Reader and a Processor.
Reader will read a portion of the stream data and will pass it to the Processor, that will do something.
The idea is to not stress the Reader with too much of data.
In the set up, i
// Processor will pick up data from pipeIn and will place the output in pipeOut
Thread p = new Thread(new Processor(pipeIn, pipeOut));
p.start();
// Reader will pick a bunch of bits from the InputStream and place it to pipeIn
Thread r = new Thread(new Reader(inputStream, pipeIn));
r.start();
Needless to say, neither pipe is null, when initialized.
I am thinking ... When Processor has been started it attempts to read from the pipeIn, in the following loop:
while (readingShouldContinue) {
Thread.sleep(1); // To avoid tight loop
byte[] justRead = readFrom.getDataCurrentlyInQueue();
writeDataToPipe(processData(justRead));
}
If there is no data to write, it will write nothing, should be no problem.
The Reader comes alive and picks up some data from a stream:
while ((in.read(buffer)) != -1) {
// Writes to what processor considers a pipeIn
writeTo.addDataToQueue(buffer);
}
In Pipe itself, I synchronize access to the data.
public byte[] getDataCurrentlyInQueue() {
synchronized (q) {
byte[] a = q.peek();
q.clear();
return a;
}
}
I expect the 2 threads to run semi in parallel, interchanging activities between Reader and a Processor. What happens however is that
Reader reads all blocks up front
Processor treats everything as 1 single block
What am I missing, please?
(First I should point out that you've left out some critical bits of the code and other information that is needed for a specific fact-based answer.)
I can think of a number of possible explanations:
There may simply be a bug in your application. There's not a lot of point guessing what that bug might be, but if you showed us more code ...
The OS thread scheduler will tend to let an active thread keep running until it blocks. If your processor has only one core (or if the OS only allows your application to use one core), then the second thread may starve ... long enough for the first one to finish.
Even if you have multiple cores, the OS thread scheduler may be slow to assign extra cores, especially if the 2nd thread starts and then immediately blocks.
It is possible that there is some "granularity" effect in the buffering that is causing work not to appear in the queue. (You could view this as a bug ... or as a tuning issue.)
It could simply be that you are not giving the application enough load for multi-threading to kick in.
Finally, I can't figure out the Thread.sleep stuff either. A properly written multi-threaded application does not use Thread.sleep for anything but long term delays; e.g. threads that do periodic house-keeping tasks in the background. If you use sleep instead of blocking, then 1) you risk making the application non-responsive, and 2) you may encourage the OS thread scheduler to give the thread fewer time slices. It could well be that this is the source of your trouble vis-a-vis thread starvation.
You reinvented parts of the Java concurrency library (java.util.concurrent). It would make things a lot easier if you modeled your threads with a BlockingQueue instead of synchronizing things yourself.
Basically, your producer would put chunks on the BlockingQueue and your consumer would loop over the queue with while (true) and call take(). That way the consumer would block/wait until there is a new chunk on the queue.
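A sketch of the Reader/Processor pair rebuilt on an ArrayBlockingQueue, assuming an InputStream source. ReaderProcessor, the capacity of 4 and the END sentinel ("poison pill") are hypothetical choices, and recording chunk sizes stands in for processData. The copy made with Arrays.copyOf matters if, as the question's loop suggests, the same buffer is handed over on every read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ReaderProcessor {
    // "Poison pill": a sentinel compared by identity that tells the Processor to stop.
    private static final byte[] END = new byte[0];

    /** Streams `in` through a bounded queue; returns the size of each chunk the Processor handled. */
    static List<Integer> run(InputStream in, int bufferSize) {
        // Bounded: put() blocks while the queue is full, so the Reader cannot run far ahead.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
        List<Integer> processed = new ArrayList<>(); // only touched by the Processor until join()

        Thread reader = new Thread(() -> {
            byte[] buffer = new byte[bufferSize];
            try {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    // Copy the n bytes actually read: the buffer is reused by the next read().
                    queue.put(Arrays.copyOf(buffer, n));
                }
            } catch (IOException e) {
                e.printStackTrace(); // sketch only: real code should report this to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            try {
                queue.put(END); // tell the Processor there is no more data
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread processor = new Thread(() -> {
            try {
                while (true) {
                    byte[] chunk = queue.take(); // sleeps until a chunk arrives: no polling
                    if (chunk == END) {
                        break;
                    }
                    processed.add(chunk.length); // hypothetical stand-in for processData(chunk)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        processor.start();
        reader.start();
        try {
            reader.join();
            processor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(new ByteArrayInputStream(new byte[10]), 4));
    }
}
```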
The reader is reading everything before its first time-slice. This means that the reading is finishing before the processor ever gets a chance to run.
Try increasing the amount of bytes that are being read, or slow down the reader somehow; maybe with a sleep() call every once in a while.
Btw. Don't poll. It is a horrendous waste of CPU cycles, and it doesn't scale at all.
Also use a synchronized queue and forget the manual locking. http://docs.oracle.com/javase/tutorial/collections/implementations/queue.html
When using multiple threads you need to determine whether you
have work which can be performed in parallel efficiently.
are not adding more overhead than the improvement you are likely to achieve
are sure the OS, or some library, is not already optimised to do what you are trying to do.
In your case, you have a good example of when not to use multi-threads. The OS is already tuned to read ahead and buffer data before you ask for it. The work the Reader does is relatively trivial. The overhead of creating new buffers, adding them to a queue and passing the data between threads is likely to be greater than the amount of work you are performing in parallel.
When you try to use multiple threads to do a task best done by a single thread, you will get strange profiling/tuning results.
+1 for a good question.

ArrayList and Multithreading in Java

Under what circumstances would an unsynchronized collection, say an ArrayList, cause a problem? I can't think of any. Can someone please give me an example where an ArrayList causes a problem and a Vector solves it? I wrote a program that has 2 threads both modifying an ArrayList that has one element. One thread puts "bbb" into the ArrayList while the other puts "aaa" into it. I don't really see an instance where the string is half modified. Am I on the right track here?
Also, I remember being told that multiple threads are not really running simultaneously: 1 thread runs for some time and another thread runs after that (on computers with a single CPU). If that is correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Many Thanks in advance.
There are three aspects of what might go wrong if you use an ArrayList (for example) without adequate synchronization.
The first scenario is that if two threads happen to update the ArrayList at the same time, then it may get corrupted. For instance, the logic of appending to a list goes something like this:
public void add(T element) {
if (!haveSpace(size + 1)) {
expand(size + 1);
}
elements[size] = element;
// HERE
size++;
}
Now suppose that we have one processor / core and two threads executing this code on the same list at the "same time". Suppose that the first thread gets to the point labeled HERE and is preempted. The second thread comes along, and overwrites the slot in elements that the first thread just updated with its own element, and then increments size. When the first thread finally gets control, it updates size. The end result is that we've added the second thread's element and not the first thread's element, and most likely also added a null to the list. (This is just illustrative. In reality, the native code compiler may have reordered the code, and so on. But the point is that bad things can happen if updates happen simultaneously.)
The second scenario arises due to the caching of main memory contents in the CPU's cache memory. Suppose that we have two threads, one adding elements to the list and the second one reading the list's size. When one thread adds an element, it will update the list's size attribute. However, since size is not volatile, the new value of size may not immediately be written out to main memory. Instead, it could sit in the cache until a synchronization point where the Java memory model requires that cached writes get flushed. In the meantime, the second thread could call size() on the list and get a stale value of size. In the worst case, the second thread (calling get(int), for example) might see inconsistent values of size and the elements array, resulting in unexpected exceptions. (Note that this kind of problem can happen even when there is only one core and no memory caching. The JIT compiler is free to use CPU registers to cache memory contents, and those registers don't get flushed / refreshed with respect to their memory locations when a thread context switch occurs.)
The third scenario arises when you synchronize operations on the ArrayList; e.g. by wrapping it as a SynchronizedList.
List list = Collections.synchronizedList(new ArrayList());
// Thread 1
List list2 = ...
for (Object element : list2) {
list.add(element);
}
// Thread 2
List list3 = ...
for (Object element : list) {
list3.add(element);
}
If the list wrapped by list is an ArrayList or LinkedList and the two threads run simultaneously, thread 2 will typically fail with a ConcurrentModificationException. If it is some other (home-brew) list, the results are unpredictable. The problem is that making list a synchronized list is NOT SUFFICIENT to make it thread-safe with respect to a sequence of list operations performed by different threads. To get that, the application would typically need to synchronize at a higher level / coarser grain.
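The Javadoc for Collections.synchronizedList says the caller must manually synchronize on the returned list while iterating over it. A sketch of that for the scenario above (SyncListCopy is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class SyncListCopy {
    /** Thread 1 appends 0..n-1 while thread 2 copies the list; true if the copy was clean. */
    static boolean copyWhileAdding(int n) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        List<Integer> copy = new ArrayList<>();
        AtomicBoolean failed = new AtomicBoolean(false);

        Thread adder = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                list.add(i); // each add() locks the wrapper itself
            }
        });
        Thread copier = new Thread(() -> {
            try {
                // Holding the wrapper's lock for the whole loop keeps add() out until we finish.
                synchronized (list) {
                    for (Integer element : list) {
                        copy.add(element);
                    }
                }
            } catch (ConcurrentModificationException e) {
                failed.set(true); // typically reached if the synchronized block is removed
            }
        });
        adder.start();
        copier.start();
        try {
            adder.join();
            copier.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The copy must be some prefix 0, 1, 2, ... of what the adder wrote.
        for (int i = 0; i < copy.size(); i++) {
            if (copy.get(i).intValue() != i) {
                return false;
            }
        }
        return !failed.get();
    }
}
```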
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU).
Correct. If there is only one core available to run the application, obviously only one thread gets to run at a time. This makes some of the hazards impossible and others much less likely to occur. However, it is possible for the OS to switch from one thread to another thread at any point in the code, and at any time.
If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yup. That's possible. The probability of it happening is very small1 but that just makes this kind of problem more insidious.
1 - This is because thread time-slicing events are extremely infrequent, when measured on the timescale of hardware clock cycles.
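A minimal sketch of thread 1 being stopped in the middle of an update, assuming a plain int counter instead of the list: count++ is a read, an add and a write, so two threads can read the same old value and one increment is lost. AtomicInteger from java.util.concurrent.atomic performs the same update atomically. The class and field names are hypothetical, and the number of lost updates varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LostUpdateDemo {
    private static int plainCount = 0;   // unsafe: ++ is read-modify-write
    private static final AtomicInteger atomicCount = new AtomicInteger();

    /** Two threads each increment both counters n times; returns {plain, atomic}. */
    static int[] runDemo(int n) {
        plainCount = 0;
        atomicCount.set(0);
        Runnable task = () -> {
            for (int i = 0; i < n; i++) {
                plainCount++;                   // can interleave with the other thread
                atomicCount.incrementAndGet();  // never loses an update
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { plainCount, atomicCount.get() };
    }
}
```

On a multi-core machine, runDemo(1_000_000) typically returns a first value below 2,000,000; the second value is always exactly 2,000,000.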
A practical example. At the end, the list should contain 40 items, but for me it usually shows between 30 and 35. Guess why?
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class so the example compiles as a single file.
public class ListRace {
    static class ListTester implements Runnable {
        private List<Integer> a;
        public ListTester(List<Integer> a) {
            this.a = a;
        }
        public void run() {
            try {
                for (int i = 0; i < 20; ++i) {
                    a.add(i);          // unsynchronized add on a shared ArrayList
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                // interrupted: just stop adding
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<Integer> a = new ArrayList<Integer>();
        Thread t1 = new Thread(new ListTester(a));
        Thread t2 = new Thread(new ListTester(a));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(a.size());   // should be 40, but may be less
        for (int i = 0; i < a.size(); ++i) {
            System.out.println(i + " " + a.get(i));
        }
    }
}
Edit: There are more comprehensive explanations around (for example, Stephen C's post), but I'll add a short comment since mfukar asked (I should have done it right away when posting the answer).
This is the famous problem of incrementing an integer from two different threads. There's a nice explanation in Sun's Java tutorial on concurrency. The only difference is that their example has --i and ++i, whereas here we have ++size twice. (++size is part of the ArrayList#add implementation.)
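One way to make the ListTester example reliably end with 40 items, sketched under the assumption that only the add calls need protecting: wrap the ArrayList with Collections.synchronizedList so that each add, including its ++size step, runs under a single lock. The class name is hypothetical, and the sleeps are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SafeListTester implements Runnable {
    private final List<Integer> a;

    SafeListTester(List<Integer> a) {
        this.a = a;
    }

    @Override
    public void run() {
        for (int i = 0; i < 20; ++i) {
            a.add(i);   // the wrapper locks each call, so no size increment is lost
        }
    }

    /** Same setup as the original main, but with a synchronized wrapper. */
    static int runDemo() {
        List<Integer> a = Collections.synchronizedList(new ArrayList<Integer>());
        Thread t1 = new Thread(new SafeListTester(a));
        Thread t2 = new Thread(new SafeListTester(a));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.size();   // both writers have finished, so this read is safe
    }
}
```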
I don't really see an instance where the string is half modified, I am on the right track here?
That won't happen. However, what could happen is that only one of the strings gets added, or that an exception occurs during the call to add.
can someone please give me an example where an ArrayList causes a problem and a Vector solves it?
If you want to access a collection from multiple threads, you need to synchronize this access. However, just using a Vector does not really solve the problem. You will not get the issues described above, but the following pattern will still not work:
// broken, even though vector is "thread-safe"
if (vector.isEmpty())
    vector.add(1);
The Vector itself will not get corrupted, but that does not mean that it cannot get into states that your business logic would not want to have.
You need to synchronize in your application code (and then there is no need to use Vector).
synchronized (list) {
    if (list.isEmpty())
        list.add(1);
}
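A minimal sketch of why the Vector version is broken and the synchronized version is not, assuming many threads race to add the first element: each individual Vector call is atomic, but another thread can run between isEmpty() and add(). The class name, the thread count and the CountDownLatch used to release all threads at once are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CountDownLatch;

class CheckThenActDemo {
    /** Starts `threads` threads that each add 1 only if the list is empty (e.g. a Vector). */
    static int race(List<Integer> list, boolean lockAround, int threads) {
        CountDownLatch startGate = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    startGate.await();          // wait so all threads start together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (lockAround) {
                    synchronized (list) {       // check and act as one atomic step
                        if (list.isEmpty()) list.add(1);
                    }
                } else if (list.isEmpty()) {    // broken: another thread may add here
                    list.add(1);
                }
            });
            workers.add(t);
            t.start();
        }
        startGate.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();   // 1 is correct; more than 1 means the race was lost
    }
}
```

Locking on a Vector works here because Vector's own methods synchronize on the Vector itself. With lockAround set to false, a run may still return 1 because the window is tiny, which is exactly why this bug is easy to miss.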
The concurrency utility package (java.util.concurrent) also has a number of collections that provide the atomic operations needed for thread-safe queues and such.
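For the specific "add if absent" pattern, a small sketch, assuming CopyOnWriteArrayList fits the workload (it copies its whole array on every write, so it suits read-mostly lists): addIfAbsent performs the check and the add as one atomic call, with no external synchronized block. ConcurrentHashMap.putIfAbsent offers the same guarantee for maps. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class AddIfAbsentDemo {
    /** Starts `threads` threads that all try to add the value 1; returns the final size. */
    static int race(int threads) {
        CopyOnWriteArrayList<Integer> list = new CopyOnWriteArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> list.addIfAbsent(1)); // check + add, atomically
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();   // always 1
    }
}
```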
The first part of your question has already been answered. I will try to answer the second part:
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for some time and another thread runs after that (on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
In the wait/notify framework, a thread that holds the lock on an object releases it while it waits on some condition. A great example is the producer-consumer problem.
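A minimal producer-consumer sketch with wait/notifyAll, assuming a fixed-capacity buffer: wait() releases the buffer's lock while the thread is blocked, which is what lets the other side enter put() or take(). The conditions are rechecked in while loops because a woken thread must confirm the condition still holds. The class name and capacity are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bounded buffer built on wait/notifyAll. */
class BoundedBuffer {
    private final Deque<Integer> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    synchronized void put(int value) throws InterruptedException {
        while (items.size() == capacity) {
            wait();          // releases this object's lock while waiting for space
        }
        items.addLast(value);
        notifyAll();         // wake consumers waiting for an item
    }

    synchronized int take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();          // releases the lock so a producer can call put()
        }
        int value = items.removeFirst();
        notifyAll();         // wake producers waiting for space
        return value;
    }

    /** Producer puts 1..n, consumer takes n items; returns the consumer's sum. */
    static long runDemo(int n) {
        BoundedBuffer buffer = new BoundedBuffer(4);
        long[] sum = new long[1];
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) buffer.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += buffer.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }
}
```

In real code, java.util.concurrent.ArrayBlockingQueue provides the same blocking put/take behaviour ready-made.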
When will it cause trouble?
Any time one thread is reading the ArrayList while another is writing to it, or when both are writing. Here's a well-known example.
Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for some time and another thread runs after that (on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?
Yes. Single-core CPUs can execute only one instruction at a time (not strictly true, since pipelining has been around for a while, but as a professor once said, that's "free" parallelism). Even so, each process running on your computer is only executed for a period of time before it goes into an idle state. At that moment, another process may start or continue its execution, and then go idle or finish. The execution of processes is interleaved.
With threads the same thing happens, except that they are contained inside a process. How they execute depends on the operating system, but the concept remains the same: they switch between active and idle constantly throughout their lifetime.
You cannot control when one thread will be stopped and another will start. Thread 1 will not wait until it has completely finished adding data, so it is always possible for data to be corrupted.