This is for learning purposes.
Imagine I want to calculate prime numbers and use a ThreadPoolExecutor to do so.
Below you can see my current implementation, which is kind of silly.
My structure:
I generate numbers in a certain range.
For each generated number, create a task to check whether the given number is a prime.
If it is a prime, the result of the operation is the number, else it is null.
A collector goes through the result list and checks whether each entry is a number or null. If it is a number, it writes that number to a file (here: one file per number of digits).
What I would like to do instead: If the number to be checked in the task is not a prime, delete my future from the list/cancel it. As far as I know, only the Executor itself can cancel a Future.
What I want is for the task itself to say "Hey, I know my result is of no use to you, so please ignore me while iterating through the list".
I do not know how to do so.
What I do right now (relevant part):
final List<Future<Long>> resultList = new ArrayList<>();
final BlockingQueue<Runnable> workingQueue = new ArrayBlockingQueue<>(CAPACITY);
final ExecutorService exec = new ThreadPoolExecutor(
Math.max(1, Runtime.getRuntime().availableProcessors() - 2), // core size must be >= 1, even on 1-2 core machines
Math.max(1, Runtime.getRuntime().availableProcessors() - 1),
5, TimeUnit.SECONDS,
workingQueue,
new ThreadPoolExecutor.CallerRunsPolicy()
);
for (long i = GENERATEFROM; i <= GENERATETO; i++) {
Future<Long> result = exec.submit(new Worker(i));
resultList.add(result);
}
Collector collector = new Collector(resultList,GENERATETO);
collector.start();
exec.shutdown();
A Worker executes one task (is this number prime?):
import java.util.concurrent.Callable;

public class Worker implements Callable<Long> {
    private final long number;

    public Worker(long number) {
        this.number = number;
    }

    // checks whether a long is prime or not
    boolean isPrime(long n) {
        // 0, 1 and negative numbers are not prime
        if (n < 2) return false;
        // 2 is the only even prime; every other multiple of 2 is not prime
        if (n % 2 == 0) return n == 2;
        // if not, then just check the odd divisors up to sqrt(n)
        for (long i = 3; i * i <= n; i += 2) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

    @Override
    public Long call() throws Exception {
        if (isPrime(number)) {
            return number;
        }
        return null; // null tells the Collector "not a prime"
    }
}
And, for the sake of completeness, my collector:
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class Collector {
    private final List<Future<Long>> primeNumbers;
    private final long maxNumberGenerated;
    private final Map<Integer, PrintWriter> digitMap;
    private final long maxWaitTime;
    private final TimeUnit timeUnit;

    public Collector(List<Future<Long>> primeNumbers, long maxNumberGenerated) {
        this.primeNumbers = primeNumbers;
        this.maxNumberGenerated = maxNumberGenerated;
        this.digitMap = new HashMap<>();
        this.maxWaitTime = 1000;
        this.timeUnit = TimeUnit.MILLISECONDS;
    }

    public void start() {
        try {
            // create one file per possible number of digits
            int filesToCreate = getDigits(maxNumberGenerated);
            for (int i = 1; i <= filesToCreate; i++) {
                File f = new File(System.getProperty("user.dir") + "/src/solutionWithExecutor/PrimeNumsWith_" + i +
                        "_Digits.txt");
                digitMap.put(i, new PrintWriter(f, "UTF-8"));
            }
            for (Future<Long> future : primeNumbers) {
                Long possibleNumber = future.get(); // blocks until this task is done; null means "not a prime"
                if (possibleNumber != null) {
                    PrintWriter correspondingFileWriter = digitMap.get(getDigits(possibleNumber));
                    correspondingFileWriter.println(possibleNumber);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            e.printStackTrace();
        } catch (IOException | ExecutionException e) {
            e.printStackTrace();
        } finally {
            // close (and thereby flush) every file, even if something above failed
            for (PrintWriter fw : digitMap.values()) {
                fw.close();
            }
        }
    }

    private int getDigits(long number) {
        return String.valueOf(number).length();
    }
}
What I would like to do instead: If the number to be checked in the task is not a prime, delete my future from the list/cancel it. As far as I know, only the Executor itself can cancel a Future.
To me this seems like an unnecessary optimization. The Future is there so that the task can return a value. Once the task figures out that the number is not a prime and returns null, the "cost" to the program associated with the Future is negligible. There is nothing to "cancel": that task has completed, and all that is left is the memory that allows the Future to pass back the null or the prime Long.
Since we are talking about learning: in many situations programmers worry too early about performance, and we often spend time optimizing parts of our application that really aren't the problem. If a JVM monitor (jconsole, for example) showed that the application was running out of memory, then I might worry about the list of Futures; otherwise I'd write clean and easily maintained code.
If you really are worried about the Futures, then don't save them in a list at all; just share a BlockingQueue<Long> between the prime-checking tasks and the main thread. The prime-checking tasks would add(...) to the queue and the main thread would take(). You should put a result into the queue for non-primes too, because otherwise you wouldn't know when the prime tasks were done unless you counted the results. Note that BlockingQueue implementations reject null elements (add(null) throws a NullPointerException), so use a sentinel value such as -1 instead of null. If you check X numbers, you'll know that you are done once X results (sentinels or primes) have been taken from the BlockingQueue<Long>.
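A minimal sketch of that BlockingQueue approach, assuming Java 8+ (for the lambda) and a hypothetical `QueuePrimes` class with a `primesBetween(from, to)` method; the `-1` sentinel is our own choice, not part of any API. Every task puts exactly one value into the queue, so the main thread knows it is done after `to - from + 1` calls to `take()`, and no Future is kept around for the non-primes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuePrimes {
    // BlockingQueue rejects null elements, so this sentinel means "not a prime"
    private static final long NOT_PRIME = -1L;

    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long i = 3; i * i <= n; i += 2) {
            if (n % i == 0) return false;
        }
        return true;
    }

    static List<Long> primesBetween(long from, long to) throws InterruptedException {
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();
        ExecutorService exec = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (long i = from; i <= to; i++) {
            final long n = i;
            // every task puts exactly one value: the prime itself or the sentinel
            exec.execute(() -> results.add(isPrime(n) ? n : NOT_PRIME));
        }
        exec.shutdown(); // already submitted tasks still run to completion

        List<Long> primes = new ArrayList<>();
        long expected = to - from + 1;
        for (long taken = 0; taken < expected; taken++) {
            long value = results.take(); // blocks until some task has produced a result
            if (value != NOT_PRIME) {
                primes.add(value);
            }
        }
        Collections.sort(primes); // tasks finish in arbitrary order
        return primes;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(primesBetween(1, 50));
    }
}
```

Running `main` prints the primes from 2 to 47. Because the results arrive in completion order, they are sorted before being returned.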
Hope this helps.
I am writing a sample program with wait() and notify(), but when notify() is called, more than one thread wakes up instead of one.
The code is:
public class MyQueue<T> {
Object[] entryArr;
private volatile int addIndex;
private volatile int pending = -1;
private final Object lock = new Object();
private volatile long notifiedThreadId;
private int capacity;
public MyQueue(int capacity) {
entryArr = new Object[capacity];
this.capacity = capacity;
}
public void add(T t) {
synchronized (lock) {
if (pending >= 0) {
try {
pending++;
lock.wait();
System.out.println(notifiedThreadId + ":" + Thread.currentThread().getId());
} catch (InterruptedException e) {
e.printStackTrace();
}
} else if (pending == -1) {
pending++;
}
}
if (addIndex == capacity) { // it's ok to replace existing value
addIndex = 0;
}
try {
entryArr[addIndex] = t;
} catch (ArrayIndexOutOfBoundsException e) {
System.out.println("ARRAYException:" + Thread.currentThread().getId() + ":" + pending + ":" + addIndex);
e.printStackTrace();
}
addIndex++;
synchronized (lock) {
if (pending > 0) {
pending--;
notifiedThreadId = Thread.currentThread().getId();
lock.notify();
} else if (pending == 0) {
pending--;
}
}
}
}
public class TestMyQueue {
public static void main(String args[]) {
final MyQueue<String> queue = new MyQueue<>(2);
for (int i = 0; i < 200; i++) {
Runnable r = new Runnable() {
@Override
public void run() {
for (int i = 0; i < Integer.MAX_VALUE; i++) {
queue.add(Thread.currentThread().getName() + ":" + i);
}
}
};
Thread t = new Thread(r);
t.start();
}
}
}
After some time, I see a single thread waking up two threads. The output looks like:
91:114
114:124
124:198
198:106
106:202
202:121
121:40
40:42
42:83
83:81
81:17
17:189
189:73
73:66
66:95
95:199
199:68
68:201
201:70
70:110
110:204
204:171
171:87
87:64
64:205
205:115
Here I see that thread 115 notified two threads, and thread 84 notified two threads; because of this, we are seeing the ArrayIndexOutOfBoundsException.
115:84
115:111
84:203
84:200
ARRAYException:200:199:3
ARRAYException:203:199:3
What is the issue in the program?
You have a couple of problems with your code that may be causing this behavior. First, as @Holder commented, there are a lot of code segments that can be run by multiple threads simultaneously and that should be protected using synchronized blocks.
For example:
if (addIndex == capacity) {
addIndex = 0;
}
If multiple threads run this then multiple threads might see addIndex == capacity and multiple would be overwriting the 0th index. Another example is:
addIndex++;
This is a classic race condition if 2 threads try to execute this statement at the same time. If addIndex was 0 beforehand, after the 2 threads execute this statement, the value of addIndex might be 1 or 2 depending on the race conditions.
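Any statements that could be executed at the same time by multiple threads have to be properly locked within a synchronized block or otherwise protected. Even though you have volatile fields, there can still be race conditions, because each of these steps consists of multiple operations. For example, one thread can pass the addIndex == capacity check while another thread increments addIndex to capacity; the first thread then writes to entryArr[capacity], which is exactly the ArrayIndexOutOfBoundsException you see (your log even shows addIndex at 3 with a capacity of 2).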
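To make that concrete, here is a sketch of a thread-safe version of the add logic that keeps MyQueue's overwrite-when-full behaviour, using a hypothetical `RingBuffer` class (not the original code). The wrap-around check and the increment run under a single lock, which is enough here: since nothing ever has to wait, no wait()/notify() is needed at all, and a plain `if` is fine.

```java
public class RingBuffer<T> {
    private final Object[] entries;
    private int addIndex; // guarded by "this"

    public RingBuffer(int capacity) {
        entries = new Object[capacity];
    }

    // The wrap-around check and the increment happen under one lock, so no other
    // thread can change addIndex between "checked" and "written".
    public synchronized void add(T t) {
        if (addIndex == entries.length) { // overwriting old entries is fine, as in MyQueue
            addIndex = 0;
        }
        entries[addIndex++] = t;
    }

    @SuppressWarnings("unchecked")
    public synchronized T get(int index) {
        return (T) entries[index];
    }

    public static void main(String[] args) throws InterruptedException {
        RingBuffer<String> buffer = new RingBuffer<>(2);
        Thread[] threads = new Thread[200];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    buffer.add(Thread.currentThread().getName() + ":" + i);
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(buffer.get(0) + ", " + buffer.get(1));
    }
}
```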
Also, a classic mistake is to use if statements when checking for overflow or underflow of your array before waiting. They should be while loops, so that a thread re-checks the condition after it wakes up and you don't get the classic producer-consumer race conditions. See the associated SO question: Why does java.util.concurrent.ArrayBlockingQueue use 'while' loops instead of 'if' around calls to await()?
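If you do need waiting (producers blocking while the buffer is full, consumers while it is empty), the usual pattern is wait() inside a while loop plus notifyAll(). The sketch below uses a hypothetical `BoundedBuffer` class with `put`/`take` methods; unlike MyQueue it blocks instead of overwriting. In real code, java.util.concurrent.ArrayBlockingQueue already provides this.

```java
public class BoundedBuffer<T> {
    private final Object[] items;
    private int putIndex;  // next slot to write
    private int takeIndex; // next slot to read
    private int count;     // number of stored items

    public BoundedBuffer(int capacity) {
        items = new Object[capacity];
    }

    public synchronized void put(T t) throws InterruptedException {
        // while, not if: after any wakeup (spurious, or another producer got there
        // first) the condition must be checked again before writing
        while (count == items.length) {
            wait();
        }
        items[putIndex] = t;
        putIndex = (putIndex + 1) % items.length;
        count++;
        notifyAll(); // waiting producers and consumers each re-check their own condition
    }

    @SuppressWarnings("unchecked")
    public synchronized T take() throws InterruptedException {
        while (count == 0) {
            wait();
        }
        T t = (T) items[takeIndex];
        items[takeIndex] = null;
        takeIndex = (takeIndex + 1) % items.length;
        count--;
        notifyAll();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    buffer.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        for (int i = 0; i < 10; i++) {
            System.out.println(buffer.take()); // one producer, one consumer: 0..9 in order
        }
        producer.join();
    }
}
```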
I am working on a classification problem, and I have implemented a grid search algorithm in order to find the best accuracy. My problem is that the program's execution time is about 2 hours, and I have tried to improve this time by using threads. Obviously I'm doing something wrong, since the execution time was the same even after implementing the threads. Below is the algorithm.
I should mention that this is the first time I am using threads. I have read some good things about Executors, but I can't figure out how to implement them.
public static void gridSearch(Dataset ds)
{
double bestAcc = 0;
for (int i = -5; i < 15; i++) {
double param1 = Math.pow(2, i);
for (int j = -15; j < 3; j++) {
double param2 = Math.pow(2, j);
int size = 10;
CrossValidation[] works = new CrossValidation[size];
Thread[] threads = new Thread[size];
for (int k=1;k<=size;k++) {
CrossValidation po = new CrossValidation(param1, param2, ds);
works[k-1] = po;
Thread t = new Thread(po);
threads[k-1] = t;
t.start();
}
for (int k = 0; k < size; k++) {
try { threads[k].join(); } catch (InterruptedException ex) {}
double accuracy = works[k].getAccuracy();
accuracy /= 106;
if (accuracy > bestAcc)
bestAcc = accuracy;
}
}
}
System.out.println("Best accuracy: " + bestAcc);
}
The CrossValidation class implements Runnable and has a method getAccuracy that returns the accuracy.
Please help me figure out what I am doing wrong, so that I can improve the execution time.
Your problem seems to be that for each parameter setting you start 10 threads, instead of starting one thread per parameter setting. Look closely at what you're doing here: you generate param1 and param2 and then start 10 threads that all work with those same parameters, redundantly. After that, you wait for those threads to finish before you start over again.
But no worries, I have prepared something for you ...
I want to show you how you could make a thread pool do what you actually want to achieve here. It will be easier to understand once you get it running.
First you need a WorkerThread and something like CVResult to return the results. This is where you are going to perform the CrossValidation algorithm:
public static class CVResult {
public double param1;
public double param2;
public double accuracy;
}
public static class WorkerThread implements Runnable {
    private final double param1;
    private final double param2;
    private double accuracy;

    public WorkerThread(double param1, double param2) {
        this.param1 = param1;
        this.param2 = param2;
    }

    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() +
                " [parameter1] " + param1 + " [parameter2]: " + param2);
        processCommand();
    }

    private void processCommand() {
        try {
            Thread.sleep(500); // placeholder for real work
            /*
             * ### PERFORM YOUR CROSSVALIDATION ALGORITHM HERE ###
             */
            this.accuracy = this.param1 + this.param2;
            // Give back result:
            CVResult result = new CVResult();
            result.accuracy = this.accuracy;
            result.param1 = this.param1;
            result.param2 = this.param2;
            Main.addResult(result);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            e.printStackTrace();
        }
    }
}
You also need to make sure you have access to an ExecutorService and a List<Future>. The ExecutorService will take care of your threads, and we will initialize the number of threads to the number of cores your CPU has available. This ensures that no more threads are running than there are cores on your CPU; however, no task gets lost, because each task is queued and starts as soon as a thread has finished its previous task. You'll see that soon. The List<Future> allows us to wait for all tasks to finish before we continue with the main thread. The List<CVResult> is of course there to hold the results added by the tasks (note that it is synchronized, since multiple threads are going to access it).
private static ExecutorService executor = null;
private static List<Future<?>> futures = new ArrayList<>();
private static List<CVResult> resultList = Collections.synchronizedList(new ArrayList<CVResult>());
This is how your gridSearch() could look. You don't have to initialize executor here; you can do that wherever you want, of course:
public static void gridSearch(/*Dataset ds*/)
{
    double bestAcc = 0;
    int cores = Runtime.getRuntime().availableProcessors();
    executor = Executors.newFixedThreadPool(cores);
    for (int i = -5; i < 15; i++) {
        double param1 = Math.pow(2, i);
        for (int j = -15; j < 3; j++) {
            double param2 = Math.pow(2, j);
            Runnable worker = new WorkerThread(param1, param2);
            futures.add(executor.submit(worker));
        }
    }
    // No more tasks will be submitted: let the pool threads exit once the queue is
    // drained, otherwise they keep the JVM alive after main() returns
    executor.shutdown();
    System.out.println("Waiting for all threads to terminate ..");
    // Wait for all tasks to finish before returning from gridSearch()
    for (Future<?> future : futures) {
        try {
            future.get(100, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and stop waiting
            return;
        } catch (ExecutionException | TimeoutException cause) {
            cause.printStackTrace(); // process cause
        }
    }
    System.out.println("Printing results ..");
    for (CVResult result : resultList) {
        System.out.println("Acc: " + result.accuracy +
                " for param1: " + result.param1 +
                " | param2: " + result.param2);
    }
}
Last but not least, here is a synchronized method to add your results to the list:
public static void addResult(CVResult accuracy) {
synchronized( resultList ) {
resultList.add(accuracy);
}
}
If you call this from your main method, e.g. like this:
public static void main(String[] args) {
gridSearch(/* params */);
System.out.println("All done.");
}
You'll get an output like this:
...
pool-1-thread-5 [parameter1] 0.0625 [parameter2]: 3.0517578125E-5
param1 0.03125
param2 1.0
pool-1-thread-4 [parameter1] 0.0625 [parameter2]: 0.25
param1 0.0625
param2 0.03125
...
Printing results ..
...
Acc: 16384.5 for param1: 16384.0 | param2: 0.5
Acc: 16386.0 for param1: 16384.0 | param2: 2.0
...
All done.
Possibly thread creation/teardown overhead is increasing the time needed to run the threads; you can fix this by using Executors. This will help you get started. As commented already, your processor may also not have enough hardware threads or physical cores to execute your threads concurrently.
More importantly, after each batch (one per iteration of the inner -15 to 3 loop) you must wait for all 10 threads to finish. To fix this, move your waiting and processing out of the for loops, to after all the work has been submitted. That way, the last 10 threads do not need to complete before the next batch starts. Additionally, I recommend using a CountDownLatch to await full completion before processing the results.
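A sketch of the CountDownLatch suggestion, under these assumptions: `evaluate(param1, param2)` is a hypothetical stand-in for one CrossValidation run (replace it with the real computation), and each task stores its accuracy in its own cell of an array. One task is submitted per parameter pair, nothing waits inside the loops, and the main thread waits exactly once, at the end.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatchGridSearch {
    // Hypothetical stand-in for one CrossValidation run; peaks at param1 = 8, param2 = 0.25
    static double evaluate(double param1, double param2) {
        return 1.0 / (1.0 + Math.abs(param1 - 8.0) + Math.abs(param2 - 0.25));
    }

    static double gridSearch() throws InterruptedException {
        int iCount = 20; // i = -5 .. 14
        int jCount = 18; // j = -15 .. 2
        double[][] accuracy = new double[iCount][jCount];
        CountDownLatch done = new CountDownLatch(iCount * jCount);
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            for (int i = -5; i < 15; i++) {
                for (int j = -15; j < 3; j++) {
                    final int row = i + 5, col = j + 15;
                    final double param1 = Math.pow(2, i), param2 = Math.pow(2, j);
                    // one task per parameter pair, no waiting inside the loops
                    pool.execute(() -> {
                        try {
                            accuracy[row][col] = evaluate(param1, param2); // each task owns one cell
                        } finally {
                            done.countDown(); // count down even if evaluate throws
                        }
                    });
                }
            }
            // Wait once for all tasks. countDown() happens-before await() returns,
            // so every cell written by the tasks is visible here.
            done.await();
        } finally {
            pool.shutdown();
        }
        double best = Double.NEGATIVE_INFINITY;
        for (double[] r : accuracy) {
            for (double a : r) {
                best = Math.max(best, a);
            }
        }
        return best;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Best accuracy: " + gridSearch());
    }
}
```

Because every task writes only its own cell and the latch provides the visibility guarantee, no synchronized result list is needed in this variant.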
I'm trying to implement a concurrent cache in Java for learning purposes.
This code is responsible for guaranteeing thread-safe operations. So, whenever a thread tries to fetch a value that is not already cached, the algorithm should calculate it from the last cached one.
My problem is that I'm getting null values that are supposed to be already cached. I'm using a Semaphore (though I've tried with ReentrantLock too, so I think that's not the problem) to ensure thread-safe access to a HashMap.
Note that I would like to keep the locked area as small as possible, so I would not like to synchronize the entire method or use an already thread-safe ConcurrentMap.
Here is a complete simple code:
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;
public class ConcurrentCache {
private final Semaphore semaphore = new Semaphore(1);
private final Map<Integer, Integer> cache;
private int lastCachedNumber;
public ConcurrentCache() {
cache = new HashMap<Integer, Integer>();
cache.put(0, 0);
lastCachedNumber = 0;
}
public Integer fetchAndCache(int n) {
// if it's already cached, supposedly I can access it without locking
if (n <= lastCachedNumber)
return cache.get(n);
lock();
Integer number;
if (n < lastCachedNumber) { // check it again. it may be updated by another thread
number = cache.get(n);
} else {
//fetch a previous calculated number.
number = cache.get(lastCachedNumber);
if (number == null)
throw new IllegalStateException(String.format(
"this should be cached. n=%d, lastCachedNumber=%d", n,
lastCachedNumber));
for (int i = lastCachedNumber + 1; i <= n; i++) {
number = number + 1;
cache.put(i, number);
lastCachedNumber = i;
}
}
unlock();
return number;
}
private void lock() {
try {
semaphore.acquire();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private void unlock() {
semaphore.release();
}
public static void main(String[] args) {
ConcurrentCache cachedObject = new ConcurrentCache();
for (int nThreads = 0; nThreads < 5; nThreads++) {
new Thread(new Runnable() {
@Override
public void run() {
for (int cacheValue = 0; cacheValue < 1000; cacheValue++) {
if (cachedObject.fetchAndCache(cacheValue) == null) {
throw new IllegalStateException(String.format(
"the number %d should be cached",
cacheValue));
}
}
}
}).start();
}
}
}
Thank you for your help.
A few pointers/ideas:
1) Pre-size your Map when you create it so that it can hold all (or most) of your future cached values; Map resizing is very thread-unsafe and time-consuming.
2) You can simplify your whole algorithm to:
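Integer get(int i) {
    // all helpers below are placeholders for your own cache and locking code
    Integer entry = getEntryFromCache(i); // must itself be safe to call without the lock
    if (entry == null) {
        lockEntry(i);
        try {
            entry = getEntryFromCache(i); // re-check: another thread may have created it meanwhile
            if (entry == null) {
                entry = createEntry(i);
                putEntryInCache(i, entry);
            }
        } finally {
            unlockEntry(i);
        }
    }
    return entry;
}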
Edit
Another point:
3) Your approach to caching is very bad: imagine what will happen if the first request is for the value at position 1,000,000.
Pre-populating the cache in a separate thread is going to be a lot better.
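Related to point 1, a likely reason for the null values is the unlocked fast path in the question: `cache.get(n)` runs without the lock while another thread may be executing `cache.put(...)`, and a HashMap that is modified (especially resized) concurrently can return wrong results, including null. `lastCachedNumber` is also not volatile, so readers have no visibility guarantee. Because the keys in the question are the contiguous integers 0..n, one way to keep a lock-free fast path is a pre-sized array plus a volatile `lastCachedNumber`. This is a sketch under that assumption (the hypothetical `PresizedCache` must know the maximum n up front), not the original poster's code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class PresizedCache {
    private final int[] values;            // pre-sized once, never resized
    private volatile int lastCachedNumber; // volatile write publishes values[0..lastCachedNumber]
    private final ReentrantLock lock = new ReentrantLock();

    // Assumption: the largest n that will ever be requested is known up front
    public PresizedCache(int maxN) {
        values = new int[maxN + 1];
        values[0] = 0;
        lastCachedNumber = 0;
    }

    public int fetchAndCache(int n) {
        // Fast path without the lock: values[n] is written before lastCachedNumber is
        // raised to n and never changes afterwards, so after this volatile read it is visible
        if (n <= lastCachedNumber) {
            return values[n];
        }
        lock.lock();
        try {
            // lastCachedNumber is re-read under the lock: another thread may have filled the gap
            for (int i = lastCachedNumber + 1; i <= n; i++) {
                values[i] = values[i - 1] + 1; // same "previous + 1" rule as the question
                lastCachedNumber = i;          // publish the new entry
            }
            return values[n];
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PresizedCache cache = new PresizedCache(1000);
        AtomicBoolean failed = new AtomicBoolean();
        Thread[] threads = new Thread[5];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int v = 0; v < 1000; v++) {
                    if (cache.fetchAndCache(v) != v) {
                        failed.set(true);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(failed.get() ? "wrong value seen" : "all values correct");
    }
}
```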
So, I ran a test and the results make no sense to me. Let's consider the following code:
ThreadStuffCounter counter_1 = new ThreadStuffCounter(1);
while(counter_1.doProceed) {
Thread.sleep(500);
Thread thread = new Thread(counter_1);
thread.start();
}
With the Runnable as follows:
package test;
public class ThreadStuffCounter implements Runnable {
public volatile boolean doProceed = true;
private int id = -1;
public volatile int i = -1;
public ThreadStuffCounter(int id) {
this.id = id;
}
@Override
public void run() {
for (i = 0; i < 10; i++) {
System.out.println("i = " + i + " in runnable id = " + id);
try {
Thread.sleep(1000);
}
catch (InterruptedException e) {
e.printStackTrace();
}
}
doProceed = false;
}
}
Only one instance of counter is shared between threads. It takes less time for another thread to start than for even one increment to be made on the counter. As I understand it, doProceed should never be set to false, and the loop should continue indefinitely until I get an out-of-memory error and cannot start any more threads.
How is it possible for the loop to exit?
EDIT: Modified the code to make sure the answer below is correct.
package test;
public class ThreadStuffCounter implements Runnable{
public volatile boolean doProceed = true;
private int id = -1;
volatile int i = -1;
public ThreadStuffCounter(int id){
this.id = id;
}
@Override
public void run() {
i = 0;
while (i < 10){
System.out.println("i = " + i + " in runnable id = " + id +
"; from thread id = " + Thread.currentThread().getId());
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
i++;
}
ThreadStuff.doProceed = false;
}
}
And
package test;
public class ThreadStuff {
public static volatile boolean doProceed = true;
public static void main (String[] args) throws InterruptedException{
ThreadStuffCounter counter_1 = new ThreadStuffCounter(1);
while(doProceed){
Thread.sleep(500);
Thread thread = new Thread(counter_1);
thread.start();
}
}
}
Also, it appears that more than n threads are needed if you loop while i < n: you need enough threads that n of them increment i at the same time.
When at least one of the threads runs the for loop while i is greater than or equal to 10, the doProceed variable is set to false (yes, this can happen), and since it's volatile, the change stops the while loop that creates and starts new threads. Then it is up to the remaining threads to finish executing their for loops and terminate. This seems to happen because, in your environment, starting a new thread is slower than a running thread finishing its execution. Also, note that several threads may increase the shared i, which accelerates the for loop execution. (Each newly started thread also resets the shared i to 0 in its loop initializer, so the loop only ends once enough threads are running that their combined increments push i to 10 between two thread starts.)
If you loop to a higher value (not tested), this could probably turn into an infinite loop, and the application would break when there aren't enough resources to create and start new threads.
After some tests using limits of 10, 50 and 1000, I noticed that with a bigger value, lots of threads are created, all of them increase i at the same time, and i slowly gets closer to the limit set in the for loop. Description of my current environment:
OS: Windows 7 Professional 64 bits
Processor: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz (4 CPUs), ~2.5GHz
Ram: 8192MB
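What makes the difference here is field versus local variable: all threads run the same Runnable instance, so they share its field `i`, while a loop variable declared inside run() lives on each thread's own stack. A small sketch with a hypothetical `LocalCounter` class (not the question's code) in which every thread counts to 10 independently, no matter how many threads share the instance:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalCounter implements Runnable {
    // how many iterations each thread performed (thread name -> count)
    final Map<String, Integer> iterationsPerThread = new ConcurrentHashMap<>();

    @Override
    public void run() {
        int iterations = 0;
        // i is local: each thread has its own copy, so no other thread can advance or reset it
        for (int i = 0; i < 10; i++) {
            iterations++;
        }
        iterationsPerThread.put(Thread.currentThread().getName(), iterations);
    }

    public static void main(String[] args) throws InterruptedException {
        LocalCounter shared = new LocalCounter(); // one Runnable instance for all threads
        Thread[] threads = new Thread[5];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(shared, "worker-" + t);
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(shared.iterationsPerThread); // every thread reports 10
    }
}
```

With the field-based `i` from the question, the per-thread counts would instead depend on timing.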
I have some thread-related questions, assuming the following code. Please ignore the possible inefficiency of the code, I'm only interested in the thread part.
//code without thread use
public static int getNextPrime(int from) {
int nextPrime = from+1;
boolean superPrime = false;
while(!superPrime) {
boolean prime = true;
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
}
}
if(prime) {
superPrime = true;
} else {
nextPrime++;
}
}
return nextPrime;
}
public static void main(String[] args) {
int primeStart = 5;
ArrayList list = new ArrayList();
for(int i = 0;i < 10000;i++) {
list.add(primeStart);
primeStart = getNextPrime(primeStart);
}
}
If I run the code like this, it takes about 56 seconds. If, however, I use the following code instead:
public class PrimeRunnable implements Runnable {
private int from;
private int lastPrime;
public PrimeRunnable(int from) {
this.from = from;
}
public boolean isPrime(int number) {
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return false;
}
}
lastPrime = number;
return true;
}
public int getLastPrime() {
return lastPrime;
}
public void run() {
while(!isPrime(++from))
;
}
}
public static void main(String[] args) throws InterruptedException {
int primeStart = 5;
ArrayList list = new ArrayList();
for(int i = 0;i < 10000;i++) {
PrimeRunnable pr = new PrimeRunnable(primeStart);
Thread t = new Thread(pr);
t.start();
t.join();
primeStart = pr.getLastPrime();
list.add(primeStart);
}
}
The whole operation takes about 7 seconds. I am almost certain that even though I only create one thread at a time, a thread doesn't always finish when another is created. Is that right? I am also curious: why is the operation ending so fast?
When I'm joining a thread, do other threads keep running in the background, or is the joined thread the only one that's running?
By putting the join() in the loop, you're starting a thread, then waiting for that thread to stop before running the next one. I think you probably want something more like this:
public static void main(String[] args) throws InterruptedException {
    int primeStart = 5;
    // Make thread-safe list for adding results to
    List<Integer> list = Collections.synchronizedList(new ArrayList<>());
    // Pull thread count out into a value so you can easily change it
    int threadCount = 10000;
    Thread[] threads = new Thread[threadCount];
    // Start all threads
    for (int i = 0; i < threadCount; i++) {
        // Pass list to each Runnable here
        // Also, I added +i here as I think the intention is
        // to test 10000 possible numbers > 5 for primeness
        // (the original tested 5 in every iteration)
        PrimeRunnable pr = new PrimeRunnable(primeStart + i, list);
        threads[i] = new Thread(pr);
        threads[i].start(); // thread is now running in parallel
    }
    // All threads now running in parallel
    // Then wait for all threads to complete
    for (int i = 0; i < threadCount; i++) {
        threads[i].join();
    }
}
By the way, pr.getLastPrime() will return 0 in the case of no prime, so you might want to filter that out before adding it to your list. The PrimeRunnable has to absorb the work of adding to the final results list. Also, I think PrimeRunnable was actually broken by still having incrementing code in it. I think this is fixed, but I'm not actually compiling this.
public class PrimeRunnable implements Runnable {
    private final int from;
    private final List<Integer> results; // shared but thread-safe

    public PrimeRunnable(int from, List<Integer> results) {
        this.from = from;
        this.results = results;
    }

    public void isPrime(int number) {
        for (int i = 2; i < number; i++) {
            if ((number % i) == 0) {
                return;
            }
        }
        // found prime, add to shared results
        this.results.add(number);
    }

    @Override
    public void run() {
        isPrime(from); // don't increment, just check one number
    }
}
Running 10000 threads in parallel is not a good idea. It's a much better idea to create a reasonably sized fixed thread pool and have its threads pull work from a shared queue: every worker takes tasks from the same queue, works on them and saves the results somewhere. The closest equivalent in Java 5+ is an ExecutorService backed by a thread pool. You could also use a CompletionService, which combines an ExecutorService with a result queue.
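A sketch of the CompletionService variant, assuming Java 8+ and a hypothetical `primesFrom(start, count, threads)` helper. Each task is a Callable that returns the number if it is prime and null otherwise; `take()` hands back futures in completion order rather than submission order, so the results are sorted at the end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionServicePrimes {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    static List<Integer> primesFrom(int start, int count, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        CompletionService<Integer> completion = new ExecutorCompletionService<>(exec);
        try {
            for (int i = 0; i < count; i++) {
                final int n = start + i;
                // Callable<Integer>: the prime itself, or null if n is not prime
                completion.submit(() -> isPrime(n) ? n : null);
            }
            List<Integer> primes = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                Integer result = completion.take().get(); // next finished task, in completion order
                if (result != null) {
                    primes.add(result);
                }
            }
            Collections.sort(primes);
            return primes;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(primesFrom(5, 10_000, 16).size() + " primes found");
    }
}
```

The number of threads (16 here) is independent of the amount of work, as in the ExecutorService version.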
An ExecutorService version would look like:
public static void main(String[] args) throws InterruptedException {
    int primeStart = 5;
    // Make thread-safe list for adding results to
    List<Integer> list = Collections.synchronizedList(new ArrayList<>());
    int threadCount = 16; // Experiment with this to find best on your machine
    ExecutorService exec = Executors.newFixedThreadPool(threadCount);
    int workCount = 10000; // See how # of work is now separate from # of threads?
    for (int i = 0; i < workCount; i++) {
        // submit work to the svc for execution across the thread pool
        exec.execute(new PrimeRunnable(primeStart + i, list));
    }
    // Stop accepting new tasks; already submitted tasks still run.
    // Without shutdown(), awaitTermination() would wait for the full timeout.
    exec.shutdown();
    // Wait for all tasks to be done or timeout to go off
    exec.awaitTermination(1, TimeUnit.DAYS);
}
Hope that gave you some ideas. And I hope the last example seemed a lot better than the first.
You can test this better by making the exact code in your first example run with threads. Replace your main method with this:
private static int currentPrime;
public static void main(String[] args) throws InterruptedException {
for (currentPrime = 0; currentPrime < 10000; currentPrime++) {
Thread t = new Thread(new Runnable() {
public void run() {
getNextPrime(currentPrime);
}});
t.start(); // start(), not run(): run() would execute on the main thread
t.join();
}
}
This will run in the same time as the original.
To answer your "join" question: yes, other threads can be running in the background when you use "join", but in this particular case you will only have one active thread at a time, because you are blocking the creation of new threads until the last thread is done executing.
JesperE is right, but I don't believe in only giving hints (at least outside a classroom):
Note this loop in the non-threaded version:
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
}
}
As opposed to this in the threaded version:
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return false;
}
}
The first loop will always run completely through, while the second will exit early if it finds a divisor.
You could make the first loop also exit early by adding a break statement like this:
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
break;
}
}
Read your code carefully. The two cases aren't doing the same thing, and it has nothing to do with threads.
When you join a thread, other threads will run in the background, yes.
Running a test, the second one doesn't seem to take 9 seconds; in fact, it takes at least as long as the first (which is to be expected: threading can't help the way it's implemented in your example).
Thread.join returns only when the joined thread terminates; the current thread then continues, and the thread you called join on is dead.
As a quick rule of thumb, consider threading when starting one iteration does not depend on the result of the previous one.