Cleaning up threads in Java - java

I have a Java method that performs two computations over an input set: an estimated and an accurate answer. The estimate can always be computed cheaply and in reliable time. The accurate answer can sometimes be computed in acceptable time and sometimes not (not known a priori ... have to try and see).
What I want to set up is some framework where if the accurate answer takes too long (a fixed timeout), the pre-computed estimate is used instead. I figured I'd use a thread for this. The main complication is that the code for computing the accurate answer relies on an external library, and hence I cannot "inject" Interrupt support.
A standalone test-case for this problem is here, demonstrating my problem:
package test;
import java.util.Random;
public class InterruptableProcess {
public static final int TIMEOUT = 1000;
public static void main(String[] args){
for(int i=0; i<10; i++){
getAnswer();
}
}
public static double getAnswer(){
long b4 = System.currentTimeMillis();
// have an estimate pre-computed
double estimate = Math.random();
//try to get accurate answer
//can take a long time
//if longer than TIMEOUT, use estimate instead
AccurateAnswerThread t = new AccurateAnswerThread();
t.start();
try{
t.join(TIMEOUT);
} catch(InterruptedException ie){
;
}
if(!t.isFinished()){
System.err.println("Returning estimate: "+estimate+" in "+(System.currentTimeMillis()-b4)+" ms");
return estimate;
} else{
System.err.println("Returning accurate answer: "+t.getAccurateAnswer()+" in "+(System.currentTimeMillis()-b4)+" ms");
return t.getAccurateAnswer();
}
}
public static class AccurateAnswerThread extends Thread{
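// volatile so that values written by the worker thread are visible to the caller
private volatile boolean finished = false;
private volatile double answer = -1;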
public void run(){
//call to external, non-modifiable code
answer = accurateAnswer();
finished = true;
}
public boolean isFinished(){
return finished;
}
public double getAccurateAnswer(){
return answer;
}
// not modifiable, emulate an expensive call
// in practice, from an external library
private double accurateAnswer(){
Random r = new Random();
long b4 = System.currentTimeMillis();
long wait = r.nextInt(TIMEOUT*2);
//don't want to use .wait() since
//external code doesn't support interruption
while(b4+wait>System.currentTimeMillis()){
;
}
return Math.random();
}
}
}
This works fine outputting ...
Returning estimate: 0.21007465651836377 in 1002 ms
Returning estimate: 0.5303547292361411 in 1001 ms
Returning accurate answer: 0.008838428149438915 in 355 ms
Returning estimate: 0.7981717302567681 in 1001 ms
Returning estimate: 0.9207406241557682 in 1000 ms
Returning accurate answer: 0.0893839926072787 in 175 ms
Returning estimate: 0.7310211480220586 in 1000 ms
Returning accurate answer: 0.7296754467596422 in 530 ms
Returning estimate: 0.5880164300851529 in 1000 ms
Returning estimate: 0.38605296260291233 in 1000 ms
However, I have a very large input set (in the order of billions of items) to run my analysis over, and I'm uncertain as to how to clean up the threads that do not finish (I do not want them running in the background).
I know that various methods to destroy threads are deprecated with good reason. I also know that the typical way to stop a thread is to use interrupts. However, in this case, I don't see that I can use an interrupt since the run() method passes a single call to an external library.
How can I kill/clean-up threads in this case?

If you know enough about the external library to be sure that it:
never acquires any locks;
never opens any files/network connections;
never involves any I/O whatsoever, not even logging;
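then it may be safe to use Thread#stop on it. You could try it and do extensive stress testing; any resource leaks should manifest themselves soon enough. (On Java 20 and later, Thread#stop always throws UnsupportedOperationException, so this only applies to older runtimes.)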

I'd try it to see if it will respond to a Thread.interrupt(). Reduce your data of course so it doesn't run forever, but if it responds to an interrupt() then you're home free. If the library locks anything, or performs a wait() or sleep(), its code has to handle the InterruptedException, and it's possible the author did the right thing. They may swallow it and continue, but it's possible they didn't.
While technically you can call Thread.stop(), you'll need to know everything about that code to know for sure that it's safe and that you won't leak resources. However, doing that research will clue you in to how you could easily modify the code to look for interrupt() as well. You'll pretty much have to have the source code to audit it to know for sure, which means you could just as easily do the right thing and add the checks there, with less research than it takes to know whether it's safe to call Thread.stop().
The other option is to cause a RuntimeException in the thread. Try nulling a reference it might have, or closing some I/O resource (socket, file handle, etc.). Modify the array of data it's walking over by changing the size, or null out the data. There's usually something you can do to make it throw an exception that is not handled, and the thread will then shut down.
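The interrupt probe suggested above could be sketched roughly like this. It is only a sketch under assumptions: `interruptibleWork` and `busyWork` are hypothetical stand-ins for the external call (one blocks in sleep(), the other spins and never checks the interrupt flag), and the timeouts are arbitrary. The idea is to cancel the task with an interrupt and then see whether the worker thread actually goes away.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InterruptProbe {

    // Hypothetical library call that blocks in sleep(), so an interrupt ends it.
    static double interruptibleWork() throws InterruptedException {
        Thread.sleep(5_000);
        return 42.0;
    }

    // Hypothetical library call that spins for two seconds and ignores interrupts.
    static double busyWork() {
        long end = System.nanoTime() + 2_000_000_000L;
        while (System.nanoTime() < end) {
            // spin without ever looking at the interrupt flag
        }
        return 42.0;
    }

    /**
     * Runs the task, interrupts it after timeoutMs, then waits up to graceMs.
     * Returns true if the task finished or the worker stopped after the interrupt.
     */
    static boolean respondsToInterrupt(Callable<Double> task, long timeoutMs, long graceMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "probe-worker");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<Double> future = pool.submit(task);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true; // finished in time, nothing to cancel
        } catch (TimeoutException e) {
            future.cancel(true); // delivers Thread.interrupt() to the worker
            pool.shutdown();
            // true only if the worker thread really terminated within the grace period
            return pool.awaitTermination(graceMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            return true; // the task failed on its own, so the thread is free again
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sleeping task stopped: "
                + respondsToInterrupt(InterruptProbe::interruptibleWork, 100, 1_000));
        System.out.println("spinning task stopped: "
                + respondsToInterrupt(InterruptProbe::busyWork, 100, 300));
    }
}
```

If the real library call behaves like the spinning task, interrupts will not help and one of the other approaches is needed.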

Extending on the answer by chubbsondubs, if the third-party library uses some well-defined API (such as java.util.List or some library-specific API) to access the input data set, you could wrap the input data set that you pass to the third-party code with a wrapper class that will throw exceptions, e.g. in the List.get method, after a cancel flag is set.
For instance, if you pass a List to your third-party library, then it might be possible to do something along the lines of:
class CancelList<T> implements List<T> {
private final List<T> wrappedList;
private volatile boolean canceled = false;
public CancelList(List<T> wrapped) { this.wrappedList = wrapped; }
public void cancel() { this.canceled = true; }
public T get(int index) {
if (canceled) { throw new RuntimeException("Canceled!"); }
return wrappedList.get(index);
}
// Other List method implementations here...
}
public double getAnswer(List<MyType> inputList) {
CancelList<MyType> cancelList = new CancelList<MyType>(inputList);
AccurateAnswerThread t = new AccurateAnswerThread(cancelList);
t.start();
try{
t.join(TIMEOUT);
} catch(InterruptedException ie){
Thread.currentThread().interrupt();
}
if (t.isAlive()) {
// Timed out (or interrupted): make the next List.get call inside the library throw
cancelList.cancel();
}
// Get the result of your calculation here...
}
Of course, this approach depends on a few things:
You must know the third-party code well-enough to know what methods it calls that you can control through input parameters.
The third-party code would need to make frequent calls to these methods throughout the computation process (i.e. it won't work if it copies all the data at once into an internal structure and does its computation there).
Obviously this won't work if the library catches and handles runtime exceptions and continues processing.
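A runnable variant of the wrapper idea, sketched with java.util.AbstractList so that only get and size have to be written. Everything apart from the wrapper idea itself is an assumption: `slowSum` is a hypothetical stand-in for the third-party computation (it keeps reading the list element by element), and throwing CancellationException rather than a plain RuntimeException is just a choice made for this example.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;

public class CancelListDemo {

    /** Read-only view that starts throwing from get() once cancel() has been called. */
    static final class CancelList<T> extends AbstractList<T> {
        private final List<T> wrapped;
        private volatile boolean canceled = false;

        CancelList(List<T> wrapped) { this.wrapped = wrapped; }

        void cancel() { canceled = true; }

        @Override public T get(int index) {
            if (canceled) { throw new CancellationException("Canceled!"); }
            return wrapped.get(index);
        }

        @Override public int size() { return wrapped.size(); }
    }

    // Hypothetical stand-in for the library: effectively never returns on its own.
    static double slowSum(List<Double> input) {
        double sum = 0;
        for (long pass = 0; pass < Long.MAX_VALUE; pass++) {
            for (int i = 0; i < input.size(); i++) {
                sum += input.get(i);
            }
        }
        return sum;
    }

    /** Returns true if the worker thread died within graceMs of being canceled. */
    static boolean cancelAfter(long timeoutMs, long graceMs) throws InterruptedException {
        List<Double> data = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) { data.add((double) i); }
        CancelList<Double> cancelList = new CancelList<>(data);

        Thread worker = new Thread(() -> {
            try {
                slowSum(cancelList);
            } catch (CancellationException expected) {
                // the wrapper aborted the computation; the thread simply ends here
            }
        });
        worker.setDaemon(true);
        worker.start();

        worker.join(timeoutMs);
        if (worker.isAlive()) {
            cancelList.cancel(); // the next get() inside slowSum will throw
        }
        worker.join(graceMs);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped: " + cancelAfter(100, 2_000));
    }
}
```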

Related

Java "in-order" semaphore

I have an issue and a vague idea for how to fix it, but I'll try to overshare on the context to avoid an XY problem.
I have an asynchronous method that immediately returns a Guava ListenableFuture, and I need to call it hundreds of thousands or millions of times (the future itself can take a little while to complete). I can't really change the internals of that method. There's some serious resource contention involved in it internally, so I'd like to limit the number of calls to that method that happen at once. So I tried using a Semaphore:
public class ConcurrentCallsLimiter<In, Out> {
private final Function<In, ListenableFuture<Out>> fn;
private final Semaphore semaphore;
private final Executor releasingExecutor;
public ConcurrentCallsLimiter(
int limit, Executor releasingExecutor,
Function<In, ListenableFuture<Out>> fn) {
this.semaphore = new Semaphore(limit);
this.fn = fn;
this.releasingExecutor = releasingExecutor;
}
public ListenableFuture<Out> apply(In in) throws InterruptedException {
semaphore.acquire();
ListenableFuture<Out> result = fn.apply(in);
result.addListener(() -> semaphore.release(), releasingExecutor);
return result;
}
}
So then I can just wrap my call in this class and call that instead:
ConcurrentCallsLimiter<Foo, Bar> cl =
new ConcurrentCallsLimiter<>(10, executor, someService::turnFooIntoBar);
for (Foo foo : foos) {
ListenableFuture<Bar> bar = cl.apply(foo);
// handle bar (no pun intended)
}
This sort of works. The issue is that the tail latency is really bad. Some calls get "unlucky" and end up taking a long time trying to acquire resources within that method call. This is exacerbated by some internal exponential-backoff logic such that the unlucky call gets less and less of a chance to acquire the needed resources, compared to new calls that are more eager and wait a much shorter time before trying again.
What would be ideal to fix it is if there was something similar to that semaphore but that had a notion of order. For instance if the limit is 10, then the 11th call currently must wait for any of the first 10 to complete. What I'd like is for the 11th call to have to wait for the very first call to complete. That way "unlucky" calls aren't continuing to be starved by new calls constantly coming in.
It seems like I could assign an integer sequence number to the calls and somehow keep track of the lowest one yet to finish, but couldn't quite see how to make that work, especially since there's not really any useful "waiting" methods on AtomicInteger or whatever.
You can create the semaphore with the fairness parameter:
Semaphore(int permits, boolean fair)
//fair - true if this semaphore will guarantee first-in first-out granting of permits under contention, else false
One less-than-ideal solution I thought of was using a Queue to store all of the futures:
public class ConcurrentCallsLimiter<In, Out> {
private final Function<In, ListenableFuture<Out>> fn;
private final int limit;
private final Queue<ListenableFuture<Out>> queue = new ArrayDeque<>();
public ConcurrentCallsLimiter(int limit, Function<In, ListenableFuture<Out>> fn) {
this.limit = limit;
this.fn = fn;
}
public ListenableFuture<Out> apply(In in) throws InterruptedException, ExecutionException {
if (queue.size() == limit) {
queue.remove().get();
}
ListenableFuture<Out> result = fn.apply(in);
queue.add(result);
return result;
}
}
However, this seems like a big waste of memory. The objects involved could be a bit big and the limit could be set pretty high. So I'm open to better answers that don't have O(n) memory usage. It seems like it should be doable in O(1).
"if there was something similar to that semaphore but that had a notion of order."
There exists such a beast. It is called a blocking queue (java.util.concurrent.BlockingQueue). Create such a queue, put 10 items there, and use take instead of acquire and put instead of release.
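The token-queue idea above might look roughly like this. This is a sketch under assumptions, not a drop-in replacement: it uses the JDK's CompletableFuture instead of Guava's ListenableFuture so that it compiles on its own, and `TokenQueueLimiter` and `availableTokens` are names made up for the example. The queue is created in fair mode so that blocked callers are served in arrival order; whether that is enough to fix the starvation inside the wrapped method is not something this sketch can show.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class TokenQueueLimiter<In, Out> {
    private final Function<In, CompletableFuture<Out>> fn;
    private final BlockingQueue<Integer> tokens;

    public TokenQueueLimiter(int limit, Function<In, CompletableFuture<Out>> fn) {
        this.fn = fn;
        // second argument true = fair: waiting threads are served first-in first-out
        this.tokens = new ArrayBlockingQueue<>(limit, true);
        for (int i = 0; i < limit; i++) {
            tokens.add(i); // pre-fill the queue with `limit` tokens
        }
    }

    public CompletableFuture<Out> apply(In in) throws InterruptedException {
        Integer token = tokens.take(); // plays the role of acquire()
        CompletableFuture<Out> result;
        try {
            result = fn.apply(in);
        } catch (RuntimeException e) {
            tokens.add(token); // do not leak the token if the call never started
            throw e;
        }
        // plays the role of release(): hand the token back when the call completes
        result.whenComplete((value, error) -> tokens.add(token));
        return result;
    }

    /** Number of calls that could start right now without blocking. */
    public int availableTokens() {
        return tokens.size();
    }

    public static void main(String[] args) throws InterruptedException {
        TokenQueueLimiter<String, Integer> limiter =
                new TokenQueueLimiter<>(2, s -> CompletableFuture.supplyAsync(s::length));
        System.out.println(limiter.apply("hello").join());
    }
}
```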

Know which thread goes to which processor in a Java ForkJoinPool in Apache Spark?

Goal: To know, as I fork off a thread, which processor it's going to land on. Is that possible? Regardless of whether the underlying approach is valid, is there a good answer to that narrow question? Thanks.
(Right now I need to make a copy of one of our classes for each thread, write to it in that thread and merge them all later. Using a synchronized approach is not possible because my Java expert boss thinks it's a bad idea, and after a lot of discussion I agree. If I knew which processor each thread would land on, I would only need to make as many copies of that class as there are processors.)
We use Apache Spark to get our jobs spread across a cluster, but in our application it makes sense to run one big executor and then do some multi-threading of our own on each machine in the cluster.
I could save a lot of deep copying if I could know which processor a thread is being sent to, is that possible? I threw in our code but it's probably more of a conceptual question:
When I get down to the "do task" part of compute(), can I know which processor it's running on?
public class TholdExecutor extends RecursiveTask<TholdDropEvaluation> {
final static Logger logger = LoggerFactory.getLogger(TholdExecutor.class);
private List<TholdDropResult> partitionOfN = new ArrayList<>();
private int coreCount;
private int desiredPartitionSize; // will be updated by whatever is passed into the constructor per-chromosome
private TholdDropEvaluation localDropEvaluation; // this DropEvaluation
private TholdDropResult mSubI_DR;
public TholdExecutor(List<TholdDropResult> subsetOfN, int cores, int partSize, TholdDropEvaluation passedDropEvaluation, TholdDropResult mDrCopy) {
partitionOfN = subsetOfN;
coreCount = cores;
desiredPartitionSize = partSize;
// the TholdDropEvaluation needs to be a copy for each thread? It can't be the same one passed to threads ... so ...
localDropEvaluation = makeDECopy(passedDropEvaluation); // assign the field (not a new local): THIS NEEDS TO BE A DEEP COPY OF THE DROP EVAL!!! NOT THE ORIGINAL!!
// we never modify the TholdDropResult that is passed in, we just need to read it all on the same JVM/worker, so
mSubI_DR = mDrCopy; // this is purely a reference and can point to the passed in value (by reference, right?)
}
// this makes a deep copy of the TholdDropEvaluation for each thread, we copy the SharingRun's startIndex and endIndex only,
// as LEG events will be calculated during the subsequent dropComparison. The constructor for TholdDropEvaluation must set
// LEG events to zero.
private TholdDropEvaluation makeDECopy(TholdDropEvaluation passedDropEvaluation) {
TholdDropEvaluation tholdDropEvaluation = new TholdDropEvaluation();
// iterate through the SharingRuns in the SharingRunList from the TholdDropEval that was passed in
for (SharingRun sr : passedDropEvaluation.getSharingRunList()) {
SharingRun ourSharingRun = new SharingRun();
ourSharingRun.startIndex = sr.startIndex;
ourSharingRun.endIndex = sr.endIndex;
tholdDropEvaluation.addSharingRun(ourSharingRun);
}
return tholdDropEvaluation;
}
@Override
protected TholdDropEvaluation compute() {
int simsToDo = partitionOfN.size();
UUID tag = UUID.randomUUID();
long computeStartTime = System.nanoTime();
if (simsToDo <= desiredPartitionSize) {
logger.debug("IN MULTI-THREAD compute() --- UUID {}: Evaluating partitionOfN sublist length {}", tag, simsToDo);
// job within size limit, do the task and return the completed TholdDropEvaluation
// iterate through each TholdDropResult in the sub-partition and do the dropComparison to the reference mSubI_DR,
// writing to the copy of the DropEval in tholdDropEvaluation
for (TholdDropResult currentResult : partitionOfN) {
mSubI_DR.dropComparison(currentResult, localDropEvaluation);
}
} else {
// job too large, subdivide and call this recursively
int half = simsToDo / 2;
logger.info("Splitting UUID = {}, half is {} and simsToDo is {}", tag, half, simsToDo );
TholdExecutor nextExec = new TholdExecutor(partitionOfN.subList(0, half), coreCount, desiredPartitionSize, localDropEvaluation, mSubI_DR);
TholdExecutor futureExec = new TholdExecutor(partitionOfN.subList(half, simsToDo), coreCount, desiredPartitionSize, localDropEvaluation, mSubI_DR);
nextExec.fork();
TholdDropEvaluation futureEval = futureExec.compute();
TholdDropEvaluation nextEval = nextExec.join();
localDropEvaluation.merge(futureEval);
localDropEvaluation.merge(nextEval);
}
logger.info("{} Compute time is {} ns",tag, System.nanoTime() - computeStartTime);
// NOTE: this was inside the else block in Rob's example, but don't we want it outside the block so it's returned
// whether
return localDropEvaluation;
}
}
Even if you could figure out where a thread would run initially there's no reason to assume it would live on that processor/core for the rest of its life. In all probability for any task big enough to be worth the cost of spawning a thread it won't, so you'd need to control where it ran completely to offer that level of assurance.
As far as I know there's no standard mechanism for controlling mappings from threads to processor cores inside Java. Typically that's known as "thread affinity" or "processor affinity". On Windows and Linux for example you can control that using:
Windows: SetThreadAffinityMask
Linux: sched_setaffinity or pthread_setaffinity_np
so in theory you could write some C and JNI code that allowed you to abstract this enough on the Java hosts you cared about to make it work.
That feels like the wrong solution to the real problem you seem to be facing, because you end up withdrawing options from the OS scheduler, which potentially doesn't allow it to make the smartest scheduling decisions causing total runtime to increase. Unless you're pushing an unusual workload and modelling/querying processor information/topology down to the level of NUMA and shared caches it ought to do a better job of figuring out where to run threads for most workloads than you could. Your JVM typically runs a large number of additional threads besides just the ones you explicitly create from after main() gets called. Additionally I wouldn't like to promise anything about what the JVM you run today (or even tomorrow) might decide to do on its own about thread affinity.
Having said that it seems like the underlying problem is that you want to have one instance of an object per thread. Typically that's much easier than predicting where a thread will run and then manually figuring out a mapping between N processors and M threads at any point in time. Usually you'd use "thread local storage" (TLS) to solve this problem.
Most languages provide this concept in one form or another. In Java it is provided via the ThreadLocal class. The ThreadLocal Javadoc gives this example:
public class ThreadId {
// Atomic integer containing the next thread ID to be assigned
private static final AtomicInteger nextId = new AtomicInteger(0);
// Thread local variable containing each thread's ID
private static final ThreadLocal<Integer> threadId =
new ThreadLocal<Integer>() {
@Override protected Integer initialValue() {
return nextId.getAndIncrement();
}
};
// Returns the current thread's unique ID, assigning it if necessary
public static int get() {
return threadId.get();
}
}
Essentially there are two things you care about:
When you call get(), it returns the value (Object) belonging to the current thread.
If you call get() in a thread which currently has no value, it calls the initialValue() method you implement, which allows you to construct or obtain a new object.
So in your scenario you'd probably want to deep copy the initial version of some local state from a read-only global version.
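A minimal sketch of that pattern, under stated assumptions: `Tally` is a hypothetical stand-in for the per-thread copy of the evaluation object, a parallel stream is used here instead of the RecursiveTask from the question simply to keep the example short, and the queue that collects the per-thread copies for the final merge is one possible way to do that step, not the only one.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class PerThreadState {

    // Hypothetical stand-in for the object each thread writes to.
    static final class Tally {
        long sum;
        void add(int x) { sum += x; }
    }

    static long sumInParallel(int n) {
        // Every Tally that gets created, so the copies can be merged at the end.
        Queue<Tally> all = new ConcurrentLinkedQueue<>();

        // Each thread lazily gets its own Tally the first time it calls get().
        ThreadLocal<Tally> local = ThreadLocal.withInitial(() -> {
            Tally t = new Tally(); // in the real code: a deep copy of the read-only original
            all.add(t);
            return t;
        });

        // Parallel streams run on the common ForkJoinPool; no locking while adding.
        IntStream.rangeClosed(1, n).parallel().forEach(i -> local.get().add(i));

        // The merge step: at most one Tally per thread that took part.
        long total = 0;
        for (Tally t : all) {
            total += t.sum;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumInParallel(1_000)); // 1 + 2 + ... + 1000
    }
}
```

The number of copies is bounded by the number of threads that actually ran work, which is the saving the question was after, without needing to know anything about processors.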
One final point of note: if your goal is to divide and conquer (do some work on lots of threads and then merge all their results into one answer), the merging part is often known as a reduction. In that case you might be looking for MapReduce, which is probably the best-known form of parallelism using reductions.

What is a good parallel program [with Java Thread]?

I am learning threading in Java in order to create programs that run in parallel. Designing programs with parallelism is something I never had a chance to learn back in my school programming class. I know how to create threads and make them run, but I have no idea how to use them efficiently. After all, I know it is not simply using threads that makes a program fast, but a good parallel design. So I did an experiment to test my knowledge. However, my parallel version actually runs slower than the non-parallel one. I am starting to doubt whether I really get the idea. If you could be so kind, would you mind having a look at the following program:
I made a program to fill an array in a divide-and-conquer fashion (I know Java has an Arrays.fill utility, but I just want to test my knowledge of multithreading):
public class ParalledFill
{
private static void fill(final double [] array,
final double value,
final int start,
final int size)
{
if (size > 1000)
{ // Each thread handles at most 1000 elements
Runnable task = new Runnable() { // Fork the task
public void run() {
fill(array, value, start, 1000); // Fill the first 1000 elements
}};
// Create the thread
Thread fork = new Thread(task);
fork.start();
// Fill the rest of the array
fill(array, value, start+1000, size-1000);
// Join the task
try {
fork.join();
}
catch (InterruptedException except)
{
System.err.println(except);
}
}
else
{ // The array is small enough, fill it via a normal loop
for (int i = start; i < start + size; ++i)
array[i] = value;
}
} // fill
public static void main(String [] args)
{
double [] bigArray = new double[1000*1000];
double value = 3;
fill(bigArray, value, 0, bigArray.length);
}
}
I tested this program, but it turns out to be even slower than just doing something like:
for (int i = 0; i < bigArray.length; ++i)
bigArray[i] = value;
I had my guess, it could be that java does some optimisation for filling an array using a loop which makes it much faster than my threaded version. But other than that, I feel more strongly that my way to handle threads/parallelism could be wrong. I have never designed anything using threads (always relied on compiler optimisation or OpenMP in C). Could anyone help me explain why my paralleled version isn’t faster? Was the program just too bad in terms of designing paralleled program?
Thanks,
Xing.
Unless you have multiple CPUs, or long running tasks like I/O, I'm guessing that all you're doing is time slicing between threads. If there's a single CPU that has so much work to do, adding threads doesn't decrease the work that has to be done. All you end up doing is adding overhead due to context switching.
You ought to read "Java Concurrency In Practice". Better to learn how to do things with the modern concurrency package rather than raw threads.
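For comparison, here is a rough sketch of the same fill using the java.util.concurrent package instead of raw threads. It is only an illustration under assumptions: `ChunkedFill` and its `threads` parameter are made up for the example, and the array is split into one contiguous chunk per thread rather than one thread per 1000 elements. Filling memory is cheap work, so even this version may not beat the plain loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedFill {

    /** Fills the array using a fixed pool, one contiguous chunk per thread. */
    static void fill(double[] array, double value, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (array.length + threads - 1) / threads; // ceiling division
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(array.length, t * chunk);
                final int to = Math.min(array.length, from + chunk);
                // Arrays.fill(array, from, to, value) fills indices from..to-1
                pending.add(pool.submit(() -> Arrays.fill(array, from, to, value)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the chunk and rethrow any failure
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] bigArray = new double[1000 * 1000];
        fill(bigArray, 3.0, Runtime.getRuntime().availableProcessors());
        System.out.println(bigArray[0] + " " + bigArray[bigArray.length - 1]);
    }
}
```

The design point is that the number of threads follows the number of processors, not the size of the input, so thread creation and context switching stay a fixed, small cost.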

How to use multithreading effectively

I want to do a task that I've already completed except this time using multithreading. I have to read a lot of data from a file (line by line), grab some information from each line, and then add it to a Map. The file is over a million lines long so I thought it may benefit from multithreading.
I'm not sure about my approach here since I have never used multithreading in Java before.
I want to have the main method do the reading, and then giving the line that has been read to another thread which will format a String, and then give it to another thread to put into a map.
public static void main(String[] args)
{
//Some information read from file
String line;
try (BufferedReader br = new BufferedReader(new FileReader("somefile.txt"))) {
while((line = br.readLine()) != null) {
// Pass line to another task
}
// Here I want to get a total from B, but I'm not sure how to go about doing that
} catch (IOException e) {
e.printStackTrace();
}
}
public class Parser extends Thread
{
private Mapper m1;
// Some reference to B
public Parser(Mapper m) {
m1 = m;
}
public void parse(String s, int i) {
// Do some work on S
String key = DoSomethingWithString(s);
m1.add(key, i);
}
}
public class Mapper extends Thread
{
private SortedMap<String, Integer> sm;
private String key;
private int value;
boolean hasNewItem;
public Mapper() {
sm = new TreeMap<String, Integer>();
hasNewItem = false;
}
public void add(String s, int i) {
hasNewItem = true;
key = s;
value = i;
}
public void run() {
while (!Thread.currentThread().isInterrupted()) {
try {
if (hasNewItem) {
// Find if street name exists in map
sm.put(key, value);
hasNewItem = false;
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
// I'm not sure how to give the Map back to main.
}
}
I'm not sure if I am taking the right approach. I also do not know how to terminate the Mapper thread and retrieve the map in the main. I will have multiple Mapper threads but I have only instantiated one in the code above.
I also just realized that my Parse class is not a thread, but only another class if it does not override the run() method so I am thinking that the Parse class should be some sort of queue.
Any ideas? Thanks.
EDIT:
Thanks for all of the replies. It seems that since I/O will be the major bottleneck there would be little efficiency benefit from parallelizing this. However, for demonstration purpose, am I going on the right track? I'm still a bit bothered by not knowing how to use multithreading.
Why do you need multiple threads? You only have one disk and it can only go so fast. Multithreading almost certainly won't help in this case, and if it does, the gain will be minimal from a user's perspective. Multithreading isn't your problem. Reading from a huge file is your bottleneck.
Frequently I/O will take much longer than the in-memory tasks. We refer to such work as I/O-bound. Parallelism may have a marginal improvement at best, and can actually make things worse.
You certainly don't need a different thread to put something into a map. Unless your parsing is unusually expensive, you don't need a different thread for it either.
If you had other threads for these tasks, they might spend most of their time sitting around waiting for the next line to be read.
Even parallelizing the I/O won't necessarily help, and may hurt. Even if your CPUs support parallel threads, your hard drive might not support parallel reads.
EDIT:
All of us who commented on this assumed the task was probably I/O-bound -- because that's frequently true. However, from the comments below, this case turned out to be an exception. A better answer would have included the fourth comment below:
Measure the time it takes to read all the lines in the file without processing them. Compare to the time it takes to both read and process them. That will give you a loose upper bound on how much time you could save. This may be decreased by a new cost for thread synchronization.
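The measurement described above could be sketched like this. The class and method names are made up for the example, and the `parser` function stands in for whatever per-line work the real program does. One caveat that is an assumption about the environment, not something from the thread: the operating system may cache the file after the first pass, so it is worth running the read-only pass more than once before comparing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

public class ReadVersusProcess {

    /** Reads every line and throws it away; returns elapsed nanoseconds. */
    static long timeReadOnly(Path file) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader br = Files.newBufferedReader(file)) {
            while (br.readLine() != null) {
                // discard the line: this pass measures I/O only
            }
        }
        return System.nanoTime() - start;
    }

    /** Reads every line, parses it and stores it in the map; returns elapsed nanoseconds. */
    static long timeReadAndProcess(Path file, Function<String, String> parser,
                                   SortedMap<String, Integer> out) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader br = Files.newBufferedReader(file)) {
            String line;
            int lineNo = 0;
            while ((line = br.readLine()) != null) {
                out.put(parser.apply(line), lineNo++);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        timeReadOnly(file); // first pass warms the OS file cache
        long readOnly = timeReadOnly(file);
        long readAndProcess = timeReadAndProcess(file, String::trim, new TreeMap<>());
        System.out.println("read only:        " + readOnly / 1_000_000 + " ms");
        System.out.println("read and process: " + readAndProcess / 1_000_000 + " ms");
        // The difference is a loose upper bound on what parallel processing could save.
    }
}
```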
You may wish to read Amdahl's Law. Since the majority of your work is strictly serial (the IO) you will get negligible improvements by multi-threading the remainder. Certainly not worth the cost of creating watertight multi-threaded code.
Perhaps you should look for a new toy-example to parallelise.
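To put numbers on Amdahl's Law: if a fraction p of the run time can be parallelised over n threads, the speedup is at most 1 / ((1 - p) + p / n). The short sketch below just evaluates that formula. The 20% figure is an illustrative assumption, not a measurement from the question.

```java
public class Amdahl {

    /** Upper bound on speedup when parallelFraction of the run time is spread over threads. */
    static double speedup(double parallelFraction, int threads) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);
    }

    public static void main(String[] args) {
        // Assumed split for illustration: 80% serial I/O, 20% parsing that can run in parallel.
        for (int n : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d threads -> at most %.2fx%n", n, speedup(0.20, n));
        }
    }
}
```

With that split, no number of threads can do better than 1 / 0.8 = 1.25x, which is why the serial I/O dominates the decision.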

Reduced performance when using multithreading in Java

I am new to multi-threading and I have to write a program using multiple threads to increase its efficiency. At my first attempt what I wrote produced just opposite results. Here is what I have written:
class ThreadImpl implements Callable<ArrayList<Integer>> {
//Bloom filter instance for one of the table
BloomFilter<Integer> bloomFilterInstance = null;
// Data member for complete data access.
ArrayList< ArrayList<UserBean> > data = null;
// Store the result of the testing
ArrayList<Integer> result = null;
int tableNo;
public ThreadImpl(BloomFilter<Integer> bloomFilterInstance,
ArrayList< ArrayList<UserBean> > data, int tableNo) {
this.bloomFilterInstance = bloomFilterInstance;
this.data = data;
result = new ArrayList<Integer>(this.data.size());
this.tableNo = tableNo;
}
public ArrayList<Integer> call() {
int[] tempResult = new int[this.data.size()];
for(int i=0; i<data.size() ;++i) {
tempResult[i] = 0;
}
ArrayList<UserBean> chkDataSet = null;
for(int i=0; i<this.data.size(); ++i) {
if(i==tableNo) {
//do nothing;
} else {
chkDataSet = new ArrayList<UserBean> (data.get(i));
for(UserBean toChk: chkDataSet) {
if(bloomFilterInstance.contains(toChk.getUserId())) {
++tempResult[i];
}
}
}
this.result.add(new Integer(tempResult[i]));
}
return result;
}
}
In the above class there are two data members data and bloomFilterInstance and they(the references) are passed from the main program. So actually there is only one instance of data and bloomFilterInstance and all the threads are accessing it simultaneously.
The class that launches the thread is(few irrelevant details have been left out, so all variables etc. you can assume them to be declared):
class MultithreadedVrsion {
public static void main(String[] args) {
if(args.length > 1) {
ExecutorService es = Executors.newFixedThreadPool(noOfTables);
List<Callable<ArrayList<Integer>>> threadedBloom = new ArrayList<Callable<ArrayList<Integer>>>(noOfTables);
for (int i=0; i<noOfTables; ++i) {
threadedBloom.add(new ThreadImpl(eval.bloomFilter.get(i),
eval.data, i));
}
try {
List<Future<ArrayList<Integer>>> answers = es.invokeAll(threadedBloom);
long endTime = System.currentTimeMillis();
System.out.println("using more than one thread for bloom filters: " + (endTime - startTime) + " milliseconds");
System.out.println("**Printing the results**");
for(Future<ArrayList<Integer>> element: answers) {
ArrayList<Integer> arrInt = element.get();
for(Integer i: arrInt) {
System.out.print(i.intValue());
System.out.print("\t");
}
System.out.println("");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
I did the profiling with JProfiler, and here (http://tinypic.com/r/wh1v8p/6) is a snapshot of the CPU threads, where red shows blocked, green runnable, and yellow waiting. My problem is that the threads are running one at a time, and I do not know why.
Note:I know that this is not thread safe but I know that I will only be doing read operations throughout now and just want to analyse raw performance gain that can be achieved, later I will implement a better version.
Can anyone please tell me what I have missed?
One possibility is that the cost of creating threads is swamping any possible performance gains from doing the computations in parallel. We can't really tell if this is a real possibility because you haven't included the relevant code in the question.
Another possibility is that you only have one processor / core available. Threads only run when there is a processor to run them. So your expectation of a linear speedup with the number of threads can only possibly be achieved (in theory) if there is a free processor for each thread.
Finally, there could be memory contention due to the threads all attempting to access a shared array. If you had proper synchronization, that would potentially add further contention. (Note: I haven't tried to understand the algorithm to figure out if contention is likely in your example.)
My initial advice would be to profile your code, and see if that offers any insights.
And take a look at the way you are measuring performance to make sure that you aren't just seeing some benchmarking artefact; e.g. JVM warmup effects.
That process looks CPU bound. (no I/O, database calls, network calls, etc.) I can think of two explanations:
How many CPUs does your machine have? How many is Java allowed to use? - if the threads are competing for the same CPU, you've added coordination work and placed more demand on the same resource.
How long does the whole method take to run? For very short times, the additional work in context switching threads could overpower the actual work. The way to deal with this is to make a longer job. Also, run it a lot of times in a loop not counting the first few iterations (like a warm up, they aren't representative.)
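Both checks above can be sketched in a few lines. This is a rough illustration, not a rigorous benchmark: `WarmupTimer` and its parameters are made up for the example, and the run counts are arbitrary. A dedicated harness such as JMH handles warm-up and measurement far more carefully.

```java
import java.util.Arrays;

public class WarmupTimer {

    /**
     * Runs the work warmupRuns times untimed, then measuredRuns times timed,
     * and returns the average time of the measured runs in nanoseconds.
     */
    static long averageNanos(Runnable work, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            work.run(); // not timed: lets class loading and JIT compilation happen first
        }
        long total = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            work.run();
            total += System.nanoTime() - start;
        }
        return total / measuredRuns;
    }

    public static void main(String[] args) {
        // How many processors the JVM believes it may use.
        System.out.println("available processors: "
                + Runtime.getRuntime().availableProcessors());

        double[] data = new double[1_000_000];
        long avg = averageNanos(() -> Arrays.fill(data, 3.0), 20, 100);
        System.out.println("average ns per run: " + avg);
    }
}
```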
Several possibilities come to mind:
There is some synchronization going on inside bloomFilterInstance's implementation (which is not given).
There is a lot of memory allocation going on, e.g., what appears to be an unnecessary copy of an ArrayList when chkDataSet is created, use of new Integer instead of Integer.valueOf. You may be running into overhead costs for memory allocation.
You may be CPU-bound (if bloomFilterInstance#contains is expensive) and threads are simply blocking for CPU instead of executing.
A profiler may help reveal the actual problem.
