So I think I sort of understand how fixed thread pools work (using Executors.newFixedThreadPool built into Java), but from what I can see, there's usually a set number of jobs you want done, and you know how many there are when you start the program. For example:
int numWorkers = Integer.parseInt(args[0]);
int threadPoolSize = Integer.parseInt(args[1]);
ExecutorService tpes =
Executors.newFixedThreadPool(threadPoolSize);
WorkerThread[] workers = new WorkerThread[numWorkers];
for (int i = 0; i < numWorkers; i++) {
workers[i] = new WorkerThread(i);
tpes.execute(workers[i]);
}
Where each WorkerThread does something really simple; that part is arbitrary. What I want to know is: what if you have a fixed pool size (say 8 max) but you don't know how many workers you'll need to finish the task until runtime?
The specific example is: I have a pool size of 8 and I'm reading from standard input. As I read, I split the input into blocks of a set size. Each of these blocks is given to a thread (along with some other information) so that it can compress it. As such, I don't know how many tasks I'll need to create, since I have to keep going until I reach the end of the input. I also have to somehow ensure that the data stays in the same order: if thread 2 finishes before thread 1 and just submits its work, my data will be out of order!
Would a thread pool be the wrong approach in this situation then? It seems like it'd be great (since I can't use more than 8 threads at a time).
Basically, I want to do something like this:
ExecutorService tpes = Executors.newFixedThreadPool(threadPoolSize);
BufferedInputStream inBytes = new BufferedInputStream(System.in);
byte[] buff = new byte[BLOCK_SIZE];
byte[] dict = new byte[DICT_SIZE];
WorkerThread worker;
int bytesRead = 0;
while((bytesRead = inBytes.read(buff)) != -1) {
System.arraycopy(buff, BLOCK_SIZE-DICT_SIZE, dict, 0, DICT_SIZE);
worker = new WorkerThread(buff, dict);
tpes.execute(worker);
}
This is not working code, I know, but I'm just trying to illustrate what I want.
I left out a bit, but see how buff and dict have changing values, and that I don't know how long the input is. I don't think I can actually do this, though, because, well, worker already exists after the first call! I can't just say worker = new WorkerThread(...) a bunch of times, since isn't it already pointing at an existing thread (true, a thread that might be dead)? And obviously, in this implementation, even if it did work I wouldn't be running in parallel. But my point is: I want to keep creating threads until I hit the max pool size, wait until a thread is done, then keep creating threads until I hit the end of the input.
I also need to keep stuff in order, which is the part that's really annoying.
Your solution is completely fine (the only point is that parallelism is perhaps not necessary if the workload of your WorkerThreads is very small).
With a thread pool, the number of submitted tasks is not relevant. There may be fewer or more tasks than there are threads in the pool; the thread pool takes care of that.
However, and this is important: you rely on some kind of order of the results of your WorkerThreads, but when using parallelism, this order is not guaranteed! It doesn't matter whether you use a thread pool, or how many worker threads you have, etc.; it will always be possible for your results to finish in an arbitrary order!
To keep the order right, give each WorkerThread the number of the current item in its constructor, and let them put their results in the right order after they are finished:
int noOfWorkItem = 0;
while((bytesRead = inBytes.read(buff)) != -1) {
System.arraycopy(buff, BLOCK_SIZE-DICT_SIZE, dict, 0, DICT_SIZE);
worker = new WorkerThread(buff, dict, noOfWorkItem++);
tpes.execute(worker);
}
As #ignis points out, parallel execution may not be the best answer for your situation.
However, to answer the more general question, there are several other Executor implementations to consider beyond FixedThreadPool, some of which may have the characteristics that you desire.
As far as keeping things in order, typically you would submit tasks to the executor, and for each submission, you get a Future (which is an object that promises to give you a result later, when the task finishes). So, you can keep track of the Futures in the order that you submitted tasks, and then when all tasks are done, invoke get() on each Future in order, to get the results.
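For the compression example above, that could look roughly like this (a sketch; CompressTask is a hypothetical Callable<byte[]> that returns its compressed block, and out is a placeholder output stream):
List<Future<byte[]>> futures = new ArrayList<>();
while ((bytesRead = inBytes.read(buff)) != -1) {
    System.arraycopy(buff, BLOCK_SIZE - DICT_SIZE, dict, 0, DICT_SIZE);
    // submit() returns a Future; pass copies because buff and dict are reused
    futures.add(tpes.submit(new CompressTask(buff.clone(), dict.clone())));
}
tpes.shutdown();
for (Future<byte[]> f : futures) {
    out.write(f.get()); // get() blocks until that particular block is finished
}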
I have part of a system that processes a BlockingQueue of input items within a worker thread and puts the results on a BlockingQueue of output items, where the relevant code (simplified) looks something like this:
while (running()) {
InputObject a=inputQueue.take(); // Get from input BlockingQueue
OutputObject b=doProcessing(a); // Process the item
outputQueue.put(b); // Place on output BlockingQueue
}
doProcessing is the main performance bottleneck in this code, but the processing of queue items could be parallelised since the processing steps are all independent of each other.
I would therefore like to improve this so that items can be processed concurrently by multiple threads, with the constraint that this must not change the order of outputs (e.g. I can't simply have 10 threads running the loop above, because that might result in outputs being ordered differently depending on processing times).
What is the best way to achieve this in pure, idiomatic Java?
Parallel streams from List preserve ordering:
List<InputObject> input = ...
List<OutputObject> output = input.parallelStream()
    .filter(this::running)
    .map(this::doProcessing)
    .collect(Collectors.toList());
PriorityBlockingQueue can be used if your work items can be compared to one another, and you will wait until running() is false before reading from the output queue:
outputQueue = new PriorityBlockingQueue<>();
Or you could order them after they have all been processed (if they can be compared to one another):
outputQueue.drainTo(outputList);
outputList.sort(null);
A simple way to implement the comparison would be to assign a sequential ID to each element put into the input queue.
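For instance, a small wrapper like the following (SequencedItem is just a name made up for this sketch) would make the output items orderable by that ID:
class SequencedItem implements Comparable<SequencedItem> {
    final long seq;              // assigned when the item enters the input queue
    final OutputObject value;

    SequencedItem(long seq, OutputObject value) {
        this.seq = seq;
        this.value = value;
    }

    @Override
    public int compareTo(SequencedItem other) {
        return Long.compare(this.seq, other.seq); // order by submission sequence
    }
}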
Create X event-loop threads, where X is the number of steps that can be processed in parallel.
The steps run in parallel, but never on the same item: while one thread carries out one step on an item, another thread carries out the previous step on the next item, and so on.
To further optimize it, you can use concurrent queues provided by JCTools, which are optimized for Single-Producer Single-Consumer scenarios (JDK's BlockingQueue implementations support Multiple-Producer Multiple-Consumer).
// Step1Result and Step2Result stand in for whatever intermediate types
// your pipeline stages produce; pick names that fit your domain.

// Thread 1
while (running()) {
    InputObject a = inputQueue.take();
    Step1Result b = doProcessingStep1(a);
    queue1.put(b);
}
// Thread 2
while (running()) {
    Step1Result a = queue1.take();
    Step2Result b = doProcessingStep2(a);
    queue2.put(b);
}
// Thread 3
while (running()) {
    Step2Result a = queue2.take();
    OutputObject b = doProcessingStep3(a);
    outputQueue.put(b);
}
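If you go the JCTools route, each link in the pipeline could use an SpscArrayQueue (from org.jctools.queues) instead of a BlockingQueue; note that these queues are non-blocking, so the threads have to poll. A rough sketch for the first hand-off, keeping the Step1Result placeholder from above:
SpscArrayQueue<Step1Result> queue1 = new SpscArrayQueue<>(1024);

// producer side (Thread 1)
while (!queue1.offer(b)) {
    Thread.yield(); // queue full: back off and retry
}

// consumer side (Thread 2)
Step1Result a;
while ((a = queue1.poll()) == null) {
    Thread.yield(); // queue empty: back off and retry
}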
I'm using Groovy's ASTBuilder (version 2.5.5) in a project. It's being used to parse and analyze groovy expressions received via a REST API. This REST service receives thousands of requests, and the analysis is done on the fly.
I'm noticing some serious performance issues in a multithreaded environment. Below is a simulation, running 100 threads in parallel:
int numthreads = 100;
final Callable<Void> task = () -> {
long initial = System.currentTimeMillis();
// Simple rule
new AstBuilder().buildFromString("a+b");
System.out.print(String.format("\n\nThread took %s ms.",
        System.currentTimeMillis() - initial));
return null;
};
final ExecutorService executorService = Executors.newFixedThreadPool(numthreads);
final List<Callable<Void>> tasks = new ArrayList<>();
while (numthreads-- > 0) {
tasks.add(task);
}
for (Future<Void> future : executorService.invokeAll(tasks)) {
future.get();
}
I'm testing with different thread loads. The greater the number, the slower:
100 threads => ~1800ms
200 threads => ~2500ms
300 threads => ~4000ms
However, if I serialize the threads (like setting the pool size to 1), I get much better results, around 10 ms per thread. Can someone please help me understand why this is happening?
When running multithreaded code, the machine shares its physical CPU cores among the threads. That means the more the number of threads exceeds the number of cores, the less benefit you get from each additional thread. In your example the number of threads grows with the number of tasks, so as the task count goes up, every CPU core is forced to juggle more and more threads. At the same time you may notice that the difference between numthreads = 1 and numthreads = 4 is very small, because in that case every core processes only a few threads (or even just one). Don't set the number of threads much higher than the number of hardware threads your CPU provides, because it doesn't make a lot of sense.
Additionally, in your example you're trying to compare how different numbers of threads perform with different numbers of tasks. But to see the efficiency of multithreaded code, you have to compare how different numbers of threads perform with the same number of tasks. I would change the example as follows:
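A common rule of thumb for CPU-bound work is to size the pool based on the number of hardware threads the JVM reports, e.g.:
// size the pool to the hardware parallelism for CPU-bound tasks
int poolSize = Runtime.getRuntime().availableProcessors();
ExecutorService executorService = Executors.newFixedThreadPool(poolSize);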
int threadNumber = 16;
int taskNumber = 200;
//...task method
final ExecutorService executorService = Executors.newFixedThreadPool(threadNumber);
final List<Callable<Void>> tasks = new ArrayList<>();
while (taskNumber-- > 0) {
tasks.add(task);
}
long start = System.currentTimeMillis();
for (Future<Void> future : executorService.invokeAll(tasks)) {
future.get();
}
long end = System.currentTimeMillis() - start;
System.out.println(end);
executorService.shutdown();
Try this code for threadNumber = 1 and, let's say, threadNumber = 16, and you'll see the difference.
Dynamic evaluation of expressions involves a lot of resources including class loading, security manager, compilation and execution. It is not designed for high performance. If you just need to evaluate an expression for its value, you could try groovy.util.Eval. It may not consume as many resources as AstBuilder. However, it is probably not going to be that much different, so don't expect too much.
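For example, from Java something like this should work (a sketch; Eval.xy binds the two values to the variables x and y inside the expression):
import groovy.util.Eval;

// evaluates the expression with x = 2 and y = 3, returning 5
Object result = Eval.xy(2, 3, "x + y");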
If you want to get the AST only and not any extra information like types, you could call the parser more directly. This would involve a lot fewer resources. See org.codehaus.groovy.control.ParserPluginFactory for more direct access to the source parser.
I am attempting to parallelise a for-loop using Java streams & ForkJoinPool in order to control the number of threads used. When run with a single thread, the parallelised code returns the same result as the sequential program. The sequential code is a set of standard for-loops:
for(String file : fileList){
for(String item : xList){
for(String x : aList) {
// action code
}
}
}
And the following is my parallel implementation:
ForkJoinPool threadPool = new ForkJoinPool(NUM_THREADS);
int chunkSize = aList.size()/NUM_THREADS;
for(String file : fileList){
for(String item : xList){
IntStream.range(0, NUM_THREADS)
.parallel().forEach(i -> threadPool.submit(() -> {
aList.subList(i*chunkSize, Math.min(i*chunkSize + chunkSize -1, aList.size()-1))
.forEach(x -> {
// action code
});
}));
threadPool.shutdown();
threadPool.awaitTermination(5, TimeUnit.MINUTES);
}
}
When using more than 1 thread, only a limited number of iterations are completed. I have attempted to use .shutdown() and .awaitTermination() to ensure completion of all threads, however this doesn't seem to work. The number of iterations that complete differs dramatically from run to run (between 0 and 1500).
Note: I'm using a Macbook Pro with 8 available cores (4 dual-cores), and my action code does not contain references that make parallelisation unsafe.
Any advice would be much appreciated, thank you!
I think the actual problem you have is caused by calling shutdown on the ForkJoinPool inside the loop. If you look into the javadoc, this results in "an orderly shutdown in which previously submitted tasks are executed, but no new tasks will be accepted" - i.e. only the tasks submitted before the first shutdown() call can actually finish; everything submitted after that is rejected.
By the way, there's no real point in using a ForkJoinPool the way you use it. A ForkJoinPool is intended to split workload recursively, not unlike what you do by creating sublists in the loop - but a ForkJoinPool is supposed to be fed RecursiveActions that split their work themselves, rather than having the work split up beforehand in a loop. That's just a side note though; your code should run fine, but it would be clearer if you just submitted your tasks to a normal ExecutorService, e.g. one you get from Executors.newFixedThreadPool(parallelism), rather than a new ForkJoinPool().
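A rough sketch of that approach, creating the pool once and shutting it down only after all the work has been submitted (invokeAll blocks until the whole batch for one item is done; exception handling omitted):
ExecutorService pool = Executors.newFixedThreadPool(NUM_THREADS);
int chunkSize = Math.max(1, (aList.size() + NUM_THREADS - 1) / NUM_THREADS);

for (String file : fileList) {
    for (String item : xList) {
        List<Callable<Void>> chunks = new ArrayList<>();
        for (int start = 0; start < aList.size(); start += chunkSize) {
            List<String> chunk = aList.subList(start, Math.min(start + chunkSize, aList.size()));
            chunks.add(() -> {
                for (String x : chunk) {
                    // action code
                }
                return null;
            });
        }
        pool.invokeAll(chunks); // waits for every chunk of this item to finish
    }
}
pool.shutdown(); // only once, after all work has been submitted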
I am seeing some strange behavior with the use of an ArrayBlockingQueue, which I use to communicate between certain threads in a Java application.
I am using one static ArrayBlockingQueue, initialised like this:
protected static BlockingQueue<long[]> commandQueue;
Followed by the constructor which has this as one of its lines:
commandQueue = new ArrayBlockingQueue<long[]>(amountOfThreads*4);
Where amountOfThreads is given as a constructor argument.
I then have a producer that creates an array of long[2], gives it some values, and offers it to the queue; directly after that I change one of the values of the array and offer it to the queue once again:
long[] temp = new long[2];
temp[0] = currentThread().getId();
temp[1] = gyrAddress;//Address of an i2c sensor
CommunicationThread.commandQueue.offer(temp);//CommunicationThread is where the commandqueue is located
temp[1] = axlAddress;//Change the address to a different sensor
CommunicationThread.commandQueue.offer(temp);
The consumer will then take this data and open up an i2c connection to a specific sensor, get some data from said sensor and communicate the data back using another queue.
For now however I have set the consumer to just consume the head and print the data.
long[] command = commandQueue.take();//This will hold the program until there is at least 1 command in the queue
if (command.length!=2){
throw new ArrayIndexOutOfBoundsException("The command given is of incorrect format");
}else{
System.out.println("The thread with thread id " + command[0] + " has given the command to get data from address " +Long.toHexString(command[1]));
}
Now for testing I have a producer thread with these addresses (byte) 0x34, (byte)0x44
If things are going correctly my output should be:
The thread with thread id 14 has given the command to get data from address 44
The thread with thread id 14 has given the command to get data from address 34
However I get:
The thread with thread id 14 has given the command to get data from address 34
The thread with thread id 14 has given the command to get data from address 34
Which would mean that it is sending the temp array after it has changed it.
Things that I did to try and fix it:
I tried a sleep: if I add a 150 ms sleep, then the output is correct.
However, this method will quite obviously affect performance...
Since the offer method returns a boolean indicating success, I tried the following piece of code:
boolean tempBool = false;
while(!tempBool){
tempBool = CommunicationThread.commandQueue.offer(temp);
System.out.println(tempBool);
}
Which prints out true. This did not have an effect.
I tried printing temp[1] after this while loop, and at that moment it is the correct value. (It prints 44, yet the consumer receives 34.)
What is most likely the case is a synchronisation issue; however, I thought that the point of a BlockingQueue-based object was to solve this.
Any help or suggestion on the workings of this BlockingQueue would be greatly appreciated. Let me end on the note that this is my first time working with queues between threads in Java, and that the final program will run on a Raspberry Pi, using the pi4j library to communicate with the sensors.
Since you asked about how BlockingQueue works exactly, let's start with that:
A blocking queue is a queue that blocks when you try to dequeue from it while the queue is empty, or when you try to enqueue items to it while the queue is already full. A thread trying to dequeue from an empty queue is blocked until some other thread inserts an item into the queue.
So these blocking queues prevent threads from reading from or writing to a queue while that is not yet possible, because the queue is either empty or full.
As Andy Turner and JB Nizet already explained, the queue does not copy your array; it only stores a reference (a.k.a. a pointer) to it. The consumer thread later follows that reference to read the data, but by then you have already changed the array's contents, so it sees the second address twice. In single-threaded code this never comes up, because everything executes in order and you read the data before you overwrite it. The way around it is to create a new array (which gets its own spot in memory) for every entry you add to the queue, so you never overwrite data before the other thread has processed it. A simple way to do this is:
long[] tempGyr = new long[2];
tempGyr[0] = currentThread().getId();
tempGyr[1] = gyrAddress;
CommunicationThread.commandQueue.offer(tempGyr);//CommunicationThread is where the commandqueue is located
long[] tempAxl = new long[2];
tempAxl[0] = currentThread().getId();
tempAxl[1] = axlAddress;
CommunicationThread.commandQueue.offer(tempAxl);
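Equivalently, you could keep a single local array and offer a defensive copy each time; clone() on a long[] creates a new array with the same contents:
long[] temp = new long[2];
temp[0] = currentThread().getId();
temp[1] = gyrAddress;
CommunicationThread.commandQueue.offer(temp.clone()); // the queue gets its own copy
temp[1] = axlAddress;
CommunicationThread.commandQueue.offer(temp.clone());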
Hope this explains the subject; if not, feel free to ask additional questions :)
I am working on a large scale dataset and after building a model, I use multithreading (whole project in Java) as follows:
OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));
int i=0;
Collection<Track1Callable> callables = new ArrayList<Track1Callable>();
// For each entry in the test file, do whatever needs to be done.
// Track1Callable actually processes that entry and returns a double value.
for (Pair<PreferenceArray, long[]> tests : new DataFileIterable(
KDDCupDataModel.getTestFile(dataFileDirectory))) {
PreferenceArray userTest = tests.getFirst();
callables.add(new Track1Callable(recommender, userTest));
i++;
}
ExecutorService executor = Executors.newFixedThreadPool(cores); //24 cores
List<Future<byte[]>> results = executor.invokeAll(callables);
executor.shutdown();
for (Future<byte[]> result : results) {
for (byte estimate : result.get()) {
out.write(estimate);
}
}
out.flush();
out.close();
When I receive the result from each callable, I output it to a file. Does this output come out in the exact same order as the list of initial Callables was made, in spite of some completing before others? It seems like it should, but I'm not sure.
Also, I expect a total of 6.2 million bytes to be written to the outfile, but I get an additional 2000 bytes (yeah, for free). That messes up my submission, and I think it is because of some concurrency issue. I tested this on a small dataset and it seems to work fine there (264 bytes expected and received).
Am I doing anything wrong with the Executor framework or Futures?
Q: Is the order the same as the one specified for the tasks? Yes.
From the API:
Returns: A list of Futures representing the tasks, in the same sequential order as produced by the iterator for the given task list. If the operation did not time out, each task will have completed. If it did time out, some of these tasks will not have completed.
As for the "extra" bytes: have you tried doing all of this in sequential order (i.e., without using an executor) and checking if you obtain different results? It seems that your problem is outside the code provided (and probably is not due to concurrency).
The order in which the callables are executed doesn't matter for the code you have here. You write the results in the order you store the futures in the list. Even if they were executed in reverse order, the file would appear the same, as your file writing is single-threaded.
I suspect your callables are interacting with each other and you get different results depending on the number of cores you use, e.g. you might be using SimpleDateFormat.
I suggest you run this twice in the same program with a dataset which completes in a short time: first with only one thread in the thread pool, and a second time with 24 threads. You should be able to compare the results from both runs with Arrays.equals(byte[], byte[]) and see that you get exactly the same results.
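Roughly, the comparison could look like this (a sketch; runWith is a hypothetical helper that builds the callables, runs them on a pool of the given size, and returns the concatenated output bytes):
byte[] singleThreaded = runWith(1);  // pool size 1: effectively sequential
byte[] multiThreaded = runWith(24);  // pool size 24: fully parallel

// if the callables are truly independent, both runs must produce identical output
System.out.println(Arrays.equals(singleThreaded, multiThreaded));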