How to avoid context switching in Java ExecutorService - java

I use a software (AnyLogic) to export runnable jar files that themselves repeated re-run a set of simulations with different parameters (so-called parameter variation experiments). The simulations I'm running have very RAM intensive, so I have to limit the number of cores available to the jar file. In AnyLogic, the number of available cores is easily set, but from the Linux command line on the servers, the only way I know how to do this is by using the taskset command to just manually specify the available cores to use (using a CPU affinity "mask"). This has worked very well so far, but since you have to specify individual cores to use, I'm learning that there can be pretty substantial differences in performance depending on which cores you select. For example, you would want to maximize the use of CPU cache levels, so if you choose cores that share too much cache, you'll get much slower performance.
Since AnyLogic is written in Java, I can use Java code to specify the running of simulations. I'm looking at using the Java ExecutorService to build a pool of individual runs such that I can just specify the size of the pool to be whatever number of cores would match the RAM of the machine I'm using. I'm thinking that this would offer a number of benefits, most importantly perhaps the computer's scehduler can do a better job of selecting the cores to minimize runtime.
In my tests, I built a small AnyLogic model that take about 10 seconds to run (it just switches between 2 statechart states repeatedly). Then I created a custom experiment with this simple code.
ExecutorService service = Executors.newFixedThreadPool(2);
for (int i=0; i<10; i++)
{
Simulation experiment = new Simulation();
experiment.variable = i;
service.execute( () -> experiment.run() );
}
What I would hope to see is that only 2 Simulation objects start up at a time, since that's the size of the thread pool. But I see all 10 start up and running in parallel over the 2 threads. This makes me think that context switching is happening, which I assume is pretty inefficient.
When, instead of calling the AnyLogic Simulation, I just call a custom Java class (below) in the service.execute function, it seems to work fine, showing only 2 Tasks running at a time.
public class Task implements Runnable, Serializable {
public void run() {
traceln("Starting task on thread " + Thread.currentThread().getName());
try {
TimeUnit.SECONDS.sleep(5);
} catch (InterruptedException e) {
e.printStackTrace();
}
traceln("Ending task on thread " + Thread.currentThread().getName());
}
}
Does anyone know why the AnyLogic function seems to be setting up all the simulations at once?

I'm guessing Simulation extends from ExperimentParamVariation. The key to achieve what you want would be to determine when the experiment has ended.
The documentation shows some interesting methods like getProgress() and getState(), but you would have to poll those methods until the progress is 1 or the state is FINISHED or ERROR. There are also the methods onAfterExperiment() and onError() that should be called by the engine to indicate that the experiment has ended or there was an error. I think you could use these last two methods with a Semaphore to control how many experiments run at once:
import java.util.concurrent.Semaphore;
import com.anylogic.engine.ExperimentParamVariation;
public class Simulation extends ExperimentParamVariation</* Agent */> {
private final Semaphore semaphore;
public Simulation(Semaphore semaphore) {
this.semaphore = semaphore;
}
public void onAfterExperiment() {
this.semaphore.release();
super.onAfterExperiment();
}
public void onError(Throwable error) {
this.semaphore.release();
super.onError(error);
}
// run() cannot be overriden because it is final
// You could create another run method or acquire a permit from the semaphore elsewhere
public void runWithSemaphore() throws InterruptedException {
// This acquire() will block until a permit is available or the thread is interrupted
this.semaphore.acquire();
this.run();
}
}
Then you will have to configure a semaphore with the desired number of permits an pass it to the Simulation instances:
import java.util.concurrent.Semaphore;
// ...
Semaphore semaphore = new Semaphore(2);
for (int i = 0; i < 10; i++)
{
Simulation experiment = new Simulation(semaphore);
// ...
// Handle the InterruptedException thrown here
experiment.runWithSemaphore();
/* Alternative to runWithSemaphore(): acquire the permit and call run().
semaphore.acquire();
experiment.run();
*/
}

Firstly, this whole question has been nullified by what I think is a relatively new addition to AnyLogic's functionality. You can specify an ini file with a specified number of "parallel workers".
https://help.anylogic.com/index.jsp?topic=%2Fcom.anylogic.help%2Fhtml%2Frunning%2Fexport-java-application.html&cp=0_3_9&anchor=customize-settings
But I had managed to find a workable solution just before finding this (better) option. Hernan's answer was almost enough. I think it was hampered by some vagaries of AnyLogic's engine (as I detailed in a comment).
The best version I could muster myself was using ExecuterService. In a Custom Experiment, I put this code:
ExecutorService service = Executors.newFixedThreadPool(2);
List<Callable<Integer>> tasks = new ArrayList<>();
for (int i=0; i<10; i++)
{
int t = i;
tasks.add( () -> simulate(t) );
}
try{
traceln("starting setting up service");
List<Future<Integer>> futureResults = service.invokeAll(tasks);
traceln("finished setting up service");
List<Integer> res = futureResults.stream().parallel().map(
f -> {
try {
return f.get();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
return null;
}).collect(Collectors.toList());
System.out.println("----- Future Results are ready -------");
System.out.println("----- Finished -------");
} catch (InterruptedException e) {
e.printStackTrace();
}
service.shutdown();
The key here was using the Java Future. Also, to use the invokeAll function, I created a function in the Additional class code block:
public int simulate(int variable){
// Create Engine, initialize random number generator:
Engine engine = createEngine();
// Set stop time
engine.setStopTime( 100000 );
// Create new root object:
Main root = new Main( engine, null, null );
root.parameter = variable;
// Prepare Engine for simulation:
engine.start( root );
// Start simulation in fast mode:
//traceln("attempting to acquire 1 permit on run "+variable);
//s.acquireUninterruptibly(1);
traceln("starting run "+variable);
engine.runFast();
traceln("ending run "+variable);
//s.release();
// Destroy the model:
engine.stop();
traceln( "Finished, run "+variable);
return 1;
}
The only limitation I could see to this approach is that I don't have a waiting-while loop to output progress every few minutes. But instead of finding a solution to that, I must abandon this work for the much better settings file solution in the link up top.

Related

How to parallelize loops with Java 8's Fork/Join framework

How to parallelize loops with Java 8's Fork/Join framework. Accually I did not work with multiple threading . I read lots of question in SO .Now i am unable to implement the parallel processing of list in Java 8. Any one can help me ?
I have tried somthing like from this link.
routes.stream().parallel().forEach(this::doSomething);
Scenario like list based on routes list I need to devide the task and execute I need like insted of foreach loop I want parallel execution of based on array size.
My problem is when processing the updateSchedules service it is taking too much time. That is the reason I want to implement the threading concept here. scheduleService.updateSchedules(originId, destinationId,req.getJourneyDate());
for (Availabilities ar : routes) {
try {
log.info("Starting for bus" + ar);
Bus bus = new Bus();
// Get schedule list
BitlaSchedules schedule = scheduleRepo
.findByOriginIdAndDestinationIdAndScheduleIdAndTravelIdAndRouteId(originId,
destinationId, ar.getScheduleId(), ar.getTravelId(), ar.getRouteId());
if (schedule == null) {
scheduleService.updateSchedules(originId, destinationId,req.getJourneyDate());
schedule = scheduleRepo
.findByOriginIdAndDestinationIdAndScheduleIdAndTravelIdAndRouteId(originId,
destinationId, ar.getScheduleId(), ar.getTravelId(), ar.getRouteId());
}
} catch(Exception e) {
log.error(e.getMessage());
}
}
Probably the basic error is that you are trying to do it.
Don't! The fork/join framework is a very specific piece of engineering - which solves a very specific area:
- solving CPU intensive problems;
- that can be split without sharing resources (i.e. no synchronization or locking between ).
Your code seems to use an external service:
- if the service uses database of any kind, then your problem is not CPU intensive;
- even if not, then - since there is an obvious update, then there is a shared, mutable state that requires synchronization (especially since we seem to have multiple writers).
This means that you gain nothing by using the parallel stream.
Just use a standard executor with a thread pool and submit your items as tasks.
As #fdreger already said, it will only help you with CPU intensive tasks.
So before making any assumptions WHY something should be run parallel to gain performance, do yourself a favor and profile. Most of the time the bottleneck is IO related.
I will give you a very simple example how you could use parallel streams in java.
public class Test {
public static void main(String[] args) {
// some dummy data
List<Integer> list = new ArrayList<>();
for (int i = 0; i < 20; ++i) list.add(i);
// to simulate some CPU intensive work
Random random = new SecureRandom();
List<String> result = list.parallelStream().map(i -> {
// simulate work load
int millis = 0;
try {
millis = random.nextInt(1000);
Thread.sleep(millis);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
// return any desired result
return "Done something with " + i + " in thread " + Thread.currentThread().getName() + " took " + millis + "ms";
}).collect(Collectors.toList()); // collect joins - will return once all the workers are done
// print the result
result.forEach(System.out::println);
}
}

How to determine the max possible fixed thread pool size?

Is there a method or utility to determine how many threads can be created in a program, for example with Executors.newFixedThreadPool(numberThreads)? The following custom rollout works but is obviously not an enterprise grade solution:
public class ThreadCounter {
public static void main(String[] args) {
System.out.println("max number threads = " + getMaxNumberThreads());
}
static int getMaxNumberThreads() {
final int[] maxNumberThreads = {0};
try {
while (true) {
new Thread(() -> {
try {
maxNumberThreads[0]++;
Thread.sleep(Integer.MAX_VALUE);
} catch (InterruptedException e) {
}
}).start();
}
} catch (Throwable t) {
}
return maxNumberThreads[0];
}
}
So as a general rule, creating more threads than the number of processors you have isn't really good because you may find bottlenecks between context switching. You can find the number of threads using the available availableProcessors() method like so:
numThreads = Runtime.getRuntime().availableProcessors();
executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numThreads);
This provides good general scalability as all available processors will be used in your thread pool.
Now sometimes, due to a lot of I/O blocking, or other factors, you may find that it may make sense to increase the number of threads beyond what you have available. In which case you can just multiply the result of numThreads for example to double the thread pool:
executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numThreads * 2);
I would only recommend this once some benchmarking has been done to see if it's worth it though.
So it's not a max theoretical limit as such (which will be determined by the underlying operating system), but it probably provides you with the realistic limit of being able to take advantage of your computer's hardware.
Hope this helps!

Access Queue from two separate Threads in parallel

So my goal is to measure the performance of a Streaming Engine. It's basically a library to which i can send data-packages. The idea to measure this is to generate data, put it into a Queue and let the Streaming Engine grab the data and process it.
I thought of implementing it like this: The Data Generator runs in a thread and generates data packages in an endless loop with a certain Thread.sleep(X) at the end. When doing the tests the idea is to minimize tis Thread.sleep(X) to see if this has an impact on the Streaming Engine's performance. The Data Generator writes the created packages into a queue, that is, a ConcurrentLinkedQueue, which at the same time is a Singleton.
In another thread I instantiate the Streaming Engine which continuously removes the packages from the queue by doing queue.remove(). This is done in an endlees loop without any sleeping, because it should just be done as fast as possible.
In a first try to implement this I ran into a problem. It seems as if the Data Generator is not able to put the packages into the Queue as it should be. It is doing that too slow. My suspicion is that the endless loop of the Streaming Engine thread is eating up all the resources and therefore slows down everything else.
I would be happy about how to approach this issue or other design patterns, which could solve this issue elegantly.
the requirements are: 2 threads which run in parallel basically. one is putting data into a queue. the other one is reading/removing from the queue. and i want to measure the size of the queue regularly in order to know if the engine which is reading/removing from the queue is fast enough to process the generated packages.
You can use a BlockingQueue, for example ArrayBlockingQueue, you can initialize these to a certain size, so the number of items queued will never exceed a certain number, as per this example:
// create queue, max size 100
final ArrayBlockingQueue<String> strings = new ArrayBlockingQueue<>(100);
final String stop = "STOP";
// start producing
Runnable producer = new Runnable() {
#Override
public void run() {
try {
for(int i = 0; i < 1000; i++) {
strings.put(Integer.toHexString(i));
}
strings.put(stop);
} catch(InterruptedException ignore) {
}
}
};
Thread producerThread = new Thread(producer);
producerThread.start();
// start monitoring
Runnable monitor = new Runnable() {
#Override
public void run() {
try {
while (true){
System.out.println("Queue size: " + strings.size());
Thread.sleep(5);
}
} catch(InterruptedException ignore) {
}
}
};
Thread monitorThread = new Thread(monitor);
monitorThread.start();
// start consuming
Runnable consumer = new Runnable() {
#Override
public void run() {
// infinite look, will interrupt thread when complete
try {
while(true) {
String value = strings.take();
if(value.equals(stop)){
return;
}
System.out.println(value);
}
} catch(InterruptedException ignore) {
}
}
};
Thread consumerThread = new Thread(consumer);
consumerThread.start();
// wait for producer to finish
producerThread.join();
consumerThread.join();
// interrupt consumer and monitor
monitorThread.interrupt();
You could also have third thread monitoring the size of the queue, to give you an idea of which thread is outpacing the other.
Also, you can used the timed put method and the timed or untimed offer methods, which will give you more control of what to do if the queue if full or empty. In the above example execution will stop until there is space for the next element or if there are no further elements in the queue.

Using Multiple Threads in Java To Shorten Program Time

I do not have much experience making multi-threaded applications but I feel like my program is at a point where it may benefit from having multiple threads. I am doing a larger scale project that involves using a classifier (as in machine learning) to classify roughly 32000 customers. I have debugged the program and discovered that it takes about a second to classify each user. So in other words this would take 8.8 hours to complete!
Is there any way that I can run 4 threads handling 8000 users each? The first thread would handle 1-8000, the second 8001-16000, the third 16001-23000, the fourth 23001-32000. Also, as of now each classification is done by calling a static function from another class...
Then when the other threads besides the main one should end. Is something like this feasible? If so, I would greatly appreciate it if someone could provide tips or steps on how to do this. I am familiar with the idea of critical sections (wait/signal) but have little experience with it.
Again, any help would be very much appreciated! Tips and suggestions on how to handle a situation like this are welcomed! Not sure it matters but I have a Core 2 Duo PC with a 2.53 GHZ processor speed.
This is too lightweight for Apache Hadoop, which requires around 64MB chunks of data per server... but.. it's a perfect opportunity for Akka Actors, and, it just happens to support Java!
http://doc.akka.io/docs/akka/2.1.4/java/untyped-actors.html
Basically, you can have 4 actors doing the work, and as they finish classifying a user, or probably better, a number of users, they either pass it to a "receiver" actor, that puts the info into a data structure or a file for output, or, you can do concurrent I/O by having each write to a file on their own.. then the files can be examined/combined when they're all done.
If you want to get even more fancy/powerful, you can put the actors on remote servers. It's still really easy to communicate with them, and you'd be leveraging the CPU/resources of multiple servers.
I wrote an article myself on Akka actors, but it's in Scala, so I'll spare you that. But if you google "akka actors", you'll get lots of hand-holding examples on how to use it. Be brave, dive right in and experiment. The "actor system" is such an easy concept to pick up. I know you can do it!
Split the data up into objects that implement Runnable, then pass them to new threads.
Having more than four threads in this case won't kill you, but you cannot get more parallel work than you have cores (as mentioned in the comments) - if there are more threads than cores the system will have to handle who gets to go when.
If I had a class customer, and I want to issue a thread to prioritize 8000 customers of a greater collection I might do something like this:
public class CustomerClassifier implements Runnable {
private customer[] customers;
public CustomerClassifier(customer[] customers) {
this.customers = customers;
}
#Override
public void run() {
for (int i=0; i< customers.length; i++) {
classify(customer);//critical that this classify function does not
//attempt to modify a resource outside this class
//unless it handles locking, or is talking to a database
//or something that won't throw fits about resource locking
}
}
}
then to issue these threads elsewhere
int jobSize = 8000;
customer[] customers = new customer[jobSize]();
int j = 0;
for (int i =0; i+j< fullCustomerArray.length; i++) {
if (i == jobSize-1) {
new Thread(new CustomerClassifier(customers)).start();//run will be invoked by thread
customers = new Customer[jobSize]();
j += i;
i = 0;
}
customers[i] = fullCustomerArray[i+j];
}
If you have your classify method affect the same resource somewhere you will have to
implement locking and will also kill off your advantage gained to some degree.
Concurrency is extremely complicated and requires a lot of thought, I also recommend looking at the oracle docs http://docs.oracle.com/javase/tutorial/essential/concurrency/index.html
(I know links are bad, but hopefully the oracle docs don't move around too much?)
Disclaimer: I am no expert in concurrent design or in multithreading (different topics).
If you split the input array in 4 equal subarrays for 4 threads, there is no guarantee that all threads finish simultaneously. You better put all data in a single queue and let all working threads feed from that common queue. Use thead-safe BlockingQueue implementations in order to not write low level synchronize/wait/notify code.
From java 6 we have some handy utils for concurrency. You might want to consider using thread pools for cleaner implementation.
package com.threads;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class ParalleliseArrayConsumption {
private int[] itemsToBeProcessed ;
public ParalleliseArrayConsumption(int size){
itemsToBeProcessed = new int[size];
}
/**
* #param args
*/
public static void main(String[] args) {
(new ParalleliseArrayConsumption(32)).processUsers(4);
}
public void processUsers(int numOfWorkerThreads){
ExecutorService threadPool = Executors.newFixedThreadPool(numOfWorkerThreads);
int chunk = itemsToBeProcessed.length/numOfWorkerThreads;
int start = 0;
List<Future> tasks = new ArrayList<Future>();
for(int i=0;i<numOfWorkerThreads;i++){
tasks.add(threadPool.submit(new WorkerThread(start, start+chunk)));
start = start+chunk;
}
// join all worker threads to main thread
for(Future f:tasks){
try {
f.get();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (ExecutionException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
threadPool.shutdown();
while(!threadPool.isTerminated()){
}
}
private class WorkerThread implements Callable{
private int startIndex;
private int endIndex;
public WorkerThread(int startIndex, int endIndex){
this.startIndex = startIndex;
this.endIndex = endIndex;
}
#Override
public Object call() throws Exception {
for(int currentUserIndex = startIndex;currentUserIndex<endIndex;currentUserIndex++){
// process the user. Add your logic here
System.out.println(currentUserIndex+" is the user being processed in thread " +Thread.currentThread().getName());
}
return null;
}
}
}

reduce in performance when used multithreading in java

I am new to multi-threading and I have to write a program using multiple threads to increase its efficiency. At my first attempt what I wrote produced just opposite results. Here is what I have written:
class ThreadImpl implements Callable<ArrayList<Integer>> {
//Bloom filter instance for one of the table
BloomFilter<Integer> bloomFilterInstance = null;
// Data member for complete data access.
ArrayList< ArrayList<UserBean> > data = null;
// Store the result of the testing
ArrayList<Integer> result = null;
int tableNo;
public ThreadImpl(BloomFilter<Integer> bloomFilterInstance,
ArrayList< ArrayList<UserBean> > data, int tableNo) {
this.bloomFilterInstance = bloomFilterInstance;
this.data = data;
result = new ArrayList<Integer>(this.data.size());
this.tableNo = tableNo;
}
public ArrayList<Integer> call() {
int[] tempResult = new int[this.data.size()];
for(int i=0; i<data.size() ;++i) {
tempResult[i] = 0;
}
ArrayList<UserBean> chkDataSet = null;
for(int i=0; i<this.data.size(); ++i) {
if(i==tableNo) {
//do nothing;
} else {
chkDataSet = new ArrayList<UserBean> (data.get(i));
for(UserBean toChk: chkDataSet) {
if(bloomFilterInstance.contains(toChk.getUserId())) {
++tempResult[i];
}
}
}
this.result.add(new Integer(tempResult[i]));
}
return result;
}
}
In the above class there are two data members data and bloomFilterInstance and they(the references) are passed from the main program. So actually there is only one instance of data and bloomFilterInstance and all the threads are accessing it simultaneously.
The class that launches the thread is(few irrelevant details have been left out, so all variables etc. you can assume them to be declared):
class MultithreadedVrsion {
public static void main(String[] args) {
if(args.length > 1) {
ExecutorService es = Executors.newFixedThreadPool(noOfTables);
List<Callable<ArrayList<Integer>>> threadedBloom = new ArrayList<Callable<ArrayList<Integer>>>(noOfTables);
for (int i=0; i<noOfTables; ++i) {
threadedBloom.add(new ThreadImpl(eval.bloomFilter.get(i),
eval.data, i));
}
try {
List<Future<ArrayList<Integer>>> answers = es.invokeAll(threadedBloom);
long endTime = System.currentTimeMillis();
System.out.println("using more than one thread for bloom filters: " + (endTime - startTime) + " milliseconds");
System.out.println("**Printing the results**");
for(Future<ArrayList<Integer>> element: answers) {
ArrayList<Integer> arrInt = element.get();
for(Integer i: arrInt) {
System.out.print(i.intValue());
System.out.print("\t");
}
System.out.println("");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
I did the profiling with jprofiler and
![here]:(http://tinypic.com/r/wh1v8p/6)
is a snapshot of cpu threads where red color shows blocked, green runnable and yellow is waiting. I problem is that threads are running one at a time I do not know why?
Note:I know that this is not thread safe but I know that I will only be doing read operations throughout now and just want to analyse raw performance gain that can be achieved, later I will implement a better version.
Can anyone please tell where I have missed
One possibility is that the cost of creating threads is swamping any possible performance gains from doing the computations in parallel. We can't really tell if this is a real possibility because you haven't included the relevant code in the question.
Another possibility is that you only have one processor / core available. Threads only run when there is a processor to run them. So your expectation of a linear speed with the number of threads and only possibly achieved (in theory) if is a free processor for each thread.
Finally, there could be memory contention due to the threads all attempting to access a shared array. If you had proper synchronization, that would potentially add further contention. (Note: I haven't tried to understand the algorithm to figure out if contention is likely in your example.)
My initial advice would be to profile your code, and see if that offers any insights.
And take a look at the way you are measuring performance to make sure that you aren't just seeing some benchmarking artefact; e.g. JVM warmup effects.
That process looks CPU bound. (no I/O, database calls, network calls, etc.) I can think of two explanations:
How many CPUs does your machine have? How many is Java allowed to use? - if the threads are competing for the same CPU, you've added coordination work and placed more demand on the same resource.
How long does the whole method take to run? For very short times, the additional work in context switching threads could overpower the actual work. The way to deal with this is to make a longer job. Also, run it a lot of times in a loop not counting the first few iterations (like a warm up, they aren't representative.)
Several possibilities come to mind:
There is some synchronization going on inside bloomFilterInstance's implementation (which is not given).
There is a lot of memory allocation going on, e.g., what appears to be an unnecessary copy of an ArrayList when chkDataSet is created, use of new Integer instead of Integer.valueOf. You may be running into overhead costs for memory allocation.
You may be CPU-bound (if bloomFilterInstance#contains is expensive) and threads are simply blocking for CPU instead of executing.
A profiler may help reveal the actual problem.

Categories