Problems Synchronizing threads in Java

OK, so I did my research; there are plenty of questions here on thread synchronization, but none of them really hit the point. I am currently working in OpenCV: I get a frame from the camera containing vehicles, remove the background and track the vehicles, but before I do this I do some pre-processing and post-processing, like removing noise with blur. All of this runs in a single thread and it works great. But here comes the issue: I now want to read number plates, and for that I need a higher-resolution frame, otherwise I will not detect a single plate per frame. As soon as I increase my frame size I get a performance hit; my thread slows down to the point that my program no longer qualifies as a real-time system.
So I thought of adding more threads, each specializing in one task.
Here is a list of my tasks:
//receives a frame from the camera
1. preprocess
//receives a frame from preprocess and removes the background
2. remove background
//receives a frame from backgroundremover and tracks the vehicles
3. postprocess
If I run the threads one by one I think it will still be slow, so instead I thought of running the threads simultaneously. The issue is that they use the same objects; declaring them volatile will mean threads waiting for the thread holding the lock to finish before they can use the object, which will mean a slow system again. So my question is: how can I run these threads simultaneously without having to wait for the others?
I have looked at a dozen multithreading techniques in Java but am finding it really hard to come up with a way of making this work.
So far I have looked at:
1. Thread synchronization using the keyword volatile
2. Thread synchronization using the keyword synchronized
3. Multiple thread locks using a lock object
4. Using threadpools
5. Using the Countdown Latch
6. Wait and notify
7. Using semaphores (which seemed like a good idea).
Here is the code I want to break down into those threads
public void VideoProcessor()
{
    videProcessorThread = new Thread(new Runnable()
    {
        @Override
        public void run()
        {
            try {
                int i = 0;
                while (isPlaying() && isMainScreenONOFF()) {
                    camera.read(frame);
                    //set default and max frame speed
                    camera.set(Videoio.CAP_PROP_FPS, 25);
                    //get frame speed just in case it did not set
                    fps = camera.get(Videoio.CAP_PROP_FPS);
                    //if(frame.height() > imgHeight || frame.width() > imgWidth)
                    Imgproc.resize(frame, frame, frameSize);

                    //check if to convert or not
                    if (getblackAndWhite())
                        Imgproc.cvtColor(frame, frame, Imgproc.COLOR_RGB2GRAY);

                    imag = frame.clone();

                    if (rOI) {
                        //in case the user adjusted the lines we recalculate their new sizes
                        adjustLinesPositionAndSize(xAxisSlider.getValue(), yAxisSlider.getValue());
                        //then we continue and draw the lines
                        if (!roadIdentified)
                            roadTypeIdentifier(getPointA1(), getPointA2());
                    }

                    viewClass.updateCarCounter(tracker.getCountAB(), tracker.getCountBA());

                    if (i == 0) {
                        // jFrame.setSize(FRAME_WIDTH, FRAME_HEIGHT);
                        diffFrame = new Mat(outbox.size(), CvType.CV_8UC1);
                        diffFrame = outbox.clone();
                    }

                    if (i == 1) {
                        diffFrame = new Mat(frame.size(), CvType.CV_8UC1);
                        removeBackground(frame, diffFrame, mBGSub, thresHold.getValue(), learningRate.getValue());
                        frame = diffFrame.clone();

                        array = detectionContours(diffFrame, maximumBlob.getValue(), minimumBlob.getValue());

                        Vector<VehicleTrack> detections = new Vector<>();
                        Iterator<Rect> it = array.iterator();
                        while (it.hasNext()) {
                            Rect obj = it.next();

                            int ObjectCenterX = (int) ((obj.tl().x + obj.br().x) / 2);
                            int ObjectCenterY = (int) ((obj.tl().y + obj.br().y) / 2);

                            //try counter
                            //add centroid and bounding rectangle
                            Point pt = new Point(ObjectCenterX, ObjectCenterY);
                            VehicleTrack track = new VehicleTrack(frame, pt, obj);
                            detections.add(track);
                        }

                        if (array.size() > 0) {
                            tracker.update(array, detections, imag);
                            Iterator<Rect> it3 = array.iterator();
                            while (it3.hasNext()) {
                                Rect obj = it3.next();

                                int ObjectCenterX = (int) ((obj.tl().x + obj.br().x) / 2);
                                int ObjectCenterY = (int) ((obj.tl().y + obj.br().y) / 2);

                                Point pt = new Point(ObjectCenterX, ObjectCenterY);

                                Imgproc.rectangle(imag, obj.br(), obj.tl(), new Scalar(0, 255, 0), 2);
                                Imgproc.circle(imag, pt, 1, new Scalar(0, 0, 255), 2);

                                //count and eliminate counted
                                tracker.removeCounted(tracker.tracks);
                            }
                        } else if (array.size() == 0) {
                            tracker.updateKalman(imag, detections);
                        }
                    }

                    i = 1;

                    //Convert Image and display to View
                    displayVideo();
                }

                //if an error occurs or the video finishes
                Image image = new Image("/assets/eyeMain.png");
                viewClass.updateMainImageView(image);

            } catch (Exception e) {
                e.printStackTrace();
                System.out.println("Video Stopped Unexpectedly");
            }
            //thread is done
        }
    });
    videProcessorThread.start();
}

As no-one else has replied, I'll have a go.
You've already covered the main technical aspects in your questions (locking, synchronisation etc). Whichever way you look at it, there is no general solution to designing a multi-threaded system. If you have threads accessing the same objects you need to design your synchronisation and you can get threads blocking each other, slowing everything down.
The first thing to do is some performance profiling; there is no point running things in parallel if they are not what is slowing you down.
That said, I think there are three approaches you could take in your case.
1. Have a single thread process each frame, but have a pool of threads processing frames in parallel. If it takes a second to process a frame and you have 25 fps you'd need at least 25 threads to keep up with the frame rate. You'd always be about a second behind real time, but you should be able to keep up with the frame rate.
A typical way to implement this would be to put the incoming frames in a queue. You then have a pool of threads reading the latest frame from the queue and processing it. The downside of this design is that you can't guarantee in which order you would get the results of the processing, so you might need to add some more logic around sorting the results. (A rough sketch of this design follows the list of advantages below.)
The advantages are that:
There is very little contention, just around getting the frames off the queue, and that should be minimal
It is easy to tune and scale by adjusting the number of threads. It could even run on multiple machines, depending on how easy it is to move frames between machines
You avoid the overhead of creating a new thread each time, as each thread processes one frame after another
It is easy to monitor as you can just look at the size of the queue
Error handling can be implemented easily, e.g. use ActiveMQ to re-queue a frame if a thread crashes.
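To make option 1 concrete, here is a minimal sketch of the queue-based design. It assumes OpenCV's Mat for frames; the class name FramePipeline, the methods submitFrame() and processFrame(), the queue capacity of 50 and the pool size of 25 are all illustrative, the 25 simply following the estimate above.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.opencv.core.Mat;

public class FramePipeline {
    // Bounded queue: the capture loop blocks (or could drop frames) when full.
    private final BlockingQueue<Mat> frameQueue = new ArrayBlockingQueue<>(50);
    private final ExecutorService workers = Executors.newFixedThreadPool(25);

    // Called by the capture loop for every frame read from the camera.
    public void submitFrame(Mat frame) throws InterruptedException {
        frameQueue.put(frame.clone());
    }

    // Starts the worker pool; each worker repeatedly takes a frame and processes it.
    public void startWorkers() {
        for (int i = 0; i < 25; i++) {
            workers.submit(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Mat frame = frameQueue.take();  // the only point of contention
                        processFrame(frame);            // pre-process, detect, track...
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    private void processFrame(Mat frame) {
        // placeholder for the per-frame pipeline
    }
}
The workers never block each other except for the brief take() on the queue, which is what keeps contention low in this design.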
2. Run parts of your algorithm in parallel. The way you've written it (pre-process, process, post-process), I don't see this as suitable, since you can't do the post-processing of a frame at the same time as its pre-processing. However, if you can express your algorithm in steps that can be run in parallel then it might work.
3. Try to run specific parts of your code in parallel. Looking at the code you posted, the iterator loops are the obvious choice. Is there any reason not to run them in parallel? If you can, experiment with Java parallel streams to see if that brings any performance gains (a small illustration follows).
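As an illustrative fragment only: it assumes array is a java.util.List<Rect> of detections (as in the posted code) and that org.opencv.core.Point, org.opencv.core.Rect and java.util.stream.Collectors are imported. Only the pure per-rectangle computation is parallelised here; drawing on the shared Mat and updating the tracker would still have to stay sequential or be protected by a lock.
// Compute the centroid of each detection in parallel.
List<Point> centroids = array.parallelStream()
        .map(obj -> new Point((obj.tl().x + obj.br().x) / 2.0,
                              (obj.tl().y + obj.br().y) / 2.0))
        .collect(Collectors.toList());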
Personally I'd try option 1 first, as it's quick and simple.

Related

Physics update lag

I ran into a problem where doing everything on a single thread may lead to some lag. When I start to create loads of new objects (around 300 per second), my physics rate drops.
I sort all rendered objects each frame so I know which one to draw in which order; this might be the reason it can handle only so little, but even if that were removed, there would still be a maximum number of operations per update before the physics starts to lag.
Any ideas on how to achieve the correct zOrder, or remove the possible physics lag?
Or detach physics from rendering ... ?
My game loop:
while (isRunning) {
    currentFrameTime = System.nanoTime();
    passedTime = currentFrameTime - lastFrameTime;
    lastFrameTime = currentFrameTime;

    physicsPassedTime += passedTime;
    updatePassedTime += passedTime;

    if (physicsPassedTime >= (double) 1_000_000_000 / physicsRate) {
        physicsPassedTime = 0;
        PhysicsUpdate();
    }

    if (updatePassedTime >= (double) 1_000_000_000 / refreshRate) {
        updatePassedTime = 0;
        Update();
        Render();
        LateUpdate();
    }
}
Looks like the best solution (as suggested in the comments) will be to run a second loop on a second thread with just the physics update, and everything else on the other thread.
That way frame drops should not interfere with the physics updates (a rough sketch of this arrangement is below).
Edit: Implemented this, and it works like a charm. I'll mark the answer when I'm able to.
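A minimal sketch of that two-loop arrangement, reusing the fields and methods from the posted loop (isRunning, physicsRate, refreshRate, PhysicsUpdate(), Update(), Render(), LateUpdate()); the rest is illustrative. Note that isRunning would need to be volatile (or an AtomicBoolean) so the physics thread sees when it changes, and any game state shared between the two loops still needs its own synchronization.
// Dedicated physics loop on its own thread.
Thread physicsThread = new Thread(() -> {
    long last = System.nanoTime();
    double physicsAcc = 0;
    while (isRunning) {
        long now = System.nanoTime();
        physicsAcc += now - last;
        last = now;
        if (physicsAcc >= 1_000_000_000.0 / physicsRate) {
            physicsAcc = 0;
            PhysicsUpdate();
        }
    }
});
physicsThread.start();

// The original loop keeps only the update/render work.
while (isRunning) {
    currentFrameTime = System.nanoTime();
    passedTime = currentFrameTime - lastFrameTime;
    lastFrameTime = currentFrameTime;
    updatePassedTime += passedTime;
    if (updatePassedTime >= (double) 1_000_000_000 / refreshRate) {
        updatePassedTime = 0;
        Update();
        Render();
        LateUpdate();
    }
}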

Use of parallels in java shows no increase in performance

I recently embarked on a project to simulate a collection of stellar bodies with the use of LWJGL. The solution required many loop iterations per frame to accomplish. The program calculates the forces exerted on each body by every other body. I did not wish to implement any form of limitations, such as tree algorithms. The program itself is able to simulate 800 bodies of random mass (between 1 and 50) at around 15 fps. Here is the original code for calculating, then updating the position of each body.
public void updateAllBodies() {
    for (Body b : bodies) {
        for (Body c : bodies) {
            if (b != c) {
                double[] force = b.getForceFromBody(c, G);
                b.velocity[0] += force[0];
                b.velocity[1] += force[1];
                b.velocity[2] += force[2];
                b.updatePosition();
            }
        }
    }
}
Recently I came across the subject of parallelism and streams. Seeing that my original code used only one thread, I thought I might be able to improve the performance by converting the array to a stream and executing it with the use of
.parallelStream()
I don't know much about multi-threading and parallelism, but here is the resulting code that I came up with.
public void updateAllBodies() {
    Arrays.asList(bodies).parallelStream().forEach(i -> {
        for (Body b : bodies) {
            if (i != b) {
                double[] force = i.getForceFromBody(b, G);
                i.velocity[0] += force[0];
                i.velocity[1] += force[1];
                i.velocity[2] += force[2];
                i.updatePosition();
            }
        }
    });
}
Unfortunately, when executed, this new code resulted in the same 15 fps as the old one. I was able to confirm that there were 3 concurrent threads running with
Thread.currentThread().getName();
At this point, I have no idea what the cause could be. Lowering the number of bodies does show a drastic increase in frame rate. Any help will be greatly appreciated.
I can't seem to find a way to mark a comment as the answer to a post, so I will state that the best answer was given by softwarenwebie7331.

Threads stopping prematurely for certain values

Background
So I'm writing an application that performs Monte Carlo simulations to investigate graphs that can evolve via the Moran process (evolutionary graph theory). For undirected graphs this works perfectly, but for directed graphs the application has been exhibiting strange behaviour and I can't for the life of me figure out why. What seems to happen is that when the Boolean variable isDirected is set to true, the threads exit the for loop they run in before the loop condition is met, despite working properly when isDirected is false.
The graphs are represented by an adjacency matrix so the only difference in the code when the graph is directed is that the adjacency matrix is non-symmetric, but I can't see any reason that would have an impact.
Code
The main relevant code is this section from the controller:
//Initialise a threadPool and an array of investigators to provide each thread with an Investigator runnable
long startTime = System.nanoTime();
int numThreads = 4;
Investigator[] invArray = new Investigator[numThreads];
ExecutorService threadPool = Executors.newFixedThreadPool(numThreads);

//Assign the tasks to the threads
for (int i = 0; i < numThreads; i++) {
    invArray[i] = new Investigator(vertLimit, iterations, graphNumber / numThreads, isDirected, mutantFitness, vertFloor);
    threadPool.submit(invArray[i]);
}
threadPool.shutdown();

//Wait till all the threads are finished, note this could cause the application to hang for the user if the threads deadlock
try {
    threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException except) {
    System.out.println("Thread interrupted");
}

//The next two blocks average the results of the different threads into 1 array
double[] meanArray = new double[vertLimit];
double[] meanError = new double[vertLimit];
double[] fixProbArray = new double[vertLimit];
double[] fixProbError = new double[vertLimit];

for (int x = 0; x < vertLimit; x++) {
    for (Investigator i : invArray) {
        meanArray[x] += i.getMeanArray()[x];
        meanError[x] += Math.pow(i.getMeanError()[x], 2);
        fixProbArray[x] += i.getFixProbArray()[x];
        fixProbError[x] += Math.pow(i.getFixProbError()[x], 2);
    }
    meanArray[x] = meanArray[x] / numThreads;
    fixProbArray[x] = fixProbArray[x] / numThreads;
    meanError[x] = Math.sqrt(meanError[x]);
    fixProbError[x] = Math.sqrt(fixProbError[x]);
}
long endTime = System.nanoTime();
//The remaining code is for printing and producing graphs of the results
As well as the Investigator class, the important parts of which are shown below:
public class Investigator implements Runnable {

    public Investigator(int vertLimit, int iterations, int graphNumber, Boolean isDirected, int mutantFitness, int... vertFloor) {
        //Constructor just initialises all the class variables passed in
    }

    public void run() {
        GraphGenerator g = new GraphGenerator();
        Statistics stats = new Statistics();

        //The outer loop iterates through graphs with increasing number of vertices, this is the problematic loop that exits too early
        for (int x = vertFloor > 2 ? vertFloor : 2; x < vertLimit; x++) {
            System.out.println("Current vertex amount: " + x);
            double[] currentMean = new double[graphNumber];
            double[] currentMeanErr = new double[graphNumber];
            double[] currentFixProb = new double[graphNumber];
            double[] currentFixProbErr = new double[graphNumber];

            //This loop generates the required number of graphs of the given vertex number and performs a simulation on each one
            for (int y = 0; y < graphNumber; y++) {
                Simulator s = new Simulator();
                matrix = g.randomGraph(x, isDirected, mutantFitness);
                s.moranSimulation(iterations, matrix);
                currentMean[y] = stats.freqMean(s.getFixationTimes());
                currentMeanErr[y] = stats.freqStandError(s.getFixationTimes());
                currentFixProb[y] = s.getFixationProb();
                currentFixProbErr[y] = stats.binomialStandardError(s.getFixationProb(), iterations);
            }

            meanArray[x] = Arrays.stream(currentMean).sum() / currentMean.length;
            meanError[x] = Math.sqrt(Arrays.stream(currentMeanErr).map(i -> i * i).sum());
            fixProbArray[x] = Arrays.stream(currentFixProb).sum() / currentFixProb.length;
            fixProbError[x] = Math.sqrt(Arrays.stream(currentFixProbErr).map(i -> i * i).sum());
        }
    }

    //A number of getter methods also provided here
}
Problem
I've put in some print statements to work out what's going on, and for some reason when I set isDirected to true the threads finish before x reaches vertLimit (which I've checked is indeed the value I specified). I've tried manually using my GraphGenerator.randomGraph() method for a directed graph and it gives the correct output, and Simulator.moranSimulation() also works fine for directed graphs when called manually. I'm not getting a thread interruption caught by my catch block, so that's not the issue either.
For the same set of parameters the threads finish at different stages, seemingly at random; sometimes they are all on the same value of x when they stop, sometimes some of the threads will have got further than the others, but that changes from run to run.
I'm completely stumped here and would really appreciate some help, thanks.
When tasks are being run by an ExecutorService, they can sometimes appear to end prematurely if an unhandled exception is thrown.
Each time you call .submit(Runnable) or .submit(Callable) you get a Future object back that represents the eventual completion of the task. The Future object has a .get() method that will return the result of the task when it is complete. Calling this method will block until that result is available. Also, if the task throws an exception that is not otherwise handled by your task code, the call to .get() will throw an ExecutionException which will wrap the actual thrown exception.
If your code is exiting prematurely due to an unhandled exception, call .get() on each Future object you get when you submit the task for execution (after you have submitted all the tasks you wish to) and catch any ExecutionExceptions that happen to be thrown to figure out what the actual underlying problem is.
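A rough sketch of that check, reusing the variables from the posted controller code; it assumes java.util.List, java.util.ArrayList, java.util.concurrent.Future and java.util.concurrent.ExecutionException are imported, and only illustrates the pattern described above.
// Keep the Futures so any exception thrown inside a task can be surfaced.
List<Future<?>> futures = new ArrayList<>();
for (int i = 0; i < numThreads; i++) {
    invArray[i] = new Investigator(vertLimit, iterations, graphNumber / numThreads,
            isDirected, mutantFitness, vertFloor);
    futures.add(threadPool.submit(invArray[i]));
}
threadPool.shutdown();

for (Future<?> f : futures) {
    try {
        f.get();                            // blocks until that task finishes
    } catch (ExecutionException e) {
        e.getCause().printStackTrace();     // the exception the task actually threw
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
If one of the Investigator runs is throwing, for example, an ArrayIndexOutOfBoundsException partway through, this is where it will show up.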
It looks like you might be terminating the threads prematurely with threadPool.shutdown();
From the Docs:
This method does not wait for previously submitted tasks to complete execution. Use awaitTermination to do that.
The code invokes .shutdown before awaitTermination...

Java code seems to only use two concurrent threads

I have approximately 40000 objects which might need to be repainted.
Most of them are not on the screen, so it seems that I could save a lot of work by doing the checks concurrently. But, my CPU never goes above 15% usage, so it seems that it is still only using one core. Have I implemented the threads correctly? If so, why aren't all my cores being used? And is there a better way which does utilize all my cores?
public void paintComponent(Graphics g)
{
    super.paintComponent(g);

    if (game.movables.size() > 10000)
    {
        final int size = game.drawables.size();
        final Graphics gg = g;

        Thread[] threads = new Thread[8];
        for (int j = 0; j < 8; ++j)
        {
            final int n = j;
            threads[j] = new Thread(new Runnable()
            {
                public void run()
                {
                    Drawable drawMe;
                    int start = (size / 8) * n;
                    int end = (size / 8) * (n + 1);
                    if (n == 8) end = game.drawables.size(); // in case size % 8 != 0
                    for (int i = start; i < end; ++i)
                    {
                        drawMe = game.drawables.get(i);
                        if (drawMe.isOnScreen())
                        {
                            synchronized (gg)
                            {
                                drawMe.draw(gg);
                            }
                        }
                    }
                }
            });
            threads[j].start();
        }

        try
        {
            for (int j = 0; j < 8; ++j)
                threads[j].join();
        }
        catch (InterruptedException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
    else
    {
        for (Drawable drawMe : game.drawables)
        {
            if (drawMe.isOnScreen())
            {
                drawMe.draw(g);
            }
        }
    }
}
As has been pointed out, the synchronized (gg) is effectively serializing all the drawing, so you're probably going slower than single-threaded code due to thread creation and other overhead.
The main reason I'm writing, however, is that Swing, which this presumably is, is not thread-safe. So the behavior of this program is not only likely to be bad, it's undefined.
Threading errors like this turn up as screwy behavior on some machines with some java runtime parameters and some graphics drivers. Been there. Done that. Not good.
JOGL will give you direct access to the GPU, the surest way to speed rendering.
To do this right, you might start by putting each drawMe in a (properly synchronized) list, then actually draw them in a loop after the joins are done. You can't speed the drawing (though if you've knocked out 99% of the drawMe's you've cut down the time needed dramatically), but if isOnScreen() is somewhat complicated, you'll get some real work out of your cores.
A ConcurrentLinkedQueue would save you the need to synchronize adds to the list.
The next step might be to use a blocking queue instead of a list, so the paint code could run in parallel with the visibility checks. With eight checks running, they should keep well ahead of the drawing. (But I think all the blocking queues either need synchronizing or do synching themselves. I'd skip this and stick with the CLQ and the first solution. Simpler and possibly faster.)
And (as Gene pointed out), everything Swing related starts on the EventQueue. Keep it there or life will get strange. Only your own code, not referencing the UI, should run in your threads.
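A rough sketch of the "check in parallel, draw after the joins" approach described above, using the Drawable type and game.drawables from the question; it assumes java.util.concurrent.ConcurrentLinkedQueue is imported and uses a lambda (Java 8+) for brevity. Drawing stays on the calling (EDT) thread, only the isOnScreen() checks run in the worker threads.
// Collect visible objects concurrently; ConcurrentLinkedQueue needs no explicit locking on add.
final ConcurrentLinkedQueue<Drawable> visible = new ConcurrentLinkedQueue<>();
final int size = game.drawables.size();
Thread[] threads = new Thread[8];
for (int j = 0; j < 8; ++j)
{
    final int n = j;
    threads[j] = new Thread(() -> {
        int start = (size / 8) * n;
        int end = (n == 7) ? size : (size / 8) * (n + 1); // last chunk takes the remainder
        for (int i = start; i < end; ++i)
        {
            Drawable d = game.drawables.get(i);
            if (d.isOnScreen())
            {
                visible.add(d);
            }
        }
    });
    threads[j].start();
}
for (Thread t : threads)
{
    try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
}
// Draw sequentially on the EDT once the checks are done.
for (Drawable d : visible)
{
    d.draw(g);
}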
Since you're already not drawing any objects that are off-screen, you're probably gaining very, very little by doing what you're doing above.
I would go as far as to say you're making it worse, by introducing synchronized, which is slow, and also by introducing threads that cause context switches, which are expensive.
To improve performance you should perhaps look into using different drawing libraries, such as the Java2D drawing library, which is part of the JDK: http://java.sun.com/products/java-media/2D/index.jsp
I'm not sure how Java will handle this, but other languages will blow up horribly and die if you reference something across scopes like you're doing with final int n (since it goes out of scope when the loop stops). Consider making it a field of the runnable object. Also, you're synchronizing on the graphics object while you're doing all of the real work, so it's likely that you aren't getting any real performance increase from this. You might benefit from explicitly checking whether the object is on the screen in parallel, which is a read-only operation, adding on-screen objects to a set or some other collection, and then rendering sequentially.

Code inside thread slower than outside thread..?

I'm trying to alter some code so it can work with multithreading. I stumbled upon a performance loss when putting a Runnable around some code.
For clarification: The original code, let's call it
//doSomething
got a Runnable around it like this:
Runnable r = new Runnable()
{
    public void run()
    {
        //doSomething
    }
};
Then I submit the Runnable to a CachedThreadPool ExecutorService. This is my first step towards multithreading this code, to see if the code runs as fast with one thread as the original code.
However, this is not the case. Where //doSomething executes in about 2 seconds, the Runnable executes in about 2.5 seconds. I need to mention that some other code, say, //doSomethingElse, inside a Runnable had no performance loss compared to the original //doSomethingElse.
My guess is that //doSomething has some operations that are not as fast when running in a thread, but I don't know what they could be, or what, in that respect, the difference is with //doSomethingElse.
Could it be the use of final int[]/float[] arrays that makes a Runnable so much slower? The //doSomethingElse code also used some finals, but //doSomething uses more. This is the only thing I could think of.
Unfortunately, the //doSomething code is quite long and out of context, but I will post it here anyway. For those who know the mean shift segmentation algorithm, this is a part of the code where the mean shift vector is being calculated for each pixel. The for-loop
for(int i=0; i<L; i++)
runs through each pixel.
timer.start(); // this is where I start the timer

// Initialize mode table used for basin of attraction
char[] modeTable = new char[L]; // (L is a class property and is about 100,000)
Arrays.fill(modeTable, (char) 0);
int[] pointList = new int[L];

// Allocate memory for yk (current vector)
double[] yk = new double[lN]; // (lN is a final int, defined earlier)
// Allocate memory for Mh (mean shift vector)
double[] Mh = new double[lN];

int idxs2 = 0;
int idxd2 = 0;

for (int i = 0; i < L; i++) {
    // if a mode was already assigned to this data point
    // then skip this point, otherwise proceed to
    // find its mode by applying mean shift...
    if (modeTable[i] == 1) {
        continue;
    }

    // initialize point list...
    int pointCount = 0;

    // Assign window center (window centers are
    // initialized by createLattice to be the point
    // data[i])
    idxs2 = i * lN;
    for (int j = 0; j < lN; j++)
        yk[j] = sdata[idxs2 + j]; // (sdata is an earlier defined final float[] of about 100,000 items)

    // Calculate the mean shift vector using the lattice
    /*****************************************************/

    // Initialize mean shift vector
    for (int j = 0; j < lN; j++) {
        Mh[j] = 0;
    }
    double wsuml = 0;
    double weight;

    // find bucket of yk
    int cBucket1 = (int) yk[0] + 1;
    int cBucket2 = (int) yk[1] + 1;
    int cBucket3 = (int) (yk[2] - sMinsFinal) + 1;
    int cBucket = cBucket1 + nBuck1 * (cBucket2 + nBuck2 * cBucket3);

    for (int j = 0; j < 27; j++) {
        idxd2 = buckets[cBucket + bucNeigh[j]]; // (buckets is a final int[] of about 75,000 items)
        // list parse, crt point is cHeadList
        while (idxd2 >= 0) {
            idxs2 = lN * idxd2;

            // determine if inside search window
            double el = sdata[idxs2 + 0] - yk[0];
            double diff = el * el;
            el = sdata[idxs2 + 1] - yk[1];
            diff += el * el;

            //...

            idxd2 = slist[idxd2]; // (slist is a final int[] of about 100,000 items)
        }
    }
    //...
}
timer.end(); // this is where I stop the timer.
There is more code, but the last while loop was where I first noticed the difference in performance.
Could anyone think of a reason why this code runs slower inside a Runnable than original?
Thanks.
Edit: The measured time is inside the code, so excluding startup of the thread.
All code always runs "inside a thread".
The slowdown you see is most likely caused by the overhead that multithreading adds. Try parallelizing different parts of your code - the tasks should neither be too large, nor too small. For example, you'd probably be better off running each of the outer loops as a separate task, rather than the innermost loops.
There is no single correct way to split up tasks, though; it all depends on how the data looks and what the target machine looks like (2 cores, 8 cores, 512 cores?).
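As a generic illustration of running chunks of the outer loop as separate tasks: this assumes java.util.concurrent.ExecutorService, Executors and Future plus java.util.List and ArrayList are imported, and that the iterations are independent. In the posted mean-shift code they are not quite, since modeTable, yk and Mh are shared, so those would have to become per-task before anything like this is safe.
// Split the outer loop over L into a handful of chunks and run each as a task.
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
List<Future<?>> futures = new ArrayList<>();
int chunk = (L + 3) / 4;                       // e.g. four chunks
for (int start = 0; start < L; start += chunk) {
    final int from = start;
    final int to = Math.min(L, start + chunk);
    futures.add(pool.submit(() -> {
        for (int i = from; i < to; i++) {
            // body of the outer loop for index i goes here
        }
    }));
}
for (Future<?> f : futures) {
    try { f.get(); } catch (Exception e) { e.printStackTrace(); }
}
pool.shutdown();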
Edit: What happens if you run the test repeatedly? E.g., if you do it like this:
Executor executor = ...;
for (int i = 0; i < 10; i++) {
final int lap = i;
Runnable r = new Runnable() {
public void run() {
long start = System.currentTimeMillis();
//doSomething
long duration = System.currentTimeMillis() - start;
System.out.printf("Lap %d: %d ms%n", lap, duration);
}
};
executor.execute(r);
}
Do you notice any difference in the results?
I personally do not see any reason for this. Any program has at least one thread. All threads are equal. All threads are created by default with medium priority (5). So the code should show the same performance in both the main application thread and any other thread that you open.
Are you sure you are measuring the time of "do something" and not the overall time that your program runs? I believe that you are measuring the time of operation together with the time that is required to create and start the thread.
When you create a new thread you always have an overhead. If you have a small piece of code, you may experience performance loss.
Once you have more code (bigger tasks) you may get a performance improvement from your parallelization (the code on the thread will not necessarily run faster, but you are doing two things at once).
Just a detail: deciding how small a task can be while still being worth parallelizing is a well-known topic in parallel computation :)
You haven't explained exactly how you are measuring the time taken. Clearly there are thread start-up costs but I infer that you are using some mechanism that ensures that these costs don't distort your picture.
Generally speaking, when measuring performance it's easy to get misled when measuring small pieces of work. I would be looking to get a run at least 1,000 times longer, putting the whole thing in a loop or whatever.
Here the one difference between the "No Thread" and "Threaded" cases is actually that you have gone from having one thread (as has been pointed out, you always have a thread) to two threads, so now the JVM has to mediate between two threads. For this kind of work I can't see why that should make a difference, but it is a difference.
I would want to be using a good profiling tool to really dig into this.
