As a homework assignment in my current CS course, we've been instructed to write a program that implements an A* algorithm for the n-puzzle problem. To solve it, you must take in an initial n×n board configuration from StdIn. The catch is that some of the boards may not be solvable. Thankfully for us, if you create a "twin" board by swapping any two non-zero tiles and attempt to solve that, either the original or the twin must be solvable. Therefore, to implement the algorithm we are effectively trying to solve two boards at the same time: the original and the twin.
Doing this in a single thread was quite easy, and that's what the actual assignment is. Looking at this problem, it seems like a perfect place to use parallelism. I was thinking that from the main thread I would spawn two concurrent threads, each trying to solve its own board. Is this possible without too much crazy code in Java? For that matter, on a multicore chip, would this run significantly faster than the single-threaded version? I am trying to read through the Java documentation for threads, but it's a little thick for somebody who has never tried this before, and I find I learn much more quickly by writing and reading examples than by reading more documentation.
Could somebody please give me some sample code that shows the types of structures, classes, important statements, etc. that would be necessary to do this? So far I'm thinking that I want to implement a private class that implements Runnable and have the main thread interrupt whichever thread does not finish first, then report which board is solvable plus the number of moves and the sequence of boards to get there.
EDIT:
TL;DR THIS IS NOT PART OF THE GRADED ASSIGNMENT. The assignment was to do a single threaded implementation. For my own enrichment and SOLELY my own enrichment I want to try and make my implementation multithreaded.
Since you don't want to thread the implementation itself (which is arguably a lot more complex; the transposition table is the bottleneck for parallel A* implementations, but in practice parallel IDA* algorithms are easier to implement anyhow and have the usual advantages), the problem is actually quite simple.
Just pack your implementation in a Runnable class and use a thread. For simplicity, you can use a global volatile boolean variable that is initialized to false and set to true as soon as one thread has found the solution.
You then check the flag at appropriate points in your code and return if the other thread has already found a solution. You could also use interrupts, but keeping it simple can't hurt (and in the end it's actually quite similar anyhow; you'd just check the variable in a slightly fancier way).
Trivial example:
public class Main implements Runnable {
    private static volatile boolean finished = false;

    public static void main(String[] args) {
        new Thread(new Main()).start(); // solves the twin board
        new Main().run();               // solves the original board
    }

    @Override
    public void run() {
        while (!finished) {
            // do one step of the search; set solutionFound when done
            if (solutionFound) {
                finished = true;
                // save result
            }
        }
    }
}
Forget about solving two boards with one of them being unsolvable. I don't see how that is even useful, but ignoring that, parallelization should not stop at two processors. If the system has more of them, then the algorithm should use them all. BTW, checking whether a board is solvable is rather easy; check out the section on solvability in the Wikipedia article.
To parallelize things, your implementation of A* should have some kind of priority queue that sorts items by heuristic value. Expanding a node in the search tree involves removing the node from the top of the queue and inserting several nodes back into it, keeping the queue sorted. When things are organized like this, adding more threads that insert and remove items is rather simple: just make access to the queue synchronized.
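A minimal sketch of that shared queue, with hypothetical Node and SharedFrontier names: java.util.concurrent's PriorityBlockingQueue handles the synchronization internally, so several worker threads can offer successors and poll the best node without explicit locks.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical search node: a board state plus its priority for A*.
class Node implements Comparable<Node> {
    final int cost;     // g(n) + h(n): moves so far plus heuristic
    final String board; // placeholder for the real board representation
    Node(int cost, String board) { this.cost = cost; this.board = board; }
    @Override public int compareTo(Node o) { return Integer.compare(cost, o.cost); }
}

public class SharedFrontier {
    // PriorityBlockingQueue is thread-safe, so no synchronized blocks needed
    // around offer/poll even with many threads.
    private final PriorityBlockingQueue<Node> open = new PriorityBlockingQueue<>();

    void add(Node successor) {
        open.offer(successor);
    }

    Node best() {
        return open.poll(); // lowest-cost node, or null if the frontier is empty
    }

    public static void main(String[] args) {
        SharedFrontier frontier = new SharedFrontier();
        frontier.add(new Node(5, "board-1"));
        frontier.add(new Node(2, "board-2"));
        System.out.println(frontier.best().cost); // prints 2
    }
}
```

Whether this scales depends on contention: if expansion is cheap, the queue itself becomes the bottleneck, which is why the answer above notes that parallel A* is harder than it looks.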
First I'll explain what I want to do, and afterwards I'll provide a proposed solution.
Problem
I'm running a game where I want to do a certain amount of work every frame. For example, I have N objects in a queue waiting to be initialized (imagine initialization is a fairly expensive operation and N is large), and after adding them all, I want to create their collision boxes, and after that, I want to merge them together to limit render calls. I can't do these operations on a different thread because all this stuff is heavily coupled with the game world. But I want to split all these operations into bite-size chunks to run each frame so that there is minimal lag (framerate dips). How would I go about doing this?
Proposed Solution
It would be nice to have a function that can stop after one call and continue where it left off after calling it again:
For example,
boolean loadEverything() {
    for (int i = 0; i < objectsToAdd.length; i++) {
        world.add(objectsToAdd[i]);
        if (i % 10 == 0) {
            return stop(); // imaginary: pause here and resume on the next call
        }
    }
    makeCollision();
    return stop(); // imaginary: pause again before the final phase
    mergeObjects();
    return true;
}
Calling loadEverything() the first objectsToAdd.length/10 times adds 10 objects to the game world per call. The next call should run makeCollision() and then stop. Calling it once more runs mergeObjects(), and the function returns true. In the caller I would run loadEverything() every frame until it returns true.
I'm aware that yield-return/yield-break implementations like those described here exist, but I'm wondering if there's a more general implementation of them, or whether a better solution exists that doesn't require any extra dependencies.
Have you looked at coroutines yet? There's a native implementation in Kotlin, but in Java there are options here and here.
By all means, make sure that OpenGL or Box2D operations that are required to be on the main thread stay on the main thread; as I understand it, a coroutine may be created under a new thread, so there might be no gain in splitting up work for those kinds of operations.
Another option
You say you need to split up the work of creating objects at runtime. Can you predict or estimate the number of objects you will want beforehand? If you don't really need to create objects dynamically like that, I suggest looking at Object Pool in libgdx (see more here). That link has a working example of using Pool in your game.
Such a Pool already has initialized objects ready to be grabbed and used on demand, and it can grow at runtime if needed, so if you can initially provide a good estimate of the number of objects you intend to use, you're all set.
Why don't you add one static variable that keeps its value between function calls? Then you can loop from the current value to the current value + 10, increase the current value (that static variable) by 10, and exit.
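As a sketch of that suggestion (all names hypothetical, standing in for the world and objectsToAdd from the question): a static index survives between calls, so each call processes the next 10 objects and reports whether loading is finished.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalLoader {
    // Hypothetical stand-ins for the game world and the pending objects.
    static List<String> objectsToAdd = new ArrayList<>();
    static List<String> world = new ArrayList<>();

    // This index keeps its value between calls, as the answer suggests.
    private static int nextIndex = 0;

    /** Adds up to 10 objects per call; returns true once everything is loaded. */
    static boolean loadEverything() {
        int end = Math.min(nextIndex + 10, objectsToAdd.size());
        for (int i = nextIndex; i < end; i++) {
            world.add(objectsToAdd.get(i));
        }
        nextIndex = end;
        return nextIndex == objectsToAdd.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 25; i++) objectsToAdd.add("obj" + i);
        int extraFrames = 0;
        while (!loadEverything()) extraFrames++; // one call per frame
        System.out.println(extraFrames + " extra frames, " + world.size() + " objects");
        // prints "2 extra frames, 25 objects"
    }
}
```

The makeCollision()/mergeObjects() phases from the question could be added as further states checked after the index reaches the end, turning this into a small state machine.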
My story
I am quite a beginner in parallel programming (I have never done anything more than writing some basic multithreaded programs), and I need to parallelize some multithreaded Java code to make it run faster. The multithreaded algorithm simply generates threads and passes them to the operating system, which distributes them for me. The results of every thread are gathered by a collector that also handles synchronisation issues with semaphores, etc., and calculates the sum of the results of all the threads. The multithreaded code looks something like this:
public static void main(String[] args) {
    int numberOfProcesses = Integer.parseInt(args[0]);
    ...
    Collector collector = new Collector(numberOfProcesses);
    while (iterator.hasNext()) {
        Object x = iterator.next();
        new OverwrittenThread(x, collector, otherParameters).start();
    }
    if (collector.isReady())
        System.out.println(collector.getResult());
}
My first idea for converting this to MPI was the basic approach (I guess) of just splitting up the loop and giving every iteration to another processor, like this (with mpiJava):
public static void main(String[] args) {
    ...
    Object[] foo = new Object[number];
    int i = 0;
    while (iterator.hasNext())
        foo[i++] = iterator.next();
    ...
    int myRank = MPI.COMM_WORLD.Rank();
    int size = MPI.COMM_WORLD.Size();
    // cyclic distribution: rank r handles elements r, r + size, r + 2*size, ...
    // (stepping by myRank would loop forever on rank 0)
    for (int j = myRank; j < numberOfElementsFromIterator; j += size) {
        // Perform code from OverwrittenThread on foo[j]
    }
    MPI.COMM_WORLD.Reduce(..., MPI.SUM, ...);
}
The problems
This is, so far, the only way that I, as an MPI newbie, could make things work. It's only an idea, because I have no clue how to tackle implementation problems like the conversion of BigIntegers to MPI datatypes, etc. (but I would get that far, I guess).
The real problem, though, is that this approach leaves the distribution of work very unbalanced, because it doesn't take into account how much work a given iteration takes. This can really cause trouble, since some iterations finish in less than a second while others may need several minutes.
My question
Is there a way to get an approach similar to the multithreaded version into an MPI implementation? At first I thought it would just be a lot of non-blocking point-to-point communication, but I don't see a way to make it work that way. I also considered the scatter functionality, but I have too much trouble understanding how to use it correctly.
Could anybody help me to clear this out, please?
(I do understand basic C etc)
Thanks in advance
The first thing you need to ask yourself when converting a multi-threaded program to a distributed program is:
What am I trying to accomplish by distributing the data across multiple cores/nodes/etc.?
One of the most common issues people face when getting started with MPI is thinking that they can take a program that works well in a small, shared-memory environment (i.e. multi-threading on a single node) and throw more CPUs at it to make it faster.
Sometimes that is true, but often it's not. The most important thing to remember about MPI is that, for the most part (unless you're getting into RMA, which is another advanced topic altogether), each MPI process has its own separate memory, distinct from all other MPI processes. This is very different from a multithreaded environment, where all threads typically share memory. It means you add a new problem on top of the other complexities of parallel programming: now you have to consider how to make sure that the data you need to process is in the right place at the right time.
One common way to do this is to ensure that all of the data is already available to all of the other processes outside of MPI, for instance, through a shared filesystem. Then the processes can just figure out what work they should be doing, and get started with their data. Another way is for a single process, often rank 0, to send the important data to the appropriate ranks. There are obviously other ways that you've already discovered to optimize this process. MPI_SCATTER is a great example.
Just remember that it's not necessarily true that MPI is faster than multi-threading, which is faster than single-threading. In fact, sometimes it can be the opposite. The cost of moving your data around via MPI calls can be quite high. Make sure that it's what you actually want to do before trying to rewrite all of your code with MPI.
Speeding up code by taking advantage of more processors isn't the only reason people use MPI (though it's often the goal). Sometimes it's because the problem their application is trying to solve is too big to fit in the memory of a single node.
All that being said, if your problem really does map well to MPI, you can do what you want. Your application appears to be similar to a master/worker kind of job, which is relatively simple to handle: have your master send non-blocking messages to the workers with their work, and post a non-blocking MPI_ANY_SOURCE receive so it can be notified when work is done. Whenever it gets a message from a worker, it sends out more work to be done.
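Since the question started from multithreaded Java, here is the pull-based load-balancing idea sketched with plain Java threads rather than real MPI calls (all names hypothetical). In the MPI version, the shared queue becomes the master rank handing out tasks via sends and an MPI_ANY_SOURCE receive, but the balancing logic is the same: whoever finishes early simply takes the next task, so slow iterations no longer pin work to one fixed rank.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class MasterWorker {
    /** Workers pull tasks from a shared queue, so uneven task sizes balance out. */
    static long process(List<Integer> tasks, int workers) {
        BlockingQueue<Integer> work = new LinkedBlockingQueue<>(tasks);
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                Integer task;
                // Each worker grabs the next task as soon as it is free.
                while ((task = work.poll()) != null) {
                    total.addAndGet(task); // stand-in for one expensive iteration
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total.get();
    }

    public static void main(String[] args) {
        // Uneven "iterations": each number is a fake amount of work.
        System.out.println(process(List.of(1, 50, 2, 40, 3, 30), 3)); // prints 126
    }
}
```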
I am making a two-player video game, and the opponent's position gets updated on a thread, because a socket there is continuously listening. What I want to share is position and rotation.
As it is a video game, I don't want the main thread to be blocked (or only for the minimum time possible), and I don't want performance to be affected. So from what I've seen, the normal way to share this info would be something like:
class SharedInfo
{
    // sketch: read() returns the shared data, write(...) updates it
    public synchronized Info read() { ... }
    public synchronized void write(Info info) { ... }
}
But this would block the read in the main thread (the same one that draws the game) until the three values (or even more info in the future) are written, and I've also read that synchronized is very expensive (this game also targets Android, so performance is very important).
But I was thinking that maybe putting the shared info inside an AtomicReference and eliminating synchronized would be more efficient, because readers would only be affected while the reference itself is being updated (there would be no write method; I would create a new object and put it in the AtomicReference). They also say that the atomic* classes use hardware operations and are more efficient than synchronized.
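Something like this sketch of the AtomicReference idea (class and field names hypothetical): the network thread publishes a fresh immutable object in one step, and the render thread reads a consistent snapshot without locking.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot: readers always see position and rotation together.
final class PlayerState {
    final float x, y, rotation;
    PlayerState(float x, float y, float rotation) {
        this.x = x; this.y = y; this.rotation = rotation;
    }
}

public class OpponentTracker {
    private final AtomicReference<PlayerState> state =
            new AtomicReference<>(new PlayerState(0f, 0f, 0f));

    // Network thread: publish a whole new immutable object atomically.
    void onPacket(float x, float y, float rotation) {
        state.set(new PlayerState(x, y, rotation));
    }

    // Render thread: lock-free read of a consistent snapshot.
    PlayerState current() {
        return state.get();
    }

    public static void main(String[] args) {
        OpponentTracker tracker = new OpponentTracker();
        tracker.onPacket(3f, 4f, 90f);
        PlayerState s = tracker.current();
        System.out.println(s.x + " " + s.y + " " + s.rotation); // prints "3.0 4.0 90.0"
    }
}
```

Because the object is immutable, a reader can never observe a half-written position/rotation pair, which is the tearing problem the synchronized sketch was guarding against.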
What do you think?
Consider using a queue for this; Java has some nice concurrent queue implementations. Look up the BlockingQueue interface in java.util.concurrent, and see who implements it. Chances are you will find strategies implemented that you hadn't even considered.
Before you know it, you will want to communicate more than just positions between your threads, and with a queue you can put different types of objects in there, maybe at different priorities, etc.
If in your code you use interfaces (like Queue or BlockingQueue) as much as possible (i.e. anywhere but the place where the specific instance is constructed), it is really easy to swap out the exact type of Queue you are using, if you need different functionality or just want to play around.
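A minimal sketch of the queue approach (the message type and names are hypothetical): the network thread offers updates, and the game loop drains whatever has arrived each frame with drainTo, which never blocks the render thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class UpdateQueueDemo {
    // Hypothetical message type carrying one position update.
    record PositionUpdate(float x, float y, float rotation) {}

    public static void main(String[] args) {
        // Network thread offers updates; render thread drains them each frame.
        BlockingQueue<PositionUpdate> updates = new LinkedBlockingQueue<>();
        updates.offer(new PositionUpdate(1f, 2f, 0f));
        updates.offer(new PositionUpdate(3f, 4f, 90f));

        // In the game loop: drainTo never blocks, so the frame is not held up.
        List<PositionUpdate> batch = new ArrayList<>();
        updates.drainTo(batch);
        PositionUpdate latest = batch.get(batch.size() - 1);
        System.out.println(latest.x() + " " + latest.rotation()); // prints "3.0 90.0"
    }
}
```

Note the code is written against the BlockingQueue interface, so swapping LinkedBlockingQueue for, say, a bounded ArrayBlockingQueue is a one-line change, as the answer suggests.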
I have a piece of code that looks like this:
Algorithm a = null;
while(a == null)
{
a = grid.getAlgorithm();
}
getAlgorithm() in my Grid class returns some subtype of Algorithm depending on what the user chooses from some options.
My problem is that even after an algorithm is selected, the loop never terminates. However, that's not the tricky bit, if I simply place a System.out.println("Got here"); after my call to getAlgorithm(), the program runs perfectly fine and the loop terminates as intended.
My question is: why does adding that magic print statement suddenly make the loop terminate?
Moreover, this issue first came up when I started using my new laptop, I doubt that's related, but I figured it would be worth mentioning.
Edit: The program in question is NOT multithreaded. The code for getAlgorithm() is:
public Algorithm getAlgorithm ()
{
return algorithm;
}
Where algorithm is initially null, but will change value upon some user input.
I believe the issue has to do with how grid.getAlgorithm is executed. If there is very little cost associated with executing the method, your while loop will cycle very quickly as long as the method keeps returning null. That is often referred to as a busy wait.
Now it sounds like your new laptop is encountering a starvation issue that didn't manifest on your old computer. It is hard to say why, but if you look at the link above, the Wikipedia article does indicate that busy waits have unpredictable behavior. Maybe your old computer handled user IO better than your new laptop does. Regardless, on your new laptop that loop is taking resources away from whatever handles your user IO, starving the process responsible for breaking the loop.
You are doing active polling, which is a bad practice. You should at least let the polling thread sleep (with Thread.sleep). Since println does some IO, it probably does just that. If your app is not multithreaded, it is unlikely to work at all.
If this loop is waiting for user input in a GUI, then ouch. Bad, bad idea, and even with Thread.sleep() added I'd never recommend it. Instead, you most likely want to register an event listener on the component in question and only have the validation code fire when the contents change.
It's more than likely your program is locking up because you've reached some form of deadlock, especially if your application is multithreaded. Rather than try to solve this issue and hack your way around it, I'd seriously consider redesigning how this part of the application works.
You should check getAlgorithm(); there must be something wrong in that method.
There are two scenarios:
Your code is really not meant to be multithreaded. In this case you need to include some sort of user input inside the loop. Otherwise you might as well write Algorithm a = grid.getAlgorithm(); once and prevent the infinite loop.
Your code is multithreaded, in which case you have some sort of 'visibility' problem. Go to Atomicity, Visibility and Ordering or read Java Concurrency in Practice to learn more about visibility. Essentially, without some sort of synchronization between threads, the looping thread may never find out that the value has changed, due to optimizations the JVM may perform.
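For scenario 2, a minimal sketch of the visibility fix (names hypothetical): declaring the flag volatile guarantees the looping thread eventually observes the other thread's write, so the loop can terminate.

```java
public class VisibilityDemo {
    // Without volatile, the looping thread might never observe the write.
    private static volatile boolean ready = false;

    static boolean demo() {
        Thread waiter = new Thread(() -> {
            while (!ready) {
                Thread.onSpinWait(); // still a busy wait; acceptable only in a demo
            }
        });
        waiter.start();
        ready = true; // volatile write: guaranteed to become visible to the waiter
        try {
            waiter.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !waiter.isAlive(); // true means the loop terminated
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```

With a plain (non-volatile) boolean, the JIT may hoist the read out of the loop and spin forever, which matches the symptom in the question; the println "fix" works only by accident, as the answers above explain.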
You did not mention any context around how this code is run. If it is a console-based application started from a main function, you would know whether there is multithreading. I am assuming this is not the case, since you say there is none. Another option is that this is a Swing application, in which case you should read Multithreaded Swing Applications. It might also be a web application, in which case a similar situation might apply.
In any case you could always debug the application to see which thread is writing to the 'algorithm' variable, then see which thread is reading from it.
I hope this is helpful. In any case, you may find more help if you give a little more context in your question, especially for a question with such an intriguing title as 'Weird Java problem, while loop termination'.
I have a Java program that runs many small simulations. It runs a genetic algorithm, where each fitness function is a simulation using parameters on each chromosome. Each one takes maybe 10 or so seconds if run by itself, and I want to run a pretty big population size (say 100?). I can't start the next round of simulations until the previous one has finished. I have access to a machine with a whack of processors in it and I'm wondering if I need to do anything to make the simulations run in parallel. I've never written anything explicitly for multicore processors before and I understand it's a daunting task.
So this is what I would like to know: To what extent and how well does the JVM parallelize? I have read that it creates low-level threads, but how smart is it? How efficient is it? Would my program run faster if I made each simulation a thread? I know this is a huge topic, but could you point me toward some introductory literature on parallel processing and Java?
Thanks very much!
Update:
Ok, I've implemented an ExecutorService and made my small simulations implement Runnable with run() methods. Instead of writing this:
Simulator sim = new Simulator(args);
sim.play();
return sim.getResults();
I write this in my constructor:
ExecutorService executor = Executors.newFixedThreadPool(32);
And then each time I want to add a new simulation to the pool, I run this:
RunnableSimulator rsim = new RunnableSimulator(args);
executor.execute(rsim);
return rsim.getResults();
The RunnableSimulator::run() method calls the Simulator::play() method; neither takes arguments.
I think I am getting thread interference, because now the simulations error out. By "error out" I mean that variables hold values they really shouldn't. No code within the simulation was changed, and before, the simulation ran perfectly over many different arguments. The sim works like this: each turn it's given a game piece and loops through all the locations on the game board. It checks whether the given location is valid, and if so, commits the piece and measures that board's goodness. Now, obviously, invalid locations are being passed to the commit method, resulting in index-out-of-bounds errors all over the place.
Each simulation is its own object, right, based on the code above? I can pass the exact same set of arguments to the RunnableSimulator and Simulator classes, and the runnable version will throw exceptions. What do you think might cause this, and what can I do to prevent it? Can I provide some code samples in a new question to help?
Java Concurrency Tutorial
If you're just spawning a bunch of work off to different threads, and the threads won't be talking back and forth to each other, it isn't too hard: just wrap each task in a Runnable and pass them off to an ExecutorService.
You should skim the whole tutorial, but for this particular task, start here.
Basically, you do something like this:
ExecutorService executorService = Executors.newFixedThreadPool(n);
where n is the number of things you want running at once (usually the number of CPUs). Each of your tasks should be an object that implements Runnable, and you then execute it on your ExecutorService:
executorService.execute(new SimulationTask(parameters...));
Executors.newFixedThreadPool(n) will start up n threads, and execute will insert the tasks into a queue that feeds to those threads. When a task finishes, the thread it was running on is no longer busy, and the next task in the queue will start running on it. Execute won't block; it will just put the task into the queue and move on to the next one.
The thing to be careful of is that you really AREN'T sharing any mutable state between tasks. Your task classes shouldn't depend on anything mutable that will be shared among them (i.e. static data). There are ways to deal with shared mutable state (locking), but if you can avoid the problem entirely it will be a lot easier.
EDIT: Reading the edits to your question, it looks like you really want something a little different. Instead of implementing Runnable, implement Callable. Your call() method should be pretty much the same as your current run(), except it should return getResults();. Then submit() it to your ExecutorService. You will get a Future in return, which you can use to test whether the simulation is done and, when it is, get your results.
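A sketch of that Callable/Future pattern (SimulationTask and its squared-number "result" are hypothetical stand-ins for the real simulation): each submit returns a Future, and get() blocks only until that particular simulation is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableDemo {
    // Hypothetical stand-in for one simulation; call() returns its result.
    static class SimulationTask implements Callable<Integer> {
        private final int parameter;
        SimulationTask(int parameter) { this.parameter = parameter; }
        @Override public Integer call() {
            return parameter * parameter; // pretend: sim.play(); return sim.getResults();
        }
    }

    /** Submits n simulations and sums their results via Futures. */
    static int runAll(int n, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            futures.add(executor.submit(new SimulationTask(i)));
        }
        int sum = 0;
        try {
            for (Future<Integer> f : futures) {
                sum += f.get(); // blocks until that simulation finishes
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        executor.shutdown();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(runAll(5, 4)); // 1 + 4 + 9 + 16 + 25 = 55
    }
}
```

This also sidesteps the bug in the question's snippet, where getResults() was called immediately after execute(), before the simulation had actually run.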
You can also look at the new fork/join framework by Doug Lea. One of the best books on the subject is certainly Java Concurrency in Practice. I would strongly recommend you take a look at the fork/join model.
Java threads are just too heavyweight. We have implemented parallel branches in Ateji PX as very lightweight scheduled objects. As in Erlang, you can create tens of millions of parallel branches before you start noticing overhead. But it's still Java, so you don't need to switch to a different language.
If you are doing full-out processing all the time in your threads, you won't benefit from having more threads than processors. If your threads occasionally wait on each other or on the system, then Java scales well up to thousands of threads.
I wrote an app that discovered a class B network (65,000 nodes) in a few minutes by pinging each node, with retries at increasing delays. When I put each ping on a separate thread (this was before NIO; I could probably improve it now), I could run about 4,000 threads on Windows before things started getting flaky. On Linux the number was nearer 1,000 (I never figured out why).
No matter what language or toolkit you use, if your data interacts, you will have to pay attention to the areas where it does. Java uses the synchronized keyword to prevent two threads from accessing a section at the same time. If you write your Java in a more functional manner (making all your members final), you can run without synchronization, but, well, let's just say solving problems takes a different approach that way.
Java has other tools to manage units of independent work; look in the java.util.concurrent package for more information.
Java is pretty good at parallel processing, but there are two caveats:
Java threads are relatively heavyweight (compared with e.g. Erlang), so don't start creating them in the hundreds or thousands. Each thread gets its own stack memory (default: 256KB) and you could run out of memory, among other things.
If you run on a very powerful machine (especially one with many CPUs and a large amount of RAM), then the VM's default settings (especially concerning GC) may result in suboptimal performance, and you may have to spend some time tuning them via command-line options. Unfortunately, this is not a simple task and requires a lot of knowledge.