Limiting Thread Execution Processor Cycles in Java

I'm writing an AI-testing Framework for a competition. Participants submit a Bot class which matches a given Interface. Then all the bots play a turn-based game. On every turn, I want to do the following:
For every bot B:
    start a thread that runs at most N cycles and does B.getNextMove()
Wait for all threads to complete.
Make all moves (from each bot).
My difficulty comes in saying "at most N cycles". I can limit all the bots by time (say half a second per turn) but this means that some can get more processor cycles than others and doesn't permit a strict "your bot should be able to make its decision re: a turn in X time" requirement in the competition.
As stated, this is in Java. Any ideas? I've been looking at concurrency and locking, but this doesn't feel like the right direction. It would also be possible to not run the bots in parallel and then use time for the restriction (given that the computer isn't running anything else at the time), but this would be undesirable as it would significantly slow down how quickly we get results from the games.

I'd define the bot interface so that each call performs one iteration of the bot's algorithm, and then simply count the calls.
If you need hard time or CPU limits, there aren't many (easy) ways to enforce them in Java.
You can't measure CPU cycles in Java, but you can measure CPU time, which is a vast improvement over using just wall-clock time.
To get the cpu time of the current thread you'd use (from the standard java.lang.management package)
ThreadMXBean tm = ManagementFactory.getThreadMXBean();
long cpuTime = tm.getCurrentThreadCpuTime();
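As a minimal sketch of how the call above could be used per turn, the following wraps a bot's move computation and reports the CPU time it consumed. The class name CpuTimedTurn, the helper cpuNanosFor, and the 50 ms budget are assumptions made for illustration, not part of the original answer. Note that this only measures after the fact; it does not stop a bot that overruns its budget.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuTimedTurn {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** CPU nanoseconds the calling thread spent inside task, or -1 if the JVM cannot measure it. */
    static long cpuNanosFor(Runnable task) {
        if (!THREADS.isCurrentThreadCpuTimeSupported()) {
            return -1L;
        }
        long before = THREADS.getCurrentThreadCpuTime();
        task.run();
        return THREADS.getCurrentThreadCpuTime() - before;
    }

    public static void main(String[] args) {
        long budgetNanos = 50_000_000L; // hypothetical budget: 50 ms of CPU time per turn
        long used = cpuNanosFor(() -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) sum += i;
            if (sum == 42) System.out.println(); // keeps the loop from being optimized away
        });
        boolean withinBudget = used >= 0 && used <= budgetNanos;
        System.out.println("CPU ns used: " + used + ", within budget: " + withinBudget);
    }
}
```

Because the measurement is per thread, time the thread spends descheduled while other bots run is not counted against it, which is the advantage over wall-clock timing.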

Since you're controlling the bot execution and explicitly calling next yourself, why not just count the iterations? e.g.
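public class Botcaller extends Thread
{
    private final Bot bot;
    private int cycles_completed;
    public static final int MAX_ALLOWED_CYCLES=...;

    public Botcaller(Bot bot)
    {
        this.bot = bot;
    }

    @Override
    public void run()
    {
        while (cycles_completed < MAX_ALLOWED_CYCLES)
        {
            bot.move();          // one iteration of the bot's algorithm
            cycles_completed++;
            Thread.yield();      // hint that other bot threads may run now
        }
    }
}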

I think it will be very difficult to control threads at this low a level from Java. I know of no way to safely stop a thread that is running.
Another option is to let each bot run in whatever time it wants. Have one thread per bot but poll the location in a master thread every second. If they haven't updated location in that "round" then they miss out and have to wait for the next one. Make sure that the world state that each bot sees is the one controlled by the master thread.
The nice thing about this approach is that it lets bots sacrifice rounds if they want to do something more complicated.
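The polling idea above can be sketched as a single shared slot per bot that the master thread empties once per round. The class PolledMoves and the use of a String for the move are hypothetical choices for illustration; the answer does not prescribe a data structure.

```java
import java.util.concurrent.atomic.AtomicReference;

final class PolledMoves {
    /** Latest move decided by the bot; null means "nothing submitted this round". */
    private final AtomicReference<String> pending = new AtomicReference<>();

    /** Called by the bot thread whenever it has decided on a move. */
    void submit(String move) {
        pending.set(move);
    }

    /** Called by the master thread once per round; takes the move and clears the slot. */
    String collect() {
        return pending.getAndSet(null);
    }

    public static void main(String[] args) throws InterruptedException {
        PolledMoves slot = new PolledMoves();
        System.out.println(slot.collect()); // null: the bot misses this round

        Thread bot = new Thread(() -> slot.submit("NORTH"));
        bot.start();
        bot.join();
        System.out.println(slot.collect()); // NORTH
        System.out.println(slot.collect()); // null again until the bot submits a new move
    }
}
```

A bot that needs several rounds to think simply leaves the slot empty in the meantime, which is the "sacrifice rounds" behaviour described above.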

I think Robocode does something similar... you may want to look there.

Related

Trigger CPU cache write back manually in java: possible? necessary?

I am writing a video game in my spare time and have a question about data consistency when introducing multi-threading.
At the moment my game is single threaded and has a simple game loop as it is taught in many tutorials:
while game window is not closed
{
    poll user input
    react to user input
    update game state
    render game objects
    flip buffers
}
I now want to add a new feature to my game where the player can automate certain tasks that are long and tedious, like walking long distances (fast travel). I may chose to simply "teleport" the player character to their destination but I would prefer not to. Instead, the game will be sped up and the player character will actually walk as if the player was doing it manually. The benefit of this is that the game world will interact with the player character as usual and any special events that might happen will still happen and immediately stop the fast travel.
To implement this feature I was thinking about something like this:
Start a new thread (worker thread) and have that thread update the game state continuously until the player character reaches its destination
Have the main thread no longer update the game state and render the games objects as usual and instead display the travel progress in a more simplistic manner
Use a synchronized message queue to have the main thread and the worker thread communicate
When the fast travel is finished or canceled (by player interaction or other reasons) have the worker thread die and resume the standard game loop with the main thread
In pseudo code it may look like this:
[main thread]
while game window is not closed
{
    poll user input
    if user wants to cancel fast travel
    {
        write to message queue player input "cancel"
    }
    poll message queue about fast travel status
    if fast travel finished or canceled
    {
        resume regular game loop
    } else {
        render travel status
        flip buffers
    }
}
[worker thread]
while (travel ongoing)
{
    poll message queue
    if user wants to cancel fast travel
    {
        write to message queue fast travel status "canceled"
        return
    }
    update game state
    if fast travel is interrupted by internal game event
    {
        write to message queue fast travel status "canceled"
        return
    }
    write to message queue fast travel status "ongoing"
}
if travel was finished
{
    write to message queue fast travel status "finished"
}
The message queue will be some kind of two-channeled synchronized data structure, maybe two ArrayDeques with a Lock for each. I am fairly certain this will not be too much trouble.
What I am more concerned about are caching problems with the game data:
1.a) Could it be that the worker thread, after being started, may see old game data because the main thread may run on a different core which has cached some of its results?
1.b) If the above is true: Would I need to declare every single field in the game data as volatile to protect myself with absolute guarantee against inconsistent data?
2) Am I right to assume that performance would take a non trivial hit if all fields are volatile?
3) Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
4) Is there a better approach? Is my concept perhaps ill conceived?
Thanks for any help and sorry for the big chunk of text. I thought it would be easier to answer the question if you know the intended use.

Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
No. That's not how any of this works. Let me give you very short answers to explain why you are thinking about this the wrong way:
1.a) Could it be that the worker thread, after being started, may see old game data because the main thread may run on a different core which has cached some of its results?
Sure. Or it might for some other reason. Memory visibility is not guaranteed, so you can't rely on it unless you use something guaranteed to provide memory visibility.
1.b) If the above is true: Would I need to declare every single field in the game data as volatile to protect myself with absolute guarantee against inconsistent data?
No. Any method of assuring memory visibility will work. You don't have to do it any particular way.
2) Am I right to assume that performance would take a non trivial hit if all fields are volatile?
Probably. This would probably be the worst possible way to do it.
3) Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
No, because there is no "write cache back to memory" operation that assures memory visibility. Your platform may not even have such caches and the issue might be something else entirely. You're writing Java code, so you don't have to think about how your particular CPU works, what cores or caches it has, or anything like that. That's one of the big advantages of using a language with semantics that are guaranteed and don't talk about cores, caches, or anything like this.
4) Is there a better approach? Is my concept perhaps ill conceived?
Absolutely. You are writing Java code. Use the various Java synchronization classes and functions and rely on them to provide the semantics they're documented to provide. Don't even think about cores, caches, flushing to memory, or anything like that. Those are hardware details that, as a Java programmer, you never have to think about.
Any Java documentation you see that talks about cores, caches, or flushes to memory is not actually talking about real cores, caches, or flushes to memory. It's just giving you some ways to think about hypothetical hardware so you can wrap your brain around why memory visibility and total ordering don't always work perfectly just by themselves. Your real CPU or platform may have completely different issues that bear no resemblance to this hypothetical hardware. (And real-world CPUs and systems have cache coherency guaranteed by hardware and their visibility/ordering issues in fact are completely different!)
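To make "use the Java synchronization classes" concrete, here is a minimal sketch in which the game state is handed to the worker and back through java.util.concurrent queues. The classes FastTravelHandoff and GameState and their fields are hypothetical stand-ins. The sketch relies on the documented guarantee of the concurrent collections that actions before placing an object into the collection happen-before actions after another thread removes it, so no field needs to be volatile as long as only one thread touches the state at a time.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class FastTravelHandoff {
    /** Plain mutable state with no volatile fields; a stand-in for the real game data. */
    static final class GameState {
        int playerX;
        int stepsWalked;
    }

    static GameState runFastTravel(GameState state, int destinationX) throws InterruptedException {
        BlockingQueue<GameState> toWorker = new LinkedBlockingQueue<>();
        BlockingQueue<GameState> toMain = new LinkedBlockingQueue<>();

        Thread worker = new Thread(() -> {
            try {
                GameState s = toWorker.take();   // sees everything main wrote before put()
                while (s.playerX < destinationX) {
                    s.playerX++;                 // "update game state"
                    s.stepsWalked++;
                }
                toMain.put(s);                   // publishes the worker's writes to main
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        toWorker.put(state);   // from here on, main must not touch state...
        return toMain.take();  // ...until the worker hands it back
    }

    public static void main(String[] args) throws InterruptedException {
        GameState state = new GameState();
        state.playerX = 3;
        GameState after = runFastTravel(state, 10);
        System.out.println(after.playerX + " after " + after.stepsWalked + " steps"); // 10 after 7 steps
    }
}
```

The key design choice is ownership transfer: the state object is only ever used by the thread that most recently took it from a queue, so the queue operations are the "few and well controlled points in time" the question asks about.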

Ensuring that threads get (approximately) equal CPU time in Java

I'm writing a game in which players write AI agents that compete against one another, on the JVM. Right now the architecture looks like this:
A core server module that handles the physics simulations, and takes messages from the players as input to alter the world. The core also determines what the world looks like from the perspective of each of the players, based on various rules (think fog of war).
Player modules receive updated versions of the world from the core, process them, and stream messages to the core as inputs based on that processing.
The idea is that the core is compiled along with two player modules, and then the simulation is run producing an output stream that can be played back to generate visualization of the match.
My question is, if each of the players runs on a single Java thread, is it possible to ensure that the two player threads get equal amounts of resources (CPU time, primarily, I think)? Because I don't control the nature of the processing that each AI is doing, it's possible that one of the players might be extremely inefficient but written in such a way that its thread consumes so many resources the other player's AI is resource starved and can't compete fairly.
I get the feeling that this isn't possible without a hard realtime OS, which the JVM isn't even close to being, but if there's even a way to get reasonably close I'd love to explore it.

"Player modules receive updated versions of the world from the core, process them, and stream messages to the
core as inputs based on that processing". This means that each player module has a loop inside it that receives update messages and sends result messages to the core. I would therefore use a lightweight actor model, with each player being an actor and all actors sharing the same ExecutorService. Since activated actors go through the same executor task queue, they get roughly the same access to the CPU.
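A rough sketch of that idea, assuming a hypothetical Player interface with one onUpdate call per world update: every activation is queued as one task on a single shared executor, so players take turns in FIFO order. This equalizes the number of activations, not the CPU time per activation; a player that never returns from onUpdate would still block the queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedExecutorPlayers {
    /** Hypothetical player API: handle one world update, return one message for the core. */
    interface Player {
        String onUpdate(int tick);
    }

    static List<String> run(List<Player> players, int ticks) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor(); // one task queue shared by all players
        List<String> messages = Collections.synchronizedList(new ArrayList<>());
        for (int tick = 0; tick < ticks; tick++) {
            final int t = tick;
            for (Player p : players) {
                pool.execute(() -> messages.add(p.onUpdate(t))); // one activation = one queued task
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return messages;
    }

    public static void main(String[] args) throws InterruptedException {
        Player a = t -> "A" + t;
        Player b = t -> "B" + t;
        System.out.println(run(Arrays.asList(a, b), 2)); // [A0, B0, A1, B1]
    }
}
```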

Your intuition is right that this isn't really possible in Java. Even if you had a real-time OS, someone could still write a very resource intensive AI thread.
There are a couple of approaches you could take to at least help here. First be sure to give the two player module threads the same priority. If you are running on a machine that has more than 2 processors, and you set each of the player module threads to have the highest priority, then theoretically they should both run whenever they have something to do. But if there's nothing to stop the player modules from spawning new threads themselves, then you can't guarantee a player won't do that.
So the short answer is no, you can't make these guarantees in Java.
Depending on how your simulation works, maybe you can have a concept of "turns": the simulation instructs player 1 to make a move, then player 2 makes its move, and back and forth, so they can each only make one "move" at a time. Not sure if this will work in your situation though.

If you have any knobs to turn regarding how much work the threads have to do (or can just set their priority), you can set up another thread that periodically monitors the player threads through the ThreadMXBean and reads their CPU usage with ThreadMXBean.getThreadCpuTime. You can then compare each player's CPU time and react accordingly.
Not sure if this is timely and accurate enough for you, but over time you could balance the CPU usage.
However, splitting the work into packets and using Executors as suggested before should be the better and more Java-like way.
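A minimal sketch of such a monitor, using ThreadMXBean.getThreadCpuTime(long) from java.lang.management. The class CpuUsageMonitor and its helper names are assumptions for illustration, and how to "react" to an imbalance (lower a priority, skip an activation) is left open, as in the answer.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuUsageMonitor {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** CPU nanoseconds used so far by t, or -1 if unavailable (unsupported, disabled, or thread not alive). */
    static long cpuNanos(Thread t) {
        if (!THREADS.isThreadCpuTimeSupported() || !THREADS.isThreadCpuTimeEnabled()) {
            return -1L;
        }
        return THREADS.getThreadCpuTime(t.getId()); // getId() is deprecated in newer JDKs in favour of threadId()
    }

    /** Positive if a has used more CPU than b, negative if less, 0 if equal or unknown. */
    static long imbalanceNanos(Thread a, Thread b) {
        long ca = cpuNanos(a);
        long cb = cpuNanos(b);
        return (ca < 0 || cb < 0) ? 0L : ca - cb;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread busy = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) x++;
        });
        Thread idle = new Thread(() -> {
            try { Thread.sleep(10_000); } catch (InterruptedException ignored) { }
        });
        busy.start();
        idle.start();
        Thread.sleep(200); // let the players run for a while before sampling
        System.out.println("busy - idle CPU ns: " + imbalanceNanos(busy, idle));
        busy.interrupt();
        idle.interrupt();
    }
}
```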

Android game loop timing discrepancy [duplicate]

When programming animations and little games I've come to know the incredible importance of Thread.sleep(n); I rely on this method to tell the operating system when my application won't need any CPU, and thereby make my program progress at a predictable speed.
My problem is that the JRE implements this functionality differently on different operating systems. On UNIX-based (or UNIX-influenced) OSes such as Ubuntu and OS X, the underlying JRE implementation uses a well-functioning and precise system for distributing CPU time to different applications, which makes my 2D game smooth and lag-free. However, on Windows 7 and older Microsoft systems, the CPU-time distribution seems to work differently: you usually get back your CPU time after the given amount of sleep, varying by about 1-2 ms from the target sleep, but you get occasional bursts of an extra 10-20 ms of sleep time. This causes my game to lag once every few seconds. I've noticed this problem exists on most Java games I've tried on Windows, Minecraft being a noticeable example.
Now, I've been looking around on the Internet to find a solution to this problem. I've seen a lot of people using only Thread.yield(); instead of Thread.sleep(n);, which works flawlessly at the cost of the currently used CPU core getting full load, no matter how much CPU your game actually needs. This is not ideal for playing your game on laptops or high energy consumption workstations, and it's an unnecessary trade-off on Macs and Linux systems.
Looking around further I found a commonly used method of correcting sleep time inconsistencies called "spin-sleep", where you only order sleep for 1 ms at a time and check for consistency using the System.nanoTime(); method, which is very accurate even on Microsoft systems. This helps for the normal 1-2 ms of sleep inconsistency, but it won't help against the occasional bursts of +10-20 ms of sleep inconsistency, since this often results in more time spent than one cycle of my loop should take all together.
After tons of looking I found this cryptic article by Andy Malakov, which was very helpful in improving my loop: http://andy-malakov.blogspot.com/2010/06/alternative-to-threadsleep.html
Based on his article I wrote this sleep method:
// Variables for calculating optimal sleep time. In nanoseconds (1 ns = 10^-9 s).
private long timeBefore = 0L;
private long timeSleepEnd, timeLeft;
// The estimated game update rate.
private double timeUpdateRate;
// The time one game loop cycle should take in order to reach the max FPS.
private long timeLoop;

private void sleep() throws InterruptedException {
    // Skip first game loop cycle.
    if (timeBefore != 0L) {
        // Calculate optimal game loop sleep time.
        timeLeft = timeLoop - (System.nanoTime() - timeBefore);
        // If all necessary calculations took LESS time than the loop budget (timeLoop), the max update rate was reached.
        if (timeLeft > 0 && isUpdateRateLimited) {
            // Determine when to stop sleeping.
            timeSleepEnd = System.nanoTime() + timeLeft;
            // Sleep, yield or keep the thread busy until there is no time left to sleep.
            do {
                if (timeLeft > SLEEP_PRECISION) {
                    Thread.sleep(1); // Sleep for approximately 1 millisecond.
                }
                else if (timeLeft > SPIN_YIELD_PRECISION) {
                    Thread.yield(); // Yield the thread.
                }
                if (Thread.interrupted()) {
                    throw new InterruptedException();
                }
                timeLeft = timeSleepEnd - System.nanoTime();
            }
            while (timeLeft > 0);
        }
        // Save the calculated update rate.
        timeUpdateRate = 1000000000D / (double) (System.nanoTime() - timeBefore);
    }
    // Starting point for time measurement.
    timeBefore = System.nanoTime();
}
SLEEP_PRECISION I usually put to about 2 ms, and SPIN_YIELD_PRECISION to about 10 000 ns for best performance on my Windows 7 machine.
After tons of hard work, this is the absolute best I can come up with. Since I still care about improving the accuracy of this sleep method, and I'm still not satisfied with the performance, I would like to appeal to all of you Java game hackers and animators out there for suggestions on a better solution for the Windows platform. Could I use a platform-specific way on Windows to make it better? I don't mind having a little platform-specific code in my applications, as long as the majority of the code is OS independent.
I would also like to know if anyone knows about Microsoft and Oracle working out a better implementation of the Thread.sleep(n); method, or what Oracle's future plans are for improving their environment as the basis of applications requiring high timing accuracy, such as music software and games.
Thank you all for reading my lengthy question/article. I hope some people might find my research helpful!

You could use a cyclic timer associated with a mutex. This is IMHO the most efficient way of doing what you want. But then you should think about skipping frames in case the computer lags. (You can do it with another non-blocking mutex in the timer code.)
Edit: Some pseudo-code to clarify
Timer code:
    While(true):
        if acquireIfPossible(mutexSkipRender):
            release(mutexSkipRender)
            release(mutexRender)
Sleep code:
    acquire(mutexSkipRender)
    acquire(mutexRender)
    release(mutexSkipRender)
Starting values:
    mutexSkipRender = 1
    mutexRender = 0
Edit: corrected initialization values.
The following code works pretty well on Windows (loops at exactly 50 fps with a precision to the millisecond):
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.Semaphore;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        final Semaphore mutexRefresh = new Semaphore(0);
        final Semaphore mutexRefreshing = new Semaphore(1);
        int refresh = 0;
        Timer timRefresh = new Timer();
        timRefresh.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                // Only signal a new frame if the previous refresh has finished (frame skipping).
                if(mutexRefreshing.tryAcquire()) {
                    mutexRefreshing.release();
                    mutexRefresh.release();
                }
            }
        }, 0, 1000/50);
        // The timer is started and configured for 50fps
        Date startDate = new Date();
        while(true) { // Refreshing loop
            mutexRefresh.acquire();
            mutexRefreshing.acquire();
            // Refresh
            refresh += 1;
            if(refresh % 50 == 0) {
                Date endDate = new Date();
                System.out.println(String.valueOf(50.0*1000/(endDate.getTime() - startDate.getTime())) + " fps.");
                startDate = new Date();
            }
            mutexRefreshing.release();
        }
    }
}

Your options are limited, and they depend on what exactly you want to do. Your code snippet mentions the max FPS, but the max FPS would require that you never sleep at all, so I'm not entirely sure what you intend with that. None of that sleep or yield checking is going to make any difference in most of the problem situations however - if some other app needs to run now and the OS doesn't want to switch back soon, it doesn't matter which one of those you call, you'll get control back when the OS decides to do so, which will almost certainly be more than 1ms in the future. However, the OS can certainly be coaxed into making switches more often - Win32 has the timeBeginPeriod call for precisely this purpose, which you may be able to use somehow. But there is a good reason for not switching too often - it's less efficient.
The best thing to do, although somewhat more complex, is usually to go for a game loop that doesn't require real-time updates, but instead performs logic updates at fixed intervals (eg. 20x a second) and renders whenever possible (perhaps with arbitrary short sleeps to free up CPU for other apps, if not running in full-screen). By buffering a past logic state as well as the current one you can interpolate between them to make the rendering appear as smooth as if you were doing logic updates each time. For more information on this approach, you can see the Fix Your Timestep article.
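The fixed-interval update with interpolated rendering described above can be sketched roughly as follows. The class FixedTimestep, the one-dimensional position, and the velocity value are hypothetical and only serve the illustration; frame durations are passed in explicitly so the behaviour does not depend on the real clock.

```java
final class FixedTimestep {
    static final double STEP = 1.0 / 20.0; // logic updates 20x a second, as in the text

    double previous;               // position after the second-to-last logic update
    double current;                // position after the last logic update
    double accumulator;            // elapsed time not yet consumed by logic updates
    final double velocity = 10.0;  // hypothetical speed in units per second

    /** Advances the simulation by one rendered frame and returns the position to draw. */
    double frame(double frameSeconds) {
        accumulator += frameSeconds;
        while (accumulator >= STEP) {      // run as many fixed logic steps as the elapsed time covers
            previous = current;
            current += velocity * STEP;
            accumulator -= STEP;
        }
        double alpha = accumulator / STEP; // fraction of the way into the next step, in [0, 1)
        return previous + (current - previous) * alpha;
    }

    public static void main(String[] args) {
        FixedTimestep loop = new FixedTimestep();
        double[] frameTimes = {0.016, 0.016, 0.040, 0.016}; // simulated, uneven render frame durations
        for (double dt : frameTimes) {
            System.out.println("draw at x = " + loop.frame(dt));
        }
    }
}
```

In a real loop frameSeconds would come from System.nanoTime() differences, so an occasional late wake-up from sleep only changes how many logic steps run in that frame rather than the speed of the game.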
I would also like to know if anyone knows about Microsoft and Oracle working out a better implementation of the Thread.sleep(n); method, or what Oracle's future plans are for improving their environment as the basis of applications requiring high timing accuracy, such as music software and games.
No, this won't be happening. Remember, sleep is just a method saying how long you want your program to be asleep for. It is not a specification for when it will or should wake up, and never will be. By definition, any system with sleep and yield functionality is a multitasking system, where the requirements of other tasks have to be considered, and the operating system always gets the final call on the scheduling of this. The alternative wouldn't work reliably, because if a program could somehow demand to be reactivated at a precise time of its choosing it could starve other processes of CPU power. (eg. A program that spawned a background thread and had both threads performing 1ms of work and calling sleep(1) at the end could take turns to hog a CPU core.) Thus, for a user-space program, sleep (and functionality like it) will always be a lower bound, never an upper bound. To do better than that requires the OS itself to allow certain apps to pretty much own the scheduling, and this is not a desirable feature in operating systems for consumer hardware (while being a common and useful feature for industrial applications).

Thread.sleep says your app needs no more time. This means that in a worst-case scenario you'll have to wait for an entire thread slice (40 ms or so).
In bad cases, when a driver or something else takes up more time, you could have to wait for 120 ms (3 × 40 ms), so Thread.sleep is not the way to go. Go another way, like registering a 1 ms callback and starting the draw code every X callbacks.
(This is on Windows; I'd use the multimedia tools to get those 1 ms resolution callbacks.)

Timing is notoriously bad on Windows. This article is a good place to start. Not sure if you care, but also note that there can be worse problems (especially with System.nanoTime) on virtual systems as well (when Windows is the guest operating system).

Thread.sleep is inaccurate and makes the animation jittery most of the time.
If you replace it completely with Thread.yield you'll get a solid FPS without lag or jitter, however the CPU usage increases greatly. I moved to Thread.yield a long time ago.
This problem has been discussed on Java Game Development forums for years.

How good is the JVM at parallel processing? When should I create my own Threads and Runnables? Why might threads interfere?

I have a Java program that runs many small simulations. It runs a genetic algorithm, where each fitness function is a simulation using parameters on each chromosome. Each one takes maybe 10 or so seconds if run by itself, and I want to run a pretty big population size (say 100?). I can't start the next round of simulations until the previous one has finished. I have access to a machine with a whack of processors in it and I'm wondering if I need to do anything to make the simulations run in parallel. I've never written anything explicitly for multicore processors before and I understand it's a daunting task.
So this is what I would like to know: To what extent and how well does the JVM parallelize? I have read that it creates low-level threads, but how smart is it? How efficient is it? Would my program run faster if I made each simulation a thread? I know this is a huge topic, but could you point me towards some introductory literature concerning parallel processing and Java?
Thanks very much!
Update:
Ok, I've implemented an ExecutorService and made my small simulations implement Runnable and have run() methods. Instead of writing this:
Simulator sim = new Simulator(args);
sim.play();
return sim.getResults();
I write this in my constructor:
ExecutorService executor = Executors.newFixedThreadPool(32);
And then each time I want to add a new simulation to the pool, I run this:
RunnableSimulator rsim = new RunnableSimulator(args);
executor.execute(rsim);
return rsim.getResults();
The RunnableSimulator::run() method calls the Simulator::play() method; neither takes arguments.
I think I am getting thread interference, because now the simulations error out. By error out I mean that variables hold values that they really shouldn't. No code from within the simulation was changed, and before the simulation ran perfectly over many many different arguments. The sim works like this: each turn it's given a game-piece and loops through all the location on the game board. It checks to see if the location given is valid, and if so, commits the piece, and measures that board's goodness. Now, obviously invalid locations are being passed to the commit method, resulting in index out of bounds errors all over the place.
Each simulation is its own object right? Based on the code above? I can pass the exact same set of arguments to the RunnableSimulator and Simulator classes and the runnable version will throw exceptions. What do you think might cause this and what can I do to prevent it? Can I provide some code samples in a new question to help?

Java Concurrency Tutorial
If you're just spawning a bunch of stuff off to different threads, and it isn't going to be talking back and forth between different threads, it isn't too hard; just write each in a Runnable and pass them off to an ExecutorService.
You should skim the whole tutorial, but for this particular task, start here.
Basically, you do something like this:
ExecutorService executorService = Executors.newFixedThreadPool(n);
where n is the number of things you want running at once (usually the number of CPUs). Each of your tasks should be an object that implements Runnable, and you then execute it on your ExecutorService:
executorService.execute(new SimulationTask(parameters...));
Executors.newFixedThreadPool(n) will start up n threads, and execute() will insert the tasks into a queue that feeds those threads. When a task finishes, the thread it was running on is no longer busy, and the next task in the queue will start running on it. execute() won't block; it just puts the task into the queue and moves on to the next one.
The thing to be careful of is that you really AREN'T sharing any mutable state between tasks. Your task classes shouldn't depend on anything mutable that will be shared among them (e.g. static data). There are ways to deal with shared mutable state (locking), but if you can avoid the problem entirely it will be a lot easier.
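To make the queueing behaviour concrete, here is a minimal sketch. The class name FixedPoolSketch, the runAll helper and the empty placeholder task are made up for illustration; only the Executors and ExecutorService calls are the real API. The shared AtomicInteger is there purely so the example can report how many tasks ran.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class FixedPoolSketch {

    /** Runs taskCount independent tasks on poolSize threads and returns how many completed. */
    static int runAll(int poolSize, int taskCount) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger(); // thread-safe counter, for reporting only

        for (int i = 0; i < taskCount; i++) {
            // execute() only enqueues the task and returns immediately.
            executorService.execute(() -> {
                // Placeholder for one independent simulation (assumption).
                completed.incrementAndGet();
            });
        }

        executorService.shutdown(); // accept no new tasks; queued tasks still run
        if (!executorService.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(runAll(cpus, 100) + " tasks completed");
    }
}
```

Calling shutdown() followed by awaitTermination() is one way to wait for a whole batch before starting the next round.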
EDIT: Reading your edits to your question, it looks like you really want something a little different. Instead of implementing Runnable, implement Callable. Your call() method should be pretty much the same as your current run(), except that it should end with return getResults();. Then submit() it to your ExecutorService. You will get a Future in return, which you can use to test whether the simulation is done and, when it is, to get your results.
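A rough sketch of the Callable/Future approach might look like this. SimulationTask, runGeneration and the squaring "simulation" are hypothetical stand-ins, since the real Simulator code isn't shown; submit(), Future.get() and shutdown() are the standard java.util.concurrent API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class CallableSimulationSketch {

    /** Hypothetical stand-in for the simulator: each instance owns its own state. */
    static final class SimulationTask implements Callable<Integer> {
        private final int parameter;

        SimulationTask(int parameter) {
            this.parameter = parameter;
        }

        @Override
        public Integer call() {
            // Placeholder for play() followed by getResults() (assumption).
            return parameter * parameter;
        }
    }

    /** Submits one task per parameter and collects the results in submission order. */
    static List<Integer> runGeneration(int[] parameters, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int p : parameters) {
                futures.add(executor.submit(new SimulationTask(p))); // does not block
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runGeneration(new int[] {1, 2, 3}, 2)); // prints [1, 4, 9]
    }
}
```

The point is that results are only read through Future.get(), which waits for the task to finish. Calling getResults() right after execute() reads the simulator before, or while, it runs.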

You can also look at the new fork/join framework by Doug Lea. One of the best books on the subject is certainly Java Concurrency in Practice. I would strongly recommend that you take a look at the fork/join model.
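For a feel of the fork/join model, here is a small sketch that sums an array by splitting it recursively. SumTask, parallelSum and the threshold of 1,000 are arbitrary choices for illustration, and ForkJoinPool.commonPool() assumes Java 8 or later.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ForkJoinSumSketch {

    /** Sums values[from, to) by splitting the range until it is small enough. */
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // arbitrary cut-off (assumption)
        private final long[] values;
        private final int from;
        private final int to;

        SumTask(long[] values, int from, int to) {
            this.values = values;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += values[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(values, from, mid);
            SumTask right = new SumTask(values, mid, to);
            left.fork();                     // schedule the left half asynchronously
            long rightSum = right.compute(); // work on the right half in this thread
            return left.join() + rightSum;   // wait for the left half and combine
        }
    }

    static long parallelSum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }

    public static void main(String[] args) {
        long[] values = new long[10_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(parallelSum(values)); // prints 50005000
    }
}
```

Fork/join suits work that splits recursively. For a fixed batch of independent simulations, a plain ExecutorService is probably the simpler fit.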

Java threads are just too heavyweight. We have implemented parallel branches in Ateji PX as very lightweight scheduled objects. As in Erlang, you can create tens of millions of parallel branches before you start noticing any overhead. But it's still Java, so you don't need to switch to a different language.

If you are doing full-out processing all the time in your threads, you won't benefit from having more threads than processors. If your threads occasionally wait on each other or on the system, then Java scales well up to thousands of threads.
I wrote an app that discovered a class B network (about 65,000 addresses) in a few minutes by pinging each node, and each ping had retries with an increasing delay. When I put each ping on a separate thread (this was before NIO; I could probably improve it now), I could run about 4000 threads on Windows before things started getting flaky. On Linux the number was nearer 1000 (I never figured out why).
No matter what language or toolkit you use, if your data interacts, you will have to pay some attention to the places where it does. Java uses the synchronized keyword to prevent two threads from executing a section at the same time. If you write your Java in a more functional manner (making all your members final) you can run without synchronization, but solving problems that way takes a different approach.
Java has other tools to manage units of independent work; look in the java.util.concurrent package for more information.
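As a small illustration of synchronized, here is a counter that several threads update at once. The class and method names are made up for the example.

```java
final class SynchronizedCounterSketch {
    private long count;

    // Only one thread at a time may run a synchronized method on the same instance.
    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }

    /** Starts the given number of threads, each incrementing a shared counter. */
    static long countWithThreads(int threads, int incrementsPerThread)
            throws InterruptedException {
        SynchronizedCounterSketch counter = new SynchronizedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait for every worker before reading the total
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithThreads(4, 10_000)); // prints 40000
    }
}
```

Without synchronized on increment(), count++ is a read-modify-write that two threads can interleave, so the total could come out lower than expected.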

Java is pretty good at parallel processing, but there are two caveats:
Java threads are relatively heavyweight (compared with e.g. Erlang processes), so don't start creating them in the hundreds or thousands. Each thread gets its own stack memory (the default depends on the JVM and platform, commonly between 256 KB and 1 MB), and you could run out of memory, among other things.
If you run on a very powerful machine (especially one with a lot of CPUs and a large amount of RAM), then the VM's default settings (especially concerning GC) may result in suboptimal performance, and you may have to spend some time tuning them via command-line options. Unfortunately, this is not a simple task and requires a lot of knowledge.
