Java Event Generation - java

I want to design a system that generates a specific event at a constant rate and keeps doing so in the background. In the foreground it should output some other events when I want it to.
But the background event must not stop. What is the best way to achieve this in Java?

This is exactly what threading is for, and using it requires some level of understanding.
At the simplest level, make a Thread that sleeps for an amount of time and then executes your code. There are lots of other ways to do it, but few are shorter than just overriding the run method of a Thread.
If you want something more abstract, look through the java.util.concurrent package in the Java docs; it has classes that do exactly what you want, and java.util.Timer is a good one to look at as well.
Be aware of variables and collections that might be accessed by different threads at the same time. If you have a GUI, also be aware that you should not update it from this new thread.
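As a rough illustration of the "more abstract" route, here is a minimal sketch using ScheduledExecutorService from java.util.concurrent. The class name BackgroundEventDemo, the method runFor, and the counter standing in for "the event" are all made up for this example; the real event would replace the counter increment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BackgroundEventDemo {
    /** Fires a background event every periodMillis for runMillis, then returns how often it fired. */
    static int runFor(long periodMillis, long runMillis) {
        AtomicInteger fired = new AtomicInteger();   // placeholder for the real event
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // scheduleAtFixedRate plans each run relative to the start time, so the rate stays constant.
        scheduler.scheduleAtFixedRate(fired::incrementAndGet, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            // The calling ("foreground") thread is free to do other work here.
            Thread.sleep(runMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();                 // stop the background thread when done
        }
        return fired.get();
    }

    public static void main(String[] args) {
        System.out.println("Background event fired " + runFor(100, 1000) + " times");
    }
}
```

The exact count depends on timing, so the sketch prints it rather than assuming a value.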
Edit to add a Non-thread solution
(I don't think this is really what you want, but in the comments you asked for a non-threaded solution).
If you wish to do this without threads (meaning you really wish to do it in your current thread) you have to occasionally "Interrupt" your current thread to check to see if your other task needs to process. First you need a method like this:
long lastRun = System.currentTimeMillis();
final long howOftenToRun = 60 * 1000; // every minute

void testForBackgroundTask() {
    if (lastRun + howOftenToRun < System.currentTimeMillis()) {
        // This will drift; if you don't want drift, use lastRun += howOftenToRun
        lastRun = System.currentTimeMillis();
        // This is where your occasional task is.
        // The task could be in-line here but of course that would violate the SRP
        runBackgroundTask();
    }
}
After that, you need to sprinkle testForBackgroundTask throughout your code:
lotsOfStuff....
testForBackgroundTask()
longMethod()
testForBackgroundTask()
morestuff...
testForBackgroundTask()
...
Note that if longMethod() takes a really long time then you will need to put calls to testForBackgroundTask() inside it as well.
I know this is ugly, and the ugliness of this solution is why threads are used. The only advantage is that it will absolutely prevent threading conflicts.
The other single-threaded solution, making your code event driven, is even harder and seriously impacts your code (there is a construct called a finite state machine made for this purpose).

Related

Java Multithreading - More Threads That Do Less, or Fewer Threads that Do More?

EDIT: This question might be appropriate for other languages as well - the overall theory behind it seems mostly language agnostic. However, as this will run in a JVM, I'm sure there's differences between JVM overheads/threading and those of other environments.
EDIT 2: To clarify a little better, I guess the main question is which is better for scalability: to have smaller threads that can return quicker to enable processing other chunks of work for other workloads, or try to get a single workload through as quickly as possible? The workloads are sequential and multithreading won't help speed up a single unit of work in this case; it's more in hopes of increasing the throughput of the system overall (thanks to Uri for leading me towards the clarification).
I'm working on a system that's replacing an existing system; the current system has a pretty heavy load, so we already know the replacement needs to be highly scalable. It communicates with several outside processes, such as email, other services, databases, etc., and I'm already planning on making it multithreaded to help with scaling. I've worked on multithreaded apps before, just nothing with this high of a performance/scalability requirement, so I don't have much experience when it comes to getting the absolute most out of concurrency.
The question I have is what's the best way to divide the work up between threads? I'm looking at two different versions, one that creates a single thread for the full workflow, and another that creates a thread for each of the individual steps, continuing on to the next step (in a new/different thread) when the previous step completes - probably with a NodeJS-style callback system, but not terribly concerned about the direct implementation details.
I don't know much about the nitty-gritty details of multithreading - things like context switching, for example - so I don't know if the overhead of multiple threads would swamp the execution time in each of the threads. On one hand, the single thread model seems like it would be fastest for an individual work flow compared to the multiple threads; however, it would also tie up a single thread for the entire workflow, whereas the multiple threads would be shorter lived and would return to the pool quicker (I imagine, at least).
Hopefully the underlying concept is easy enough to understand; here's a contrived pseudo-code example though:
// Single-thread approach
foo();
bar();
baz();
Or:
// Multiple Thread approach
Thread.run(foo);
when foo.isDone()
Thread.run(bar);
when bar.isDone()
Thread.run(baz);
UPDATE: Completely forgot. The reason I'm considering the multithreaded approach is the (possibly mistaken) belief that, since the threads will have smaller execution times, they'll be available for other instances of the overall workload. If each operation takes, say 5 seconds, then the single-thread version locks up a thread for 15 seconds; the multiple thread version would lock up a single thread for 5 seconds, and then it can be used for another process.
Any ideas? If there's anything similar out there in the interwebs, I'd love even a link - I couldn't think of how to search for this (I blame Monday for that, but it would probably be the same tomorrow).
Multithreading is not a silver bullet. It's a means to an end.
Before making any changes, you need to ask yourself where your bottlenecks are, and what you're really trying to parallelize. I'm not sure we can give good advice here without more information.
If foo, bar, and baz are part of a pipeline, you're not necessarily going to improve the overall latency of a single sequence by using multiple threads.
What you might be able to do is increase your throughput by running multiple executions of the pipeline over different inputs in parallel, letting later items travel through the pipeline while earlier items are blocked on something (e.g., I/O). For instance, if bar() for a particular input is blocked and waiting on a notification, you could do computationally heavy operations on another input, or have CPU resources to devote to foo(). A particularly important question is whether any of the external dependencies act as a limited shared resource: for example, if one thread is accessing system X, is another thread going to be affected?
Threads are also very effective if you want to divide and conquer your problem - splitting your input into smaller parts, running each part through the pipeline, and then waiting on all the pieces to be ready. Is that possible with the kind of workflow you're looking at?
If you need to first do foo, then do bar, and then do baz, you should have one thread do each of these steps in sequence. This is simple and makes obvious sense.
The most common case where you're better off with the assembly line approach is when keeping the code in cache is more important than keeping the data in cache. In this case, having one thread that does foo over and over can keep the code for this step in cache, keep branch prediction information around, and so on. However, you will have data cache misses when you hand the results of foo to the thread that does bar.
This is more complex and should only be attempted if you have good reason to think it will work better.
Use a single thread for the full workflow.
Dividing up the workflow can't improve the completion time for one piece of work: since the parts of the workflow have to be done sequentially anyway, only one thread can work on the piece of work at a time. However, breaking up the stages can delay the completion time for one piece of work, because a processor which could have picked up the last part of one piece of work might instead pick up the first part of another piece of work.
Breaking up the stages into multiple threads is also unlikely to improve the time to completion of all your work, relative to executing all the stages in one thread, since ultimately you still have to execute all the stages for all the pieces of work.
Here's an example. If you have 200 of these pieces of work, each requiring three 5 second stages, and say a thread pool of two threads running on two processors, keeping the entire workflow in a single thread results in your first two results after 15 seconds. It will take 1500 seconds to get all your results, but you only need the working memory for two of the pieces of work at a time. If you break up the stages, then it may take a lot longer than 15 seconds to get your first results, and you potentially may need memory for all 200 pieces of work proceeding in parallel if you still want to get all the results in 1500 seconds.
In most cases, there are no efficiency advantages to breaking up sequential stages into different threads, and there may be substantial disadvantages. Threads are generally only useful when you can use them to do work in parallel, which does not seem to be the case for your work stages.
However, there is a huge disadvantage to breaking up the stages into separate threads. That disadvantage is that you now need to write multithreaded code that manages the stages. It's extremely easy to write bugs in such code, and such bugs can be very difficult to catch prior to production deployment.
The way to avoid such bugs is to keep the threading code as simple as possible given your requirements. In the case of your work stages, the simplest possible threading code is none at all.
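To make the "one thread per full workflow" recommendation concrete, here is a minimal sketch in which each piece of work is a single task that runs its three stages in order, and a fixed pool runs several such tasks at once. The stage bodies, the class name WorkflowPerTaskDemo, and the method processAll are placeholders invented for this example, not part of the question's system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WorkflowPerTaskDemo {
    // Hypothetical stages standing in for foo, bar and baz from the question.
    static int foo(int input) { return input + 1; }
    static int bar(int input) { return input * 2; }
    static int baz(int input) { return input - 3; }

    /** Runs the whole foo -> bar -> baz workflow for each input as one task on a fixed pool. */
    static List<Integer> processAll(List<Integer> inputs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int input : inputs) {
                // One task per piece of work; its stages run sequentially on one thread.
                tasks.add(() -> baz(bar(foo(input))));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and keeps the input order.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList(1, 2, 3), 2)); // prints [1, 3, 5]
    }
}
```

The only threading code here is the pool itself; the stages never hand work to each other, so there is no inter-stage coordination to get wrong.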

Other ways to perform tasks without loops?

I'm fairly new to java and I was creating a program which would run indefinitely. Currently, the way I have the program set up is calling a certain method which would perform a task then call another method in the same class, this method would perform a task then call the initial method. This process would repeat indefinitely until I stop the compiler.
My problem is when I try to create a GUI to make my program more user friendly, once I press the initial start button this infinite loop will not allow me to perform any other actions -- including stopping the program.
There has to be another way to do this?
I apologize if this method is extremely sloppy, I sort of taught myself java from videos and looking at other programs and don't entirely understand it yet.
You'll need to run your task in a new thread, and have your GUI stuff in another thread.
Actually, if you keep working on this problem, you'll eventually invent event driven programming. Lots of GUI based software, like Android, use this paradigm.
There are several solutions. The first that comes to mind is that you could put whatever method needs to run forever in its own thread, and have a different thread listen for user input. This might introduce difficulties in getting the threads to interact with each other, but it would allow you to do this.
Alternatively, add a method that checks for user input and handles it inside the infinite loop of your program. Something like this:
while(true){
//do stuff
checkForUserInput();
//do other stuff
}
To solve this problem, you need to run your UI in another thread.
Many programs are based on an infinite loop (servers that keep waiting for a new user to connect for example) and your problem isn't there.
Managing the CPU time (or the core) allocated to your infinite loop and the time allocated to your UI interactions is the job of the operating system, not yours. That's why your UI should run in a separate thread from your actual code.
Depending on the GUI library you're using (Swing, ...), there may be different ways to do it, and how to implement it is well answered on Stack Overflow.
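Independent of any GUI toolkit, the basic shape is a worker thread that runs the loop plus a flag the UI thread can flip to stop it. The sketch below uses no GUI at all; the class StoppableLoop and the method runThenStop are invented for illustration, and the calling thread plays the role the UI thread would play.

```java
import java.util.concurrent.atomic.AtomicLong;

class StoppableLoop implements Runnable {
    // volatile so that a write from the controlling thread is seen by the worker
    private volatile boolean running = true;
    private final AtomicLong iterations = new AtomicLong();

    @Override
    public void run() {
        while (running) {
            iterations.incrementAndGet();   // stand-in for the real repeating task
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stop() { running = false; }        // what a Stop button handler would call

    long iterations() { return iterations.get(); }

    /** Starts the loop on a worker thread, lets it run for runMillis, then stops it. */
    static long runThenStop(long runMillis) {
        StoppableLoop loop = new StoppableLoop();
        Thread worker = new Thread(loop, "background-loop");
        worker.start();
        try {
            Thread.sleep(runMillis);        // the calling thread stays free meanwhile
            loop.stop();
            worker.join();                  // wait for the worker to finish its last pass
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loop.iterations();
    }

    public static void main(String[] args) {
        System.out.println("Loop ran " + runThenStop(200) + " iterations before stopping");
    }
}
```

In a Swing program the start and stop calls would live in button listeners, and any update to components from the worker would have to be handed back to the event-dispatch thread.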

Grouping animations for sequential execution

I have a Swing program that executes 2D animations using Swing Timers. With each button click there are several timers created to animate several different components - some of them moving off the screen and others moving on. (I do not know ahead of time what animations will need to be executed with each button click, but it isn't a problem to distinguish between the two "types" of animations at runtime - they're initiated from different methods, and thus it's easy to imagine adding them to two different "queues" - a queue of outgoing items and a queue of incoming items. Having done so, I could then implement the basic strategy of calling a
That said - that all only makes sense to me intuitively, heuristically - I haven't figured out how to implement it in practice. What would those "queues" actually be, and what class would hold and later execute them?? Presumably one that implements Runnable, creating a second thread that can execute the animations with tighter control on how they proceed? Or does the event-dispatch thread give me the ample control here: Is there a way to use SwingUtilities.invokeAndWait() (or something like it) to collect all the animations to be performed, while assigning priority to those of a certain class, or that are marked in a certain way?
I would suggest taking a look at the design of some of the existing animation engines like:
The Timing Framework
Trident
The Universal Tween Engine and AurelienRibon / sliding-layout which uses the Tween Engine.
Generally, what these engines tend to do is have a central "clock" which ticks at a regular interval. They then provide callback functionality to notify interested parties that a "tick" has occurred.
They then offer a series of layers on top of this concept to make it easier to interact with, such as providing a time range for animations, presented as a percentage over time (rather than a physical time measurement), which can be used to calculate fractions of change.
They also provide interpolation, allowing you to affect the speed of the animation through the time cycle (such as slow in, fast out effects).
This approach reduces the overhead of running multiple Timers, which may degrade performance over time, while providing a separation model so that each "animation" is its own entity.
Personally, I'd evaluate each one, see which best meets your needs, and run with it. But if you really want to do it yourself, they provide a good starting point for ideas and designs.
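For a feel of the central-clock idea, here is a small sketch that is not taken from any of the engines listed; the names AnimationClock, Animation, tick and easeInOut are made up. One clock is ticked by a single timer, each animation has a start time and a duration, and each tick hands the animation an eased fraction between 0 and 1. Giving incoming animations a later start time than outgoing ones is one way to get sequential groups.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class AnimationClock {
    /** Hypothetical callback: receives eased progress in [0, 1] on every tick. */
    interface Animation { void update(double fraction); }

    private static final class Entry {
        final Animation animation;
        final long startMillis;
        final long durationMillis;
        Entry(Animation animation, long startMillis, long durationMillis) {
            this.animation = animation;
            this.startMillis = startMillis;
            this.durationMillis = durationMillis;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(Animation animation, long startMillis, long durationMillis) {
        entries.add(new Entry(animation, startMillis, durationMillis));
    }

    /** Called from one shared timer; nowMillis is the current clock reading. */
    void tick(long nowMillis) {
        Iterator<Entry> it = entries.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (nowMillis < e.startMillis) continue;   // not started yet
            double f = fraction(nowMillis - e.startMillis, e.durationMillis);
            e.animation.update(easeInOut(f));
            if (f >= 1.0) it.remove();                 // finished, drop it
        }
    }

    /** Linear progress, clamped to [0, 1]. */
    static double fraction(long elapsedMillis, long durationMillis) {
        if (durationMillis <= 0) return 1.0;
        return Math.max(0.0, Math.min(1.0, (double) elapsedMillis / durationMillis));
    }

    /** Smoothstep interpolation: slow at both ends, faster in the middle. */
    static double easeInOut(double t) { return t * t * (3 - 2 * t); }

    public static void main(String[] args) {
        AnimationClock clock = new AnimationClock();
        clock.add(f -> System.out.println("outgoing " + f), 0, 1000);
        clock.add(f -> System.out.println("incoming " + f), 1000, 1000); // starts after the first ends
        for (long now = 0; now <= 2000; now += 500) clock.tick(now);
    }
}
```

In Swing, tick would be called from a single javax.swing.Timer so that all updates happen on the event-dispatch thread.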

Java concurrency - Should block or yield?

I have multiple threads each one with its own private concurrent queue and all they do is run an infinite loop retrieving messages from it. It could happen that one of the queues doesn't receive messages for a period of time (maybe a couple seconds), and also they could come in big bursts and fast processing is necessary.
I would like to know what would be the most appropriate to do in the first case: use a blocking queue and block the thread until I have more input or do a Thread.yield()?
I want to have as much CPU resources available as possible at a given time, as the number of concurrent threads may increase with time, but I also don't want the message processing to fall behind, as there is no guarantee of when the thread will be rescheduled for execution after a yield(). I know that hardware, operating system and other factors play an important role here, but setting that aside and looking at it from a Java (JVM?) point of view, what would be the most optimal?
Always just block on the queues. Java yields in the queues internally.
In other words: You cannot get any performance benefit in the other threads if you yield in one of them rather than just block.
You certainly want to use a blocking queue - they are designed for exactly this purpose (you want your threads to not use CPU time when there is no work to do).
Thread.yield() is an extremely temperamental beast - the scheduler plays a large role in exactly what it does; and one simple but valid implementation is to simply do nothing.
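A minimal sketch of the blocking approach, assuming String messages and a sentinel value to end the loop (the class BlockingConsumerDemo, the method consumeAll, and the STOP sentinel are invented for this example): the consumer calls take(), which parks the thread while the queue is empty and wakes it as soon as a message arrives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockingConsumerDemo {
    // Hypothetical sentinel ("poison pill") telling the consumer to stop.
    private static final String STOP = "__stop__";

    /** Feeds the messages to a consumer thread and returns them in the order it processed them. */
    static List<String> consumeAll(List<String> messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take();   // blocks without spinning while empty
                    if (STOP.equals(message)) return;
                    processed.add(message);          // stand-in for real message handling
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (String message : messages) queue.put(message);
            queue.put(STOP);
            consumer.join();   // join also makes the consumer's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(Arrays.asList("a", "b", "c"))); // prints [a, b, c]
    }
}
```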
Alternatively, consider converting your implementation to use one of the managed ExecutorService implementations - probably ThreadPoolExecutor.
This may not be appropriate for your use case, but if it is, it removes the whole burden of worrying about thread management from your own code - and these questions about yielding or not simply vanish.
In addition, if better thread management algorithms emerge in future - for example, something akin to Apple's Grand Central Dispatch - you may be able to convert your application to use it with almost no effort.
Another thing that you could do is use a concurrent hash map for your queue. A read gives you a reference to the object you were looking for, so it is possible that you may miss a message that was just put into the queue. But if all this thread does is listen for messages, you will catch it on the next iteration. It would be different if the messages could be updated by other threads. But there doesn't really seem to be a reason to block that I can see.

How good is the JVM at parallel processing? When should I create my own Threads and Runnables? Why might threads interfere?

I have a Java program that runs many small simulations. It runs a genetic algorithm, where each fitness function is a simulation using parameters on each chromosome. Each one takes maybe 10 or so seconds if run by itself, and I want to run a pretty big population size (say 100?). I can't start the next round of simulations until the previous one has finished. I have access to a machine with a whack of processors in it and I'm wondering if I need to do anything to make the simulations run in parallel. I've never written anything explicitly for multicore processors before and I understand it's a daunting task.
So this is what I would like to know: To what extent and how well does the JVM parallel-ize? I have read that it creates low level threads, but how smart is it? How efficient is it? Would my program run faster if I made each simulation a thread? I know this is a huge topic, but could you point me towards some introductory literature concerning parallel processing and Java?
Thanks very much!
Update:
Ok, I've implemented an ExecutorService and made my small simulations implement Runnable and have run() methods. Instead of writing this:
Simulator sim = new Simulator(args);
sim.play();
return sim.getResults();
I write this in my constructor:
ExecutorService executor = Executors.newFixedThreadPool(32);
And then each time I want to add a new simulation to the pool, I run this:
RunnableSimulator rsim = new RunnableSimulator(args);
executor.execute(rsim);
return rsim.getResults();
The RunnableSimulator::run() method calls the Simulator::play() method, neither have arguments.
I think I am getting thread interference, because now the simulations error out. By error out I mean that variables hold values that they really shouldn't. No code from within the simulation was changed, and before the simulation ran perfectly over many many different arguments. The sim works like this: each turn it's given a game-piece and loops through all the location on the game board. It checks to see if the location given is valid, and if so, commits the piece, and measures that board's goodness. Now, obviously invalid locations are being passed to the commit method, resulting in index out of bounds errors all over the place.
Each simulation is its own object right? Based on the code above? I can pass the exact same set of arguments to the RunnableSimulator and Simulator classes and the runnable version will throw exceptions. What do you think might cause this and what can I do to prevent it? Can I provide some code samples in a new question to help?
Java Concurrency Tutorial
If you're just spawning a bunch of stuff off to different threads, and it isn't going to be talking back and forth between different threads, it isn't too hard; just write each in a Runnable and pass them off to an ExecutorService.
You should skim the whole tutorial, but for this particular task, start here.
Basically, you do something like this:
ExecutorService executorService = Executors.newFixedThreadPool(n);
where n is the number of things you want running at once (usually the number of CPUs). Each of your tasks should be an object that implements Runnable, and you then execute it on your ExecutorService:
executorService.execute(new SimulationTask(parameters...));
Executors.newFixedThreadPool(n) will start up n threads, and execute will insert the tasks into a queue that feeds to those threads. When a task finishes, the thread it was running on is no longer busy, and the next task in the queue will start running on it. Execute won't block; it will just put the task into the queue and move on to the next one.
The thing to be careful of is that you really AREN'T sharing any mutable state between tasks. Your task classes shouldn't depend on anything mutable that will be shared among them (i.e. static data). There are ways to deal with shared mutable state (locking), but if you can avoid the problem entirely it will be a lot easier.
EDIT: Reading your edits to your question, it looks like you really want something a little different. Instead of implementing Runnable, implement Callable. Your call() method should be pretty much the same as your current run(), except it should return getResults();. Then, submit() it to your ExecutorService. You will get a Future in return, which you can use to test if the simulation is done, and, when it is, get your results.
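A minimal sketch of the Callable and Future approach just described. SimulationTask here is a placeholder that squares its seed instead of running a real simulation, and CallableSimulationDemo and runAll are names made up for the example. Note that the results are read from the Future, not from the task object right after submitting it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableSimulationDemo {
    /** Hypothetical stand-in for a simulation; all of its state lives in its own fields. */
    static final class SimulationTask implements Callable<Integer> {
        private final int seed;
        SimulationTask(int seed) { this.seed = seed; }

        @Override
        public Integer call() {
            return seed * seed;   // placeholder for play() followed by getResults()
        }
    }

    /** Submits count simulations and waits for all results before returning. */
    static List<Integer> runAll(int count, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int seed = 0; seed < count; seed++) {
                futures.add(executor.submit(new SimulationTask(seed)));   // returns immediately
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());   // blocks until that simulation has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(4, 2)); // prints [0, 1, 4, 9]
    }
}
```

Because runAll only returns once every Future has completed, the next round of the genetic algorithm can safely start after it.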
You can also look at the new fork/join framework by Doug Lea. One of the best books on the subject is certainly Java Concurrency in Practice. I would strongly recommend that you take a look at the fork/join model.
Java threads are just too heavyweight. We have implemented parallel branches in Ateji PX as very lightweight scheduled objects. As in Erlang, you can create tens of millions of parallel branches before you start noticing an overhead. But it's still Java, so you don't need to switch to a different language.
If you are doing full-out processing all the time in your threads, you won't benefit from having more threads than processors. If your threads occasionally wait on each other or on the system, then Java scales well up to thousands of threads.
I wrote an app that discovered a class B network (65,000) in a few minutes by pinging each node, and each ping had retries with an increasing delay. When I put each ping on a separate thread (this was before NIO; I could probably improve it now), I could run about 4000 threads on Windows before things started getting flaky. On Linux the number was nearer 1000 (I never figured out why).
No matter what language or toolkit you use, if your data interacts, you will have to pay some attention to those areas where it does. Java uses the synchronized keyword to prevent two threads from executing a section of code at the same time. If you write your Java in a more functional manner (making all your members final) you can run without synchronization, but, well, let's just say solving problems takes a different approach that way.
Java has other tools to manage units of independent work, look in the "Concurrent" package for more information.
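As a small illustration of the synchronized keyword mentioned above, here is a sketch of a shared counter that several threads increment. The class SynchronizedCounterDemo and the method countWithThreads are invented for the example. Without synchronized on increment(), the count++ updates from different threads could overwrite each other and the total could come out too low.

```java
class SynchronizedCounterDemo {
    private int count;

    // Only one thread at a time may run a synchronized method on the same object.
    synchronized void increment() { count++; }

    synchronized int get() { return count; }

    /** Starts the given number of threads, each incrementing one shared counter. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        SynchronizedCounterDemo counter = new SynchronizedCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) worker.join();   // wait for every worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000)); // prints 4000
    }
}
```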
Java is pretty good at parallel processing, but there are two caveats:
Java threads are relatively heavyweight (compared with e.g. Erlang), so don't start creating them in the hundreds or thousands. Each thread gets its own stack memory (the default size depends on the platform and JVM, typically between 256 KB and 1 MB) and you could run out of memory, among other things.
If you run on a very powerful machine (especially with a lot of CPUs and a large amount of RAM), then the VM's default settings (especially concerning GC) may result in suboptimal performance and you may have to spend some time tuning them via command line options. Unfortunately, this is not a simple task and requires a lot of knowledge.
