As a TDD practitioner I want to test everything I code.
During the last couple of years I've been writing a lot of multithreaded code, and one part of my tests has been bothering me very much.
When I have to assert something that may happen during the run() loop, I end up with some kind of assertion like this:
assertEventually(timeout, assertion)
I know that Mockito has a solution for this, but only for the verify call. I know also that JUnit has a timeout property that is useful to avoid hanging (or ever lasting) tests. But what I want is something that allows me to assert something that may become true over time.
So my question is, does anyone know the best way to achieve this?
Here is my solution so far:
private void assertEventually(int timeoutInMilliseconds, Runnable assertion) {
    long begin = System.currentTimeMillis();
    long now = begin;
    Throwable lastException = null;
    do {
        try {
            assertion.run();
            return;
        } catch (RuntimeException e) {
            lastException = e;
        } catch (AssertionError e) {
            lastException = e;
        }
        now = System.currentTimeMillis();
    } while ((now - begin) < timeoutInMilliseconds);
    throw new RuntimeException(lastException);
}
Using it ends up like this:
assertEventually(1000, new Runnable() {
    public void run() {
        assertThat(Thread.activeCount()).isEqualTo(before);
    }
});
From the language that you use - asserting that something may be true - it sounds like you are going down a path that will lead to fragile tests. Tests should have binary outcomes - they either pass or fail, and there is no gray area.
Looking at this specific example, given the use of multi-threading I would suggest something like this:
1. Refactor your design so that all of the actual logic in your code is easily run independent of your multi-threaded environment as synchronously executed code, which should make writing unit tests around this fairly straightforward.
2. Implement your thread management outside of all this code - this will be much harder to test using unit tests and therefore you should look to keep it cleanly segregated from the rest of your code so that it doesn't make other code harder to test.
3. Start building a high level system test suite which executes the system (or large components of it) as a whole, driving your tests through the external boundaries of your application. These will give coverage of the thread handling code, along with testing the integration of the various components in your system. Furthermore, you should not have to write specific test logic that deals with threads - they should just run internally in your system when you test it.
One further advantage of splitting the tests this way is that you create separate test suites for unit tests and system tests, and this should help keep your unit test suite fast and lean (so you can run it more easily and more often during development). Any tests involving timeouts (i.e. the system tests in this case) will take longer to execute, and therefore are more suitable for only running occasionally - this approach makes this easier to do.
If you come from Scala, you might be used to the eventually trait.
It's really useful for testing, since you can retry for a certain period of time until the result eventually is returned or not.
There's a Java wrapper made by halfvim here: https://github.com/halfvim/eventually
Testing for timeliness is a kind of performance testing which, unless you have hard real-time requirements, is asserted using statistics over many samples.
A unit test would only concern itself with a single sample, and it should be deterministic. Instead of this check-and-loop construct, I think you should let your JUnit thread be notified through some mechanism as soon as the event that you want to assert on occurs. Callbacks, Futures, Latches, Exchangers, etc. can be used to build such a mechanism.
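As a rough illustration of the notification idea, here is a minimal sketch built on a CountDownLatch. The class name LatchNotificationExample and the method runAndAwait are hypothetical, and the worker thread stands in for whatever code would normally raise the event; it is not the only way to wire this up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchNotificationExample {

    /**
     * Runs the work on a separate thread and blocks the caller until the
     * worker signals completion or the timeout elapses.
     * Returns true if the signal arrived in time.
     */
    static boolean runAndAwait(Runnable work, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                // Signal the waiting thread even if the work threw.
                done.countDown();
            }
        });
        worker.start();
        try {
            // No polling loop: await() returns as soon as countDown() is called.
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean finished = runAndAwait(() -> { }, 1000);
        System.out.println("worker finished in time: " + finished);
    }
}
```

The test thread wakes up as soon as the event happens, so the timeout only matters when something is actually broken.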
It is a design challenge to build your multi-threading code such that it can be tested without check-and-loop, or spin-waiting on a state.
But if you really must do this, then use System.nanoTime() instead of System.currentTimeMillis() because it updates more often, though the actual implementation, and the accuracy, of both clocks depend on your JVM version, operating system and hardware.
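If polling is unavoidable, a version based on System.nanoTime() might look like the sketch below. The class name Eventually, the method eventually, and the 10 ms pause between polls are all assumptions made for illustration; the pause simply avoids spinning at full speed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class Eventually {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * System.nanoTime() is meant for measuring elapsed time and is not
     * affected by wall-clock adjustments.
     */
    static boolean eventually(long timeoutMillis, BooleanSupplier condition) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so the check stays correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10); // arbitrary pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean flag = new AtomicBoolean(false);
        new Thread(() -> flag.set(true)).start();
        System.out.println("condition became true: " + eventually(1000, flag::get));
    }
}
```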
I have a pipeline which uses a Flink GlobalWindows window with a custom Trigger based on event time (taken from the timestamp on each arriving element) and an Evictor which cuts unnecessary elements from the window and passes the rest to the ProcessFunction,
something like:
public SingleOutputStreamOperator<Results> processElements(DataStream<Elements> inputStream) {
    return inputStream
        .keyBy(Elements::getId)
        .window(GlobalWindows.create())
        .trigger(new CustomTrigger())
        .evictor(new CustomEvictor())
        .process(new MyWindowProcessFunction())
        .name("Process")
        .uid("process-elements")
        .returns(Results.class);
}
public void executePipelineFlow(StreamExecutionEnvironment env) throws Exception {
    DataStream<Elements> inputStream = getInputStream(env);
    DataStream<Results> processedInput = processElements(inputStream);
    applySink(processedInput);
}
I know I can test MyWindowProcessFunction with a test harness, which provides watermark manipulation, but I need to test the whole flow: Trigger + Evictor + ProcessFunction.
I also tried a kind of timed SourceFunction that uses Thread.sleep(), but my pipeline works in event time, and this won't work if I have 1000 elements in the test stream (because the test would take a couple of hours).
My question is: how can I unit test my whole processElements method?
I can't find any test examples for my case.
Thanks
You might look at how the end-to-end integration tests for the windowing exercise in the Flink training are implemented as an example. This exercise isn't using GlobalWindows or custom triggering, etc, but you can use this overall approach to test any pipeline.
The one thing that's maybe less than ideal about this approach is how it handles watermarking. The applications being tested are using the default periodic watermarking strategy, wherein watermarks are generated every 200msec. Since the tests don't run that long, the only watermark that's actually generated is the one that comes at the end of every job with bounded inputs. This works, but isn't quite the same as what will happen in production. (Is this why you were thinking of having your test source sleep between events?)
BTW, these tests in the Flink training repo are made slightly more complex than is ordinarily necessary, because these tests are used to provide coverage for the Java and the Scala implementations of both the exercises and solutions.
EDIT: This question might be appropriate for other languages as well - the overall theory behind it seems mostly language agnostic. However, as this will run in a JVM, I'm sure there's differences between JVM overheads/threading and those of other environments.
EDIT 2: To clarify a little better, I guess the main question is which is better for scalability: to have smaller threads that can return quicker to enable processing other chunks of work for other workloads, or try to get a single workload through as quickly as possible? The workloads are sequential and multithreading won't help speed up a single unit of work in this case; it's more in hopes of increasing the throughput of the system overall (thanks to Uri for leading me towards the clarification).
I'm working on a system that's replacing an existing system; the current system has a pretty heavy load, so we already know the replacement needs to be highly scalable. It communicates with several outside processes, such as email, other services, databases, etc., and I'm already planning on making it multithreaded to help with scaling. I've worked on multithreaded apps before, just nothing with this high of a performance/scalability requirement, so I don't have much experience when it comes to getting the absolute most out of concurrency.
The question I have is what's the best way to divide the work up between threads? I'm looking at two different versions, one that creates a single thread for the full workflow, and another that creates a thread for each of the individual steps, continuing on to the next step (in a new/different thread) when the previous step completes - probably with a NodeJS-style callback system, but not terribly concerned about the direct implementation details.
I don't know much about the nitty-gritty details of multithreading - things like context switching, for example - so I don't know if the overhead of multiple threads would swamp the execution time in each of the threads. On one hand, the single thread model seems like it would be fastest for an individual work flow compared to the multiple threads; however, it would also tie up a single thread for the entire workflow, whereas the multiple threads would be shorter lived and would return to the pool quicker (I imagine, at least).
Hopefully the underlying concept is easy enough to understand; here's a contrived pseudo-code example though:
// Single-thread approach
foo();
bar();
baz();
Or:
// Multiple Thread approach
Thread.run(foo);
when foo.isDone()
Thread.run(bar);
when bar.isDone()
Thread.run(baz);
UPDATE: Completely forgot. The reason I'm considering the multithreaded approach is the (possibly mistaken) belief that, since the threads will have smaller execution times, they'll be available for other instances of the overall workload. If each operation takes, say 5 seconds, then the single-thread version locks up a thread for 15 seconds; the multiple thread version would lock up a single thread for 5 seconds, and then it can be used for another process.
Any ideas? If there's anything similar out there in the interwebs, I'd love even a link - I couldn't think of how to search for this (I blame Monday for that, but it would probably be the same tomorrow).
Multithreading is not a silver bullet. It's a means to an end.
Before making any changes, you need to ask yourself where your bottlenecks are, and what you're really trying to parallelize. I'm not sure that we can give good advice here without more information.
If foo, bar, and baz are part of a pipeline, you're not necessarily going to improve the overall latency of a single sequence by using multiple threads.
What you might be able to do is to increase your throughput by letting multiple executions of the pipeline over different input pieces work in parallel, by letting later items travel through the pipeline while earlier items are blocked on something (e.g., I/O). For instance, if bar() for a particular input is blocked and waiting on a notification, it's possible that you could do computationally heavy operations on another input, or have CPU resources to devote to foo(). A particularly important question is whether any of the external dependencies act as a limited shared resource. For example, if one thread is accessing system X, is another thread going to be affected?
Threads are also very effective if you want to divide and conquer your problem - splitting your input into smaller parts, running each part through the pipeline, and then waiting on all the pieces to be ready. Is that possible with the kind of workflow you're looking at?
If you need to first do foo, then do bar, and then do baz, you should have one thread do each of these steps in sequence. This is simple and makes obvious sense.
The most common case where you're better off with the assembly line approach is when keeping the code in cache is more important than keeping the data in cache. In this case, having one thread that does foo over and over can keep the code for this step in cache, keep branch prediction information around, and so on. However, you will have data cache misses when you hand the results of foo to the thread that does bar.
This is more complex and should only be attempted if you have good reason to think it will work better.
Use a single thread for the full workflow.
Dividing up the workflow can't improve the completion time for one piece of work: since the parts of the workflow have to be done sequentially anyway, only one thread can work on the piece of work at a time. However, breaking up the stages can delay the completion time for one piece of work, because a processor which could have picked up the last part of one piece of work might instead pick up the first part of another piece of work.
Breaking up the stages into multiple threads is also unlikely to improve the time to completion of all your work, relative to executing all the stages in one thread, since ultimately you still have to execute all the stages for all the pieces of work.
Here's an example. If you have 200 of these pieces of work, each requiring three 5 second stages, and say a thread pool of two threads running on two processors, keeping the entire workflow in a single thread results in your first two results after 15 seconds. It will take 1500 seconds to get all your results, but you only need the working memory for two of the pieces of work at a time. If you break up the stages, then it may take a lot longer than 15 seconds to get your first results, and you potentially may need memory for all 200 pieces of work proceeding in parallel if you still want to get all the results in 1500 seconds.
In most cases, there are no efficiency advantages to breaking up sequential stages into different threads, and there may be substantial disadvantages. Threads are generally only useful when you can use them to do work in parallel, which does not seem to be the case for your work stages.
However, there is a huge disadvantage to breaking up the stages into separate threads. That disadvantage is that you now need to write multithreaded code that manages the stages. It's extremely easy to write bugs in such code, and such bugs can be very difficult to catch prior to production deployment.
The way to avoid such bugs is to keep the threading code as simple as possible given your requirements. In the case of your work stages, the simplest possible threading code is none at all.
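To make the "whole workflow per task" advice concrete, here is a minimal sketch in which each work item runs all of its stages on one pooled thread, and the only concurrency is between different work items. The class name WholeWorkflowPerTask and the bodies of foo, bar and baz are hypothetical stand-ins for the question's stages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WholeWorkflowPerTask {

    // Hypothetical stand-ins for the question's foo(), bar() and baz() stages.
    static int foo(int input) { return input + 1; }
    static int bar(int input) { return input * 2; }
    static int baz(int input) { return input - 3; }

    /** Submits one task per work item; each task runs the stages in sequence. */
    static List<Integer> processAll(List<Integer> inputs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int input : inputs) {
                Callable<Integer> workflow = () -> baz(bar(foo(input)));
                futures.add(pool.submit(workflow));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // results come back in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList(1, 2, 3), 2));
    }
}
```

No stage ever hands data to another thread, so the stage code itself needs no synchronization.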
Can anybody suggest how I can show the statistical difference between normal multithreading and multithreading with Executors, in terms of e.g. CPU time, total thread user time, memory usage, and so on?
Any suggestions will be helpful.
I am not sure I understand the term "statistical difference". I believe that you are asking about using executors versus the plain thread API, and what the difference between them is.
First, executors are based on threads; they are just another layer on top of them. No magic. The plain threading API allows you to create and manage multithreaded applications, but requires dealing with the gory details of thread synchronization, pooling, transferring data between threads, etc.
The Executors framework solves some of these problems. You can define a thread pool policy, choose a queue type according to your needs, and just put new tasks on the incoming queue. The thread pool will execute the tasks according to its configuration.
The problem is that your question is asking something that makes little sense.
Before you can meaningfully talk about the "statistical difference" between things, you have to have some way of quantifying and measuring them. And before that can happen, you need a clear statement of what you are trying to quantify / measure.
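As an illustration of choosing a pool policy and queue type, the following sketch configures a ThreadPoolExecutor directly. The class name PoolConfigExample and all the numbers (2 core threads, 4 maximum, a queue of 100, a 30 second keep-alive) are arbitrary values picked for the example, not recommendations.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolConfigExample {

    /** Runs the given number of trivial tasks and returns how many completed. */
    static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                                  // core pool size
                4,                                  // maximum pool size
                30, TimeUnit.SECONDS,               // idle time before extra threads exit
                new ArrayBlockingQueue<>(100),      // bounded incoming queue
                // When the queue is full and all threads are busy,
                // the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown(); // stop accepting tasks; queued ones still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed tasks: " + runTasks(1000));
    }
}
```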
What you are asking satisfies none of these criteria.
Assuming that you have a meaningful question ...
At a practical level, the normal way that people try to quantify the effect of something like this (using thread pools versus creating new threads) is to develop a benchmark application with variants corresponding to the two strategies. Then measure the relative performance. But this has many problems.
The most fundamental problem is that what you are actually measuring is the effect of the two strategies for that benchmark, and that benchmark only. Generalizing from the benchmark to other applications is very difficult. The problem is that there are "hidden parameters" embedded in the design of any benchmark. For instance, the number of processors, the number of threads, the length and complexity of the tasks, and so on. Without having a good intuition as to what the parameters are, it is difficult to design a benchmark to take them into account. And even if you succeed in figuring out what the hidden parameters are and quantifying their effect, you have the problem that you can't figure out what those parameters will be in a real (more complex) application. At the end of the day, you'll end up with a model that can't give you quantitative answers for real problems. (Computing has nothing like Newton's Law of Gravity.)
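For the raw numbers themselves (CPU time and thread user time), the JDK's ThreadMXBean can report per-thread figures. The sketch below is only a starting point under the caveats above; the class name ThreadCpuTimeExample and the method measure are hypothetical, and it returns -1 for both values when the JVM does not support or has disabled the measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class ThreadCpuTimeExample {

    /**
     * Runs the task on the current thread and returns
     * { cpuNanos, userNanos } consumed by this thread while it ran,
     * or { -1, -1 } if the JVM cannot measure thread CPU time.
     */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return new long[] {-1, -1};
        }
        long cpuBefore = threads.getCurrentThreadCpuTime();
        long userBefore = threads.getCurrentThreadUserTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuBefore;
        long userNanos = threads.getCurrentThreadUserTime() - userBefore;
        return new long[] {cpuNanos, userNanos};
    }

    public static void main(String[] args) {
        long[] times = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable"); // keeps the loop from being trivially dead
            }
        });
        System.out.println("cpu ns: " + times[0] + ", user ns: " + times[1]);
    }
}
```

A single reading like this says little on its own; it would need to be repeated many times per strategy before any comparison means much.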
I've just made a program with Eclipse that takes a really long time to execute. It's taking even longer because it's loading my CPU to 25% only (I'm assuming that is because I'm using a quad-core and the program is only using one core). Is there any way to make the program use all 4 cores to max it out? Java is supposed to be natively multi-threaded, so I don't understand why it would only use 25%.
You still have to create and manage threads manually in your application. Java can't determine that two tasks can run asynchronously and automatically split the work into several threads.
This is a pretty vague question because we don't know much about what your program does. If your program is single-threaded, then no number of cores on your machine is going to make it run any faster. Java does have threading support, but it won't automatically parallelize your code for you. To speed it up, you'll need to identify parts of the computation that can be run in parallel with one another and add code as appropriate to split up and reconstitute the work. Without more info on what your program does, I can't help you out.
Another important detail to note is that how Java threads map onto system threads is up to the JVM implementation. Mainstream JVMs such as HotSpot map each ordinary Java thread to an operating system thread and leave scheduling to the OS, so there's no actual guarantee about how your threads will be distributed across the cores.
Yes, Java is multi-threaded, but the multi-threading doesn't happen "by magic".
Have a look at either the Thread class or the Executor framework. Essentially you need to split your job into "subtasks", each of which can run on a single processor, then do something like this:
Executor ex = Executors.newFixedThreadPool(4);
while (thereAreMoreSubtasksToDo) {
    ex.execute(new Runnable() {
        public void run() {
            // ... do subtask ...
        }
    });
}
Turning a serial routine/algorithm into a parallel one isn't necessarily trivial: you need to know in particular about a range of issues broadly termed "thread-safety". You may be interested in some material I've written about thread-safety in Java, and threading in general if you follow the links: the key thing to bear in mind is that if any data/objects are being shared among the different threads running, then you need to take special precautions. That said, for independent things that you just want to "run at the same time", then the above pattern will get you started.
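A complete version of this pattern, for the simple case of independent subtasks, might look like the sketch below. The class name ParallelSum and the choice of summing a range of numbers are hypothetical; the point is only that each subtask touches its own data and the partial results are combined at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSum {

    /** Sums the integers in [fromInclusive, toExclusive) using the given number of threads. */
    static long sum(long fromInclusive, long toExclusive, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Split the range into roughly equal chunks, one subtask per chunk.
            long chunk = (toExclusive - fromInclusive + threads - 1) / threads;
            List<Callable<Long>> subtasks = new ArrayList<>();
            for (long start = fromInclusive; start < toExclusive; start += chunk) {
                final long lo = start;
                final long hi = Math.min(start + chunk, toExclusive);
                subtasks.add(() -> {
                    long partial = 0;
                    for (long i = lo; i < hi; i++) {
                        partial += i;
                    }
                    return partial;
                });
            }
            // invokeAll blocks until every subtask has finished.
            long total = 0;
            for (Future<Long> result : pool.invokeAll(subtasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 101, 4));
    }
}
```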
Java is multi-threaded but if your application runs in only one thread, only one thread will be used. (Apart from the internal threads Java uses for finalization, garbage collection and so on.)
If you want your code to use multiple threads, you have to split it up manually, either by starting threads by yourself or using a third party thread pool. I'd suggest the latter option as it's safer but both can work equally well.
You've got a bit of learning ahead of you (actually, quite a bit of learning) - but it's learning you should do if you are going to be doing any serious programming.
Here's a starting point: http://download.oracle.com/javase/tutorial/essential/concurrency/
But you might want to look into a good book on Java multi-threading (I did this so long ago that any book I could recommend would be out of print). This sort of hard topic is well suited for learning from a text instead of online tutorials.
I'm wondering what good ways there are to make assertions about synchronization, so that I could detect synchronization violations (while testing).
That would be used, for example, when I have a class that is not thread-safe and isn't going to be made thread-safe. I'd like some assertion that would inform me (via a log or something) if some of its methods were called from multiple threads.
I'm looking for something similar to what can be done for the AWT dispatch thread with the following:
public static void checkDispatchThread() {
    if (!SwingUtilities.isEventDispatchThread()) {
        throw new RuntimeException("GUI change made outside AWT dispatch thread");
    }
}
I'd only want something more general. The problem description isn't so clear but I hope somebody has some good approaches =)
You are looking for the holy grail, I think. AFAIK it doesn't exist, and Java is not a language that allows such an approach to be easily created.
"Java Concurrency in Practice" has a section on testing for threading problems. It draws special attention to how hard it is to do.
When an issue arises with threads in Java, it is usually related to deadlock detection, more than just monitoring which threads are accessing a synchronized section at the same time. The JMX extension, part of the JRE since 1.5, can help you detect those deadlocks. In fact, we use JMX inside our own software to automatically detect deadlocks and trace where they were found.
Here is an example about how to use it.
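A minimal sketch of such a check, using the platform ThreadMXBean, is shown below. The class name DeadlockCheck and the method deadlockedThreadNames are hypothetical. It uses findDeadlockedThreads(), which was added in Java 6; on a 1.5 runtime findMonitorDeadlockedThreads() would be the closest equivalent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class DeadlockCheck {

    /** Returns the names of threads that are currently deadlocked, or an empty array if none are. */
    static String[] deadlockedThreadNames() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is found
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = threads.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            // An entry is null if the thread has terminated in the meantime.
            names[i] = infos[i] == null ? "(terminated)" : infos[i].getThreadName();
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadNames().length);
    }
}
```

A test suite or a background watchdog could call this periodically and fail or log when the array is non-empty.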
IntelliJ IDEA has a lot of useful concurrency inspections. For example, it warns you when you are accessing the same object from both synchronised and unsynchronised contexts, when you are synchronising on non-final objects and more.
Likewise, FindBugs has many similar checks.
As well as @Fernando's mention of thread deadlocking, another problem with multiple threads is concurrent modifications and the problems they can cause.
One thing that Java does internally is that a collection class keeps a count of how many times it's been updated. An iterator then checks that value on every .next() against what it was when the iterator was created, to see if the collection has been updated while you were iterating. I think that principle could be used more generally.
Try ConTest or Coverity
Both tools analyze the code to figure out which parts of the data might be shared between threads and then they instrument the code (add extra bytecode to the compiled classes) to check if it breaks when two threads try to change some data at the same time. The two threads are then run over and over again, each time starting them with a slightly different time offset to get many possible combinations of access patterns.
Also, check this question: Unit testing a multithreaded application?
You might be interested in an approach Peter Veentjer blogged about, which he calls The Concurrency Detector. I don't believe he has open-sourced this yet, but as he describes it the basic idea is to use AOP to instrument code that you're interested in profiling, and record which thread has touched which field. After that it's a matter of manually or automatically parsing the generated logs.
If you can identify thread unsafe classes, static analysis might be able to tell you whether they ever "escape" to become visible to multiple threads. Normally, programmers do this in their heads, but obviously they are prone to mistakes in this regard. A tool should be able to use a similar approach.
That said, from the use case you describe, it sounds like something as simple as remembering a thread and doing assertions on it might suffice for your needs.
class Foo {
    private final Thread owner = Thread.currentThread();
    void x() {
        assert Thread.currentThread() == owner;
        /* Implement method. */
    }
}
The owner reference is still populated even when assertions are disabled, so it's not entirely "free". I also wouldn't want to clutter many of my classes with this boilerplate.
The Thread.holdsLock(Object) method may also be useful to you.
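For example, Thread.holdsLock(Object) can guard a method that must only be called while a particular monitor is held. The class GuardedCounter and its method names below are hypothetical, and throwing IllegalStateException rather than using an assert is just one possible choice.

```java
final class GuardedCounter {

    private final Object lock = new Object();
    private int value;

    /** Safe entry point: takes the lock before touching the state. */
    void increment() {
        synchronized (lock) {
            incrementWhileLocked();
        }
    }

    /** Must only be called by a thread that already holds the lock. */
    void incrementWhileLocked() {
        if (!Thread.holdsLock(lock)) {
            throw new IllegalStateException("caller must hold the counter's lock");
        }
        value++;
    }

    int get() {
        synchronized (lock) {
            return value;
        }
    }

    public static void main(String[] args) {
        GuardedCounter counter = new GuardedCounter();
        counter.increment();
        System.out.println("value: " + counter.get());
    }
}
```

Calling incrementWhileLocked() without the lock fails immediately instead of silently corrupting the counter.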
For the specific example you give, SwingLabs has some helper code to detect event thread violations and hangs. https://swinghelper.dev.java.net/
A while back, I worked with the JProbe java profiling tools. One of their tools (threadalyzer?) looked for thread sync violations. Looking at their web page, I don't see a tool by that name or quite what I remember. But you might want to take a look. http://www.quest.com/jprobe/performance-home.aspx
You can use Netbeans profiler or JConsole to check the threads status in depth