How to test Flink Global Window with Trigger And Evictor - java

I have a pipeline which uses a Flink GlobalWindow with a custom Trigger based on event time (taken from the timestamp on the arriving element) and an Evictor which cuts unnecessary elements from the window before passing it on to the ProcessFunction, something like:
public SingleOutputStreamOperator<Results> processElements(DataStream<Elements> inputStream) {
    return inputStream
            .keyBy(Elements::getId)
            .window(GlobalWindows.create())
            .trigger(new CustomTrigger())
            .evictor(new CustomEvictor())
            .process(new MyWindowProcessFunction())
            .name("Process")
            .uid("process-elements")
            .returns(Results.class);
}

public void executePipelineFlow(StreamExecutionEnvironment env) throws Exception {
    DataStream<Elements> inputStream = getInputStream(env);
    DataStream<Results> processedInput = processElements(inputStream);
    applySink(processedInput);
}
I know I can test MyWindowProcessFunction with a TestHarness, which provides watermark manipulation, but I need to test the whole flow: Trigger + Evictor + ProcessFunction.
I also tried some kind of timed SourceFunction using Thread.sleep(), but my pipeline works in event time, so this won't work if I have 1000 elements in the test stream (the test would take a couple of hours).
My question is: how can I unit test my whole processElements method?
I can't find any test examples for my case.
Thanks

You might look at how the end-to-end integration tests for the windowing exercise in the Flink training are implemented as an example. This exercise isn't using GlobalWindows or custom triggering, etc, but you can use this overall approach to test any pipeline.
The one thing that's maybe less than ideal about this approach is how it handles watermarking. The applications being tested are using the default periodic watermarking strategy, wherein watermarks are generated every 200msec. Since the tests don't run that long, the only watermark that's actually generated is the one that comes at the end of every job with bounded inputs. This works, but isn't quite the same as what will happen in production. (Is this why you were thinking of having your test source sleep between events?)
BTW, these tests in the Flink training repo are made slightly more complex than is ordinarily necessary, because these tests are used to provide coverage for the Java and the Scala implementations of both the exercises and solutions.
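If you don't want to model your test on the training repo directly, one common pattern is to run the real pipeline on a bounded, timestamped collection and collect the output in a static sink, then assert on what was collected. The following is a minimal sketch only: Elements.getTimestamp(), the test class layout and the assumption that processElements is reachable from the test are mine, not from the question. Because the input is bounded, Flink emits a final watermark at the end of the job, which is what fires the event-time trigger, consistent with the watermarking caveat above.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.junit.Test;

public class ProcessElementsTest {

    // static list so the collected results survive serialization of the sink instances
    static final List<Results> COLLECTED = Collections.synchronizedList(new ArrayList<>());

    static class CollectSink implements SinkFunction<Results> {
        @Override
        public void invoke(Results value, Context context) {
            COLLECTED.add(value);
        }
    }

    @Test
    public void processesBoundedEventTimeInput() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        COLLECTED.clear();

        DataStream<Elements> input = env
                .fromCollection(createTestElements()) // bounded input: the final watermark fires the event-time trigger
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Elements>forMonotonousTimestamps()
                                .withTimestampAssigner((element, ts) -> element.getTimestamp()));

        processElements(input).addSink(new CollectSink()); // the method under test, assumed reachable here
        env.execute();

        // assert on COLLECTED here
    }

    private List<Elements> createTestElements() {
        // build a non-empty list of Elements with increasing event-time timestamps
        return new ArrayList<>();
    }
}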

Related

How to enable a global timeout for JUnit testcase runs?

This question suggests using the timeout parameter of the @Test annotation to have JUnit forcefully stop tests after that timeout period.
But we have around 5000 unit tests so far, and we want to establish a policy that asks developers to never release tests that need more than 10 seconds to complete. The policy would probably say "aim for < 10 seconds", but then we would like to ensure that any test is stopped after, say, 30 seconds. (The numbers are just examples; the idea is to define something that is "good enough" for most use cases, but that also makes sure things don't run "forever".)
Now I am wondering if there is a way to enable such behavior without going into each test case and adding that annotation parameter.
The existing question doesn't help either: I am looking for one change to enable this, not one change per test class. One central, global switch, not one per file or method.
Although JUnit Jupiter (i.e., the programming and extension model introduced in JUnit 5) does not yet have built-in support for global timeouts, you can still implement global timeout support on your own.
The only catch is that a timeout extension cannot currently abort test execution preemptively. In other words, a timeout extension in JUnit Jupiter can currently only time the execution of tests and then throw an exception if the execution took too long (i.e., after waiting for the test to end, which may potentially never happen if the test hangs).
In any case, if you want to implement a non-preemptive global timeout extension for use with JUnit Jupiter, here's what you need to do.
Look at the TimingExtension example in the JUnit 5 User Guide for inspiration. You'll need code similar to that, but you'll want to throw an exception if the duration exceeds a configured timeout. How you configure your global timeout is up to you: hard code it, look up the value from a JVM system property, look up the value from a custom annotation etc.
Register your global timeout extension using Java's ServiceLoader mechanism. See Automatic Extension Registration for details.
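As a rough sketch of such an extension (my own illustration, not code from the User Guide; the 30-second budget is hard-coded here and would normally come from a system property or an annotation lookup):

import java.time.Duration;
import java.time.Instant;
import org.junit.jupiter.api.extension.AfterTestExecutionCallback;
import org.junit.jupiter.api.extension.BeforeTestExecutionCallback;
import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.extension.ExtensionContext.Namespace;

public class GlobalTimeoutExtension implements BeforeTestExecutionCallback, AfterTestExecutionCallback {

    private static final Duration TIMEOUT = Duration.ofSeconds(30);
    private static final Namespace NAMESPACE = Namespace.create(GlobalTimeoutExtension.class);

    @Override
    public void beforeTestExecution(ExtensionContext context) {
        // remember when the test started
        context.getStore(NAMESPACE).put("start", Instant.now());
    }

    @Override
    public void afterTestExecution(ExtensionContext context) {
        // non-preemptive: we can only fail the test after it has finished
        Instant start = context.getStore(NAMESPACE).get("start", Instant.class);
        Duration elapsed = Duration.between(start, Instant.now());
        if (elapsed.compareTo(TIMEOUT) > 0) {
            throw new AssertionError(context.getDisplayName() + " took " + elapsed.toMillis()
                    + " ms, which exceeds the global timeout of " + TIMEOUT.toMillis() + " ms");
        }
    }
}

Listing the class in META-INF/services/org.junit.jupiter.api.extension.Extension and setting junit.jupiter.extensions.autodetection.enabled=true then applies it to every test without touching individual test classes.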
Happy Testing!
Check out my JUnit 4 extension library (https://github.com/Nordstrom/JUnit-Foundation). Among the features provided by this library is the ability to define a global timeout value, which will be automatically applied to each test method that doesn't already define a longer timeout interval.
This library uses the Byte Buddy byte code generation library to install event hooks at strategic points in the test execution flow of JUnit 4. The global timeout is applied when JUnit has created a test class instance to run an "atomic" test.
To apply the global timeout, the library replaces the original @Test annotation with an object that implements the @Test annotation interface. This approach utilizes all of JUnit's native timeout functionality, which provides pre-emptive termination of tests that run too long. The use of native timeout functionality eliminates the need for invasive implementation or special-case handling, and this functionality is activated without touching a single source file.
All of the updates needed to install and activate global timeout support are in the project file (POM / build.gradle) and optional properties file. The timeout interval can be overridden via System property, which enables adjustments to be made from the command line or programmatically. For scenarios where timeout failures are caused by transient conditions, you may want to pair the global timeout feature with the automatic retry feature.
What you're probably looking for is not implemented: https://github.com/junit-team/junit4/issues/140
You can, however, achieve the same result with simple inheritance.
Define an abstract parent class, like BaseIntegrationTest, with the following @Rule field:
import org.junit.Rule;
import org.junit.rules.Timeout;

public abstract class BaseIntegrationTest {
    private static final int TEST_GLOBAL_TIMEOUT_VALUE = 10;
    // JUnit 4 requires @Rule fields to be public
    @Rule
    public final Timeout globalTimeout = Timeout.seconds(TEST_GLOBAL_TIMEOUT_VALUE);
}
Then make it a parent for every test class within the scope. For example:
public class BaseEntityTest extends BaseIntegrationTest {

    @Before
    public void init() {
        // init
    }

    @Test
    public void twoPlusTwoTest() throws Exception {
        assert 2 + 2 == 4;
    }
}
That's it.
Currently you may not be able to do this, because JUnit 5 removed Rules and replaced them with Extensions.
The extension example above does not really work either: it implements AfterTestExecutionCallback, which is only invoked after the test method has finished, so the timeout cannot stop a test that hangs.

How to write a unit test for a method that does a retry with back off? (Using FailSafe in Java)

This is the method that I would like to test:
void someMethodThatRetries() {
    Failsafe.with(retryPolicy).get(() -> callServiceX());
}
The retry policy looks like this:
this.retryPolicy = new RetryPolicy()
        .retryIf(responseFromServiceXIsInvalid())
        .withBackoff(delay, MAX_DELAY, TimeUnit.MILLISECONDS);
This method calls a service X and retries the call on a certain condition (the response from X does not have certain values). Each retry is done with a delay and backoff.
The test looks like this:
@Test
public void retriesAtMostThreeTimesIfResponseIsInvalid() throws Exception {
    // Code that verifies that ServiceX got called 3 times. The service is called using a stub, and I am verifying on that stub.
}
I am writing a test that verifies that service X gets called 3 times (the maximum number of allowed retries is 3) when the condition is met.
Because of the delay and backoff, the unit test takes too much time. How should we write the test in this case?
One solution I thought of is to write a separate test verifying that the RetryPolicy retries 3 times, and a separate test for the fact that it retries when the condition is met.
How should I do it?
I'd say that you should aim at unit-testing the functions callServiceX and responseFromServiceXIsInvalid, but apart from that you are in the realm of integration-testing and subsystem-testing (aka component-testing). Everything with an algorithmic nature here is hidden behind the FailSafe and RetryPolicy classes and methods - your code is just calling them.
Therefore, many of the bugs that your code might contain lie in the interaction with / proper use of these external classes. For example, you might have messed up the order of arguments delay and MAX_DELAY - you would find this only with integration testing.
There are also potential bugs on the unit-testing level, like the value of delay not matching the specified time unit. The hassle of checking this with unit-testing in these circumstances would be too big in my eyes. Check this in a review, or, again, use subsystem-testing to see if the durations are as you expect.
Some additional warning: When doing integration-testing and subsystem-testing, be sure to keep the focus on the bugs you want to find. This will help you to avoid that in the end you are effectively testing the FailSafe and RetryPolicy classes - which hopefully have been tested already by the library developers.
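To illustrate the unit-level part of that advice: the retry condition can be exercised directly, without going through Failsafe's retry loop at all (and thus without any delay or backoff). This is only a sketch; ServiceCaller and Response are hypothetical names, and it assumes responseFromServiceXIsInvalid() is accessible and returns a java.util.function.Predicate over the response type:

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class RetryConditionTest {

    @Test
    public void invalidResponseIsDetectedByRetryCondition() {
        ServiceCaller serviceCaller = new ServiceCaller(); // hypothetical class under test
        assertTrue(serviceCaller.responseFromServiceXIsInvalid().test(Response.withMissingValues()));
        assertFalse(serviceCaller.responseFromServiceXIsInvalid().test(Response.valid()));
    }
}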

What's the best way to assert multi-threaded code in Java [closed]

As a TDD practitioner I want to test everything I code.
During the last couple of years I've been writing a lot of multithreaded code, and one part of my tests has been bothering me very much.
When I have to assert something that may happen during the run() loop, I end up with some kind of assertion like this:
assertEventually(timeout, assertion)
I know that Mockito has a solution for this, but only for the verify call. I also know that JUnit has a timeout property that is useful to avoid hanging (or everlasting) tests. But what I want is something that allows me to assert something that may become true over time.
So my question is: does anyone know the best way to achieve this?
Here is my solution so far:
private void assertEventually(int timeoutInMilliseconds, Runnable assertion) {
    long begin = System.currentTimeMillis();
    long now = begin;
    Throwable lastException = null;
    do {
        try {
            assertion.run();
            return;
        } catch (RuntimeException | AssertionError e) {
            lastException = e;
        }
        try {
            Thread.sleep(50); // brief pause so we don't busy-spin while waiting for the assertion to hold
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
        now = System.currentTimeMillis();
    } while ((now - begin) < timeoutInMilliseconds);
    throw new RuntimeException(lastException);
}
Using it ends up like this:
assertEventually(1000, new Runnable() {
    public void run() {
        assertThat(Thread.activeCount()).isEqualTo(before);
    }
});
From the language that you use - asserting that something may be true - it sounds like you are going down a path that will lead to fragile tests. Tests should have binary outcomes - they either pass or fail, and there is no gray area.
Looking at this specific example, given the use of multi-threading I would suggest something like this:
Refactor your design so that all of the actual logic in your code can easily be run independently of your multi-threaded environment, as synchronously executed code; this should make writing unit tests around it fairly straightforward.
Implement your thread management outside of all this code - this will be much harder to test using unit tests and therefore you should look to keep it cleanly segregated from the rest of your code so that it doesn't make other code harder to test.
Start building a high level system test suite which executes the system (or large components of it) as a whole, driving your tests through the external boundaries of your application. These will give coverage of the thread handling code, along with testing the integration of the various components in your system. Furthermore, you should not have to write specific test logic that deals with threads - they should just run internally in your system when you test it.
One further advantage of splitting the tests this way is that you create separate test suites for unit tests and system tests, and this should help keep your unit test suite fast and lean (so you can run it more easily and more often during development). Any tests involving timeouts (i.e. the system tests in this case) will take longer to execute, and therefore are more suitable for only running occasionally - this approach makes this easier to do.
If you come from Scala, you might be used to the eventually trait.
It's really useful for testing, since you can retry for a certain period of time until the result is eventually returned, or the timeout is reached.
There's a Java wrapper made by halfvim here: https://github.com/halfvim/eventually
Testing for timeliness is a kind of performance test which, unless you have hard real-time requirements, is asserted with statistics over many samples.
A unit test would only concern itself with a single sample, and it should be deterministic. Instead of this check-and-loop construct, I think you should let your JUnit thread be notified through some mechanism as soon as the event occurs that you want to assert on. Callbacks, Futures, Latches, Exchangers, etc. can be used to build such mechanisms.
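For example, a CountDownLatch lets the worker thread signal the test as soon as the interesting event has happened, so the test neither polls nor sleeps for a fixed time. A minimal sketch with illustrative names:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.junit.Test;

public class WorkerTest {

    @Test
    public void workerEventuallyProducesResult() throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        AtomicInteger result = new AtomicInteger();

        Thread worker = new Thread(() -> {
            result.set(2 + 2);   // the work under test
            done.countDown();    // notify the JUnit thread instead of letting it poll
        });
        worker.start();

        // waits only as long as needed, and fails deterministically on timeout
        assertTrue("worker did not finish in time", done.await(1, TimeUnit.SECONDS));
        assertEquals(4, result.get());
    }
}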
It is a design challenge to build your multi-threading code such that it can be tested without check-and-loop, or spin-waiting on a state.
But if you really must do this, then use System.nanoTime() instead of System.currentTimeMillis(), because it has a higher resolution, though the actual implementation and accuracy of both clocks depend on your JVM version, operating system and hardware.

Unit testing for void methods and threads in JUnit

I'm new to unit testing. I understand the principles, but I still can't figure out how to test my current project. I need to test void methods operating on java.nio.SocketChannel. These methods are:
- initSelector, where I open a selector, bind a new ServerSocketChannel and register it
- read, which reads data and puts it into a queue (should I write an extra method to verify that the data actually exists in the queue? And in that case, should I write tests for that method too?)
- write, which takes data from a queue and writes it to a SocketChannel
I can test that these methods don't throw IOException, but what else?
And how should I test the run() method of a Thread? Or is that not unit testing, but system testing or something else?
Basically, you have two possibilities:
if you want to thoroughly unit test these methods, you should hide the concrete (hardware-dependent) components like sockets etc. behind mockable interfaces, and use mocks in the unit tests to verify that the expected calls with the expected parameters are made to these objects
or you can write integration / system tests using the real sockets within the whole component / app, to verify that the correct sockets are opened, data is transferred properly etc.
Ideally, you should do both, but in the real world, unit testing may not always be feasible. Especially for such low-level methods, which depend on some external software/hardware component like sockets, DBs, file system etc. Then the best approach is to leave as little logic (thus as little possibilities for failure) in these methods / layers as possible, and abstract out the logic into a higher layer, designed to be unit testable (using mockable interfaces as mentioned above).
To test threads, you can just run() them from your unit test as usual. Then most likely you need to wait for some time before trying to get and verify the results produced by the thread.
Again, if you abstract away the actual logic of the task into e.g. a Callable or Runnable, you can unit test it in isolation much easier. And this also enables you to use the Executor framework (now or later), which makes dealing with concurrency much easier and safer.
So first, if you are using a real SocketChannel in your unit test, it is not a unit test. You should use a mock (consider Mockito) for the SocketChannel. Doing so will allow you to provide a controlled stream of bytes to the method under test and verify what bytes are passed to the channel.
If your class is creating the instance of the SocketChannel, consider changing the class to accept a SocketChannelFactory. Then you can inject a SocketChannelFactory mock which returns a SocketChannel mock.
You can just call run() directly in your unit test.
Mockito link
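A self-contained sketch of that factory-injection idea (QueueingWriter and SocketChannelFactory are illustrative stand-ins, not names from the question):

import static org.mockito.Mockito.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Queue;
import org.junit.Test;

public class QueueingWriterTest {

    interface SocketChannelFactory {
        SocketChannel open() throws IOException;
    }

    // Minimal class under test: drains a queue into the channel provided by the factory.
    static class QueueingWriter {
        private final SocketChannelFactory factory;
        private final Queue<ByteBuffer> queue = new ArrayDeque<>();

        QueueingWriter(SocketChannelFactory factory) {
            this.factory = factory;
        }

        void enqueue(ByteBuffer data) {
            queue.add(data);
        }

        void write() throws IOException {
            SocketChannel channel = factory.open();
            ByteBuffer data;
            while ((data = queue.poll()) != null) {
                channel.write(data);
            }
        }
    }

    @Test
    public void writeDrainsQueueIntoChannel() throws Exception {
        SocketChannel channel = mock(SocketChannel.class);
        SocketChannelFactory factory = mock(SocketChannelFactory.class);
        when(factory.open()).thenReturn(channel);

        QueueingWriter writer = new QueueingWriter(factory);
        writer.enqueue(ByteBuffer.wrap("hello".getBytes()));
        writer.write();

        // the code under test talked only to the mock, so we can verify the interaction
        verify(channel).write(any(ByteBuffer.class));
    }
}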
run() is a method like any other, so you should just be able to call it from a unit test (depending on if it's running in an endless loop of course - then you might want to test the methods that run() is calling).
For the SocketChannel I'd say you don't want to test the SocketChannel itself; you want to test how your code interacts with the SocketChannel given a certain set of start conditions. So you could look into creating a mock for it, and having your code talk to the mock. That way you can verify if your code is interacting with the channel in the way you expect (read(), write() and so on).
Check out http://code.google.com/p/powermock/ for example.

Java: TaskExecutor for Asynchronous Database Writes?

I'm thinking of using Java's TaskExecutor to fire off asynchronous database writes. Understandably threads don't come for free, but assuming I'm using a fixed threadpool size of say 5-10, how is this a bad idea?
Our application reads from a very large file using a buffer and flushes this information to a database after performing some data manipulation. Using asynchronous writes seems ideal here so that we can continue working on the file. What am I missing? Why doesn't every application use asynchronous writes?
Why doesn't every application use asynchronous writes?
It's often necessary/useful/easier to deal with a write failure in a synchronous manner.
I'm not sure a threadpool is even necessary. I would consider using a dedicated databaseWriter thread which does all writing and error handling for you. Something like:
public class AsyncDatabaseWriter implements Runnable {
    private final LinkedBlockingQueue<Data> queue = new LinkedBlockingQueue<>();
    private volatile boolean terminate = false;

    public void run() {
        while (!terminate) {
            try {
                Data data = queue.take(); // blocks until data is available
                // write to database
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public void scheduleWrite(Data data) {
        queue.add(data);
    }
}
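Using it is then just a matter of starting the writer on its own thread and handing it records as they are produced (a tiny sketch, with Data being the placeholder type from the snippet above):

AsyncDatabaseWriter writer = new AsyncDatabaseWriter();
new Thread(writer, "async-db-writer").start();

// somewhere in the file-processing loop:
writer.scheduleWrite(data);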
I personally fancy the style of using a Proxy for threading out operations which might take a long time. I'm not saying this approach is better than using executors in any way, just adding it as an alternative.
The idea is not bad at all. Actually, I tried it just yesterday because I needed to create a copy of an online database which has 5 different categories with around 60000 items each.
By moving the parse/save operation of each category into parallel tasks, and by partitioning each category import into smaller batches that also run in parallel, I reduced the total import time from several hours (estimated) to 26 minutes. Along the way I found a good piece of code for splitting a collection: http://www.vogella.de/articles/JavaAlgorithmsPartitionCollection/article.html
I used ThreadPoolTaskExecutor to run the tasks. Your tasks are just simple implementations of the Callable interface, roughly as sketched below.
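As a rough sketch of that setup (the class and method names are mine, and the pool sizes are arbitrary; only ThreadPoolTaskExecutor and Callable come from the answer itself):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

public class ParallelImport<T> {

    public void importAll(List<List<T>> batches) throws Exception {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(10);
        executor.initialize();

        // one Callable per batch, all running in parallel on the pool
        List<Future<Integer>> results = new ArrayList<>();
        for (List<T> batch : batches) {
            results.add(executor.submit((Callable<Integer>) () -> saveBatch(batch)));
        }
        for (Future<Integer> result : results) {
            result.get(); // wait for completion and propagate any failure
        }
        executor.shutdown();
    }

    private int saveBatch(List<T> batch) {
        // parse and save each item of the batch here
        return batch.size();
    }
}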
"Why doesn't every application use asynchronous writes?" - erm, because every application does a different thing.
Can you believe some applications don't even use a database? OMG!!!
Seriously though, given that you don't say what your failure strategies are, it sounds like it could be reasonable. What happens if the write fails, or the DB goes away somehow?
Some databases - like Sybase - have (or at least had) a thing where they really don't like multiple writers to a single table - all the writers end up blocking each other - so maybe it won't actually make much difference...
