I can't use my property to assign a value to delay:
@Value("${delayReintentos}")
private long delay;
@Retryable(value = { SQLException.class }, maxAttempts = 3, backoff = @Backoff(delay = delay))
public String simpleRetry() throws SQLException {
counter++;
LOGGER.info("Billing Service Failed "+ counter);
throw new SQLException();
}
Java 11, Spring Boot
Use delayExpression instead
/**
* An expression evaluating to the canonical backoff period. Used as an initial value
* in the exponential case, and as a minimum value in the uniform case. Overrides
* {@link #delay()}.
* @return the initial or canonical backoff period in milliseconds.
* @since 1.2
*/
String delayExpression() default "";
See the readme: https://github.com/spring-projects/spring-retry#readme
You can use SpEL or property placeholders
@Backoff(delayExpression = "${my.delay}",
maxDelayExpression = "@integerFiveBean", multiplierExpression = "${onePointOne}")
Try this, wrapping the property name in the placeholder braces:
@Retryable(value = SQLException.class, maxAttempts = 3, backoff = @Backoff(delayExpression = "${delay}"))
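Putting it together for the question's property (delayReintentos), a minimal sketch; the class, logger, and @Service setup are illustrative, and @EnableRetry is assumed to be present on a configuration class:
import java.sql.SQLException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

@Service
public class BillingService {

    private static final Logger LOGGER = LoggerFactory.getLogger(BillingService.class);

    private int counter;

    // The delay is resolved from the delayReintentos property via delayExpression,
    // so no @Value field is needed for it.
    @Retryable(value = { SQLException.class }, maxAttempts = 3,
            backoff = @Backoff(delayExpression = "${delayReintentos}"))
    public String simpleRetry() throws SQLException {
        counter++;
        LOGGER.info("Billing Service Failed " + counter);
        throw new SQLException();
    }
}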
I'd like to join data coming in from two Kafka topics ("left" and "right").
Matching records are to be joined using an ID, but if a "left" or a "right" record is missing, the other one should be passed downstream after a certain timeout. Therefore I have chosen to use the coGroup function.
This works, but there is one problem: If there is no message at all, there is always at least one record which stays in an internal buffer for good. It gets pushed out when new messages arrive. Otherwise it is stuck.
The expected behaviour is that all records should be pushed out after the configured idle timeout has been reached.
Some information which might be relevant:
Flink 1.14.4
The Flink parallelism is set to 8, so is the number of partitions in both Kafka topics.
Flink checkpointing is enabled
Event-time processing is to be used
Lombok is used, so val is like final var
Some code snippets:
Relevant join settings
public static final int AUTO_WATERMARK_INTERVAL_MS = 500;
public static final Duration SOURCE_MAX_OUT_OF_ORDERNESS = Duration.ofMillis(4000);
public static final Duration SOURCE_IDLE_TIMEOUT = Duration.ofMillis(1000);
public static final Duration TRANSFORMATION_MAX_OUT_OF_ORDERNESS = Duration.ofMillis(5000);
public static final Duration TRANSFORMATION_IDLE_TIMEOUT = Duration.ofMillis(1000);
public static final Time JOIN_WINDOW_SIZE = Time.milliseconds(1500);
Create KafkaSource
private static KafkaSource<JoinRecord> createKafkaSource(Config config, String topic) {
val properties = KafkaConfigUtils.createConsumerConfig(config);
val deserializationSchema = new KafkaRecordDeserializationSchema<JoinRecord>() {
@Override
public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<JoinRecord> out) {
val m = JsonUtils.deserialize(record.value(), JoinRecord.class);
val copy = m.toBuilder()
.partition(record.partition())
.build();
out.collect(copy);
}
@Override
public TypeInformation<JoinRecord> getProducedType() {
return TypeInformation.of(JoinRecord.class);
}
};
return KafkaSource.<JoinRecord>builder()
.setProperties(properties)
.setBootstrapServers(config.kafkaBootstrapServers)
.setTopics(topic)
.setGroupId(config.kafkaInputGroupIdPrefix + "-" + String.join("_", topic))
.setDeserializer(deserializationSchema)
.setStartingOffsets(OffsetsInitializer.latest())
.build();
}
Create DataStreamSource
Then the DataStreamSource is built on top of the KafkaSource:
Configure "max out of orderness"
Configure "idleness"
Extract timestamp from record, to be used for event time processing
private static DataStreamSource<JoinRecord> createLeftSource(Config config,
StreamExecutionEnvironment env) {
val leftKafkaSource = createLeftKafkaSource(config);
val leftWms = WatermarkStrategy
.<JoinRecord>forBoundedOutOfOrderness(SOURCE_MAX_OUT_OF_ORDERNESS)
.withIdleness(SOURCE_IDLE_TIMEOUT)
.withTimestampAssigner((joinRecord, __) -> joinRecord.timestamp.toEpochSecond() * 1000L);
return env.fromSource(leftKafkaSource, leftWms, "left-kafka-source");
}
Use keyBy
The keyed sources are created on top of the DataStreamSource instances like this:
Again configure "out of orderness" and "idleness"
Again extract timestamp
val leftWms = WatermarkStrategy
.<JoinRecord>forBoundedOutOfOrderness(TRANSFORMATION_MAX_OUT_OF_ORDERNESS)
.withIdleness(TRANSFORMATION_IDLE_TIMEOUT)
.withTimestampAssigner((joinRecord, __) -> {
if (VERBOSE_JOIN)
log.info("Left : " + joinRecord);
return joinRecord.timestamp.toEpochSecond() * 1000L;
});
val leftKeyedSource = leftSource
.keyBy(jr -> jr.id)
.assignTimestampsAndWatermarks(leftWms)
.name("left-keyed-source");
Join using coGroup
The join then combines the left and the right keyed sources:
val joinedStream = leftKeyedSource
.coGroup(rightKeyedSource)
.where(left -> left.id)
.equalTo(right -> right.id)
.window(TumblingEventTimeWindows.of(JOIN_WINDOW_SIZE))
.apply(new CoGroupFunction<JoinRecord, JoinRecord, JoinRecord>() {
@Override
public void coGroup(Iterable<JoinRecord> leftRecords,
Iterable<JoinRecord> rightRecords,
Collector<JoinRecord> out) {
// Transform
val result = ...;
out.collect(result);
}
});
Write stream to console
The resulting joinedStream is written to the console:
val consoleSink = new PrintSinkFunction<JoinRecord>();
joinedStream.addSink(consoleSink);
How can I configure this join operation, so that all records are pushed downstream after the configured idle timeout?
If it can't be done this way: Is there another option?
This is the expected behavior. withIdleness doesn't try to handle the case where all streams are idle. It only helps in cases where there are still events flowing from at least one source partition/shard/split.
To get the behavior you desire (in the context of a continuous streaming job), you'll have to implement a custom watermark strategy that advances the watermark based on a processing time timer. Here's an implementation that uses the legacy watermark API.
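That linked implementation isn't reproduced here, but a rough sketch of the same idea against the newer WatermarkGenerator API could look like the following. The class name, the fall-back-to-wall-clock rule, and the reuse of the question's constants are illustrative choices, not the referenced code; falling back to the wall clock deliberately mixes processing time into event time.
import java.time.Duration;

import org.apache.flink.api.common.eventtime.Watermark;
import org.apache.flink.api.common.eventtime.WatermarkGenerator;
import org.apache.flink.api.common.eventtime.WatermarkOutput;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class ProcessingTimeFallbackWatermarks<T> implements WatermarkGenerator<T> {

    private final long outOfOrdernessMillis;
    private final long idleTimeoutMillis;

    private long maxTimestamp = Long.MIN_VALUE;
    private long lastEventProcessingTime = System.currentTimeMillis();

    public ProcessingTimeFallbackWatermarks(Duration outOfOrderness, Duration idleTimeout) {
        this.outOfOrdernessMillis = outOfOrderness.toMillis();
        this.idleTimeoutMillis = idleTimeout.toMillis();
    }

    @Override
    public void onEvent(T event, long eventTimestamp, WatermarkOutput output) {
        // Track the highest event timestamp and remember when we last saw an event.
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
        lastEventProcessingTime = System.currentTimeMillis();
    }

    @Override
    public void onPeriodicEmit(WatermarkOutput output) {
        long now = System.currentTimeMillis();
        if (now - lastEventProcessingTime > idleTimeoutMillis) {
            // No events for a while: advance the watermark from the wall clock so
            // buffered windows are eventually fired even when all sources are idle.
            output.emitWatermark(new Watermark(now - outOfOrdernessMillis - 1));
        } else if (maxTimestamp != Long.MIN_VALUE) {
            // Normal bounded-out-of-orderness behaviour while events are flowing.
            output.emitWatermark(new Watermark(maxTimestamp - outOfOrdernessMillis - 1));
        }
    }
}

Used in place of forBoundedOutOfOrderness(...).withIdleness(...):
val leftWms = WatermarkStrategy
    .<JoinRecord>forGenerator(ctx -> new ProcessingTimeFallbackWatermarks<>(
        SOURCE_MAX_OUT_OF_ORDERNESS, SOURCE_IDLE_TIMEOUT))
    .withTimestampAssigner((joinRecord, __) -> joinRecord.timestamp.toEpochSecond() * 1000L);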
On the other hand, if the job is complete and you just want to drain the final results before shutting it down, you can use the --drain option when you stop the job. Or if you use bounded sources this will happen automatically.
I am testing a window function which has a listState, with TTL enabled.
Snippet of window function:
public class CustomWindowFunction extends ProcessWindowFunction<InputPOJO, OutputPOJO, String, TimeWindow> {
...
@Override
public void open(Configuration config) {
StateTtlConfig ttlConfig =
StateTtlConfig.newBuilder(listStateTTl)
.setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
.setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired) // NOTE: NeverReturnExpired
.build();
listStateDescriptor = new ListStateDescriptor<>("unprocessedItems", InputPOJO.class);
listStateDescriptor.enableTimeToLive(ttlConfig);
}
@Override
public void process( String key, Context context, Iterable<InputPOJO> windowElements, Collector<OutputPOJO> out) throws Exception {
ListState<InputPOJO> listState = getRuntimeContext().getListState(listStateDescriptor);
....
Iterator<InputPOJO> iterator;
// Getting unexpired listStateItems for computation.
iterator = listState.get().iterator();
while (iterator.hasNext()) {
InputPOJO listStateInput = iterator.next();
System.out.println("There are unexpired elements in listState");
/** Business Logic to compute result using the unexpired values in listState**/
}
/** Business Logic to compute result using the current window elements.*/
// Adding unProcessed WindowElements to ListState(with TTL)
// NOTE: processed WindowElements are removed manually.
iterator = windowElements.iterator();
while (iterator.hasNext()) {
System.out.println("unProcessed Item added to ListState.")
InputPOJO unprocessedItem = iterator.next();
listState.add(unprocessedItem); // This part gets executed for listStateInput1
}
}
....
}
I am using testHarness to perform the integration test. I am testing the listState item count when the TTL for the listState is expired. Below is my test function snippet.
NOTE:
There is a custom allowedLateness which is implemented using a custom Timer.
private OneInputStreamOperatorTestHarness<InputPOJO, OutputPOJO> testHarness;
private CustomWindowFunction customWindowFunction;
@Before
public void setup_testHarness() throws Exception {
KeySelector<InputPOJO, String> keySelector = InputPOJO::getKey;
TypeInformation<InputPOJO> STRING_INT_TUPLE = TypeInformation.of(new TypeHint<InputPOJO>() {}); // Any suggestion ?
ListStateDescriptor<InputPOJO> stateDesc = new ListStateDescriptor<>("window-contents", STRING_INT_TUPLE.createSerializer(new ExecutionConfig())); // Any suggestion ?
/**
* Creating windowOperator for the below function
*
* <pre>
*
* DataStream<OutputPOJO> OutputPOJOStream =
* inputPOJOStream
* .keyBy(InputPOJO::getKey)
* .window(ProcessingTimeSessionWindows.withGap(Time.seconds(triggerMaximumTimeoutSeconds)))
* .trigger(new CustomTrigger(triggerAllowedLatenessMillis))
* .process(new CustomWindowFunction(windowListStateTtlMillis));
* </pre>
*/
customWindowFunction = new CustomWindowFunction(secondsToMillis(windowListStateTtlMillis));
WindowOperator<String, InputPOJO, Iterable<InputPOJO>, OutputPOJO, TimeWindow>
operator =
new WindowOperator<>(
// setting .window(ProcessingTimeSessionWindows.withGap(maxTimeout))
ProcessingTimeSessionWindows.withGap(Time.seconds(triggerMaximumTimeoutSeconds)),
new TimeWindow.Serializer(),
// setting .keyBy(InputPOJO::getKey)
keySelector,
BasicTypeInfo.STRING_TYPE_INFO.createSerializer(new ExecutionConfig()),
stateDesc,
// setting .process(new CustomWindowFunction(windowListStateTtlMillis))
new InternalIterableProcessWindowFunction<>(customWindowFunction),
// setting .trigger(new CustomTrigger(allowedLateness))
new CustomTrigger(secondsToMillis(allowedLatenessSeconds)),
0,
null);
// Creating testHarness for window operator
testHarness = new KeyedOneInputStreamOperatorTestHarness<>(operator, keySelector, BasicTypeInfo.STRING_TYPE_INFO);
// Setup and Open Test Harness
testHarness.setup();
testHarness.open();
}
@Test
public void test_listStateTtl_exclusion() throws Exception {
int allowedLatenessSeconds = 3;
int listStateTTL = 10;
//1. Arrange
InputPOJO listStateInput1 = new InputPOJO(1,"Arjun");
InputPOJO listStateInput2 = new InputPOJO(2,"Arun");
// 2. Act
// listStateInput1 comes at 1 sec
testHarness.setProcessingTime(secondsToMillis(1));
testHarness.processElement(new StreamRecord<>(listStateInput1));
// Setting current processing time to 1 + 3 = 4 > allowedLateness.
// Window.process() is called, and window is purged (FIRE_AND_PURGE)
// Expectation: listStateInput1 is put into listState with TTL (10 secs), before process() ends.
testHarness.setProcessingTime(secondsToMillis(4));
// Setting processing time after listStateTTL, ie 4 + listStateTTL(10) + 1 = 15
// Expectation: listStateInput1 is evicted from the listState (Fails)
testHarness.setProcessingTime(secondsToMillis(15));
// Using sleep(), the listStateTTL is getting applied to listState and listStateInput1 is evicted (Pass)
//Thread.sleep(secondsToMillis(15))
//Passing listStateInput2 to the test Harness
testHarness.setProcessingTime(secondsToMillis(16));
testHarness.processElement(new StreamRecord<>(listStateInput2));
// Setting processing time after allowedLateness = 16 + 3 + 1 = 20
testHarness.setProcessingTime(secondsToMillis(20));
// 3. Assert
List<StreamRecord<? extends OutputPOJO>> streamRecords = testHarness.extractOutputStreamRecords();
// Expectation: streamRecords will only contain listStateInput2, since listStateInput1 was evicted.
// Actual: Getting both listStateInput1 & listStateInput2 in the output.
}
I noticed that TTL is not getting applied by setting processing time. When I tried the same function with Thread.sleep(TTL), the result was as expected.
Is listState TTL using system time for eviction (with testHarness)?
Is there any way to test listStateTTL using testHarness?
A TTL test should be written in the following way. Note that with the test harness, state TTL time is controlled through the harness's setStateTtlProcessingTime, not through setProcessingTime:
@Test
public void testSetTtlTimeProvider() throws Exception {
AbstractStreamOperator<Integer> operator = new AbstractStreamOperator<Integer>() {};
try (AbstractStreamOperatorTestHarness<Integer> result =
new AbstractStreamOperatorTestHarness<>(operator, 1, 1, 0)) {
result.config.setStateKeySerializer(IntSerializer.INSTANCE);
result.config.serializeAllConfigs();
Time timeToLive = Time.hours(1);
result.initializeState(OperatorSubtaskState.builder().build());
result.open();
ValueStateDescriptor<Integer> stateDescriptor =
new ValueStateDescriptor<>("test", IntSerializer.INSTANCE);
stateDescriptor.enableTimeToLive(StateTtlConfig.newBuilder(timeToLive).build());
KeyedStateBackend<Integer> keyedStateBackend = operator.getKeyedStateBackend();
ValueState<Integer> state =
keyedStateBackend.getPartitionedState(
VoidNamespace.INSTANCE,
VoidNamespaceSerializer.INSTANCE,
stateDescriptor);
int expectedValue = 42;
keyedStateBackend.setCurrentKey(1);
result.setStateTtlProcessingTime(0L);
state.update(expectedValue);
Assert.assertEquals(expectedValue, (int) state.value());
result.setStateTtlProcessingTime(timeToLive.toMilliseconds() + 1);
Assert.assertNull(state.value());
}
}
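Applied to the question's test, the idea would be to drive the TTL clock explicitly instead of relying on setProcessingTime or Thread.sleep; a hypothetical adaptation:
// Hypothetical adaptation of the question's test: advance the TTL time provider
// past the configured listState TTL so the entry is considered expired.
testHarness.setProcessingTime(secondsToMillis(15));
testHarness.setStateTtlProcessingTime(secondsToMillis(15));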
I have a scheduled task which takes the hostname and other calendar information from a JSP page. For the HOUR_OF_DAY input, I always add 12 and pass the total to Calendar, since it uses a 24-hour format. What I did worked fine yesterday evening, but it is not working this morning: it no longer runs at the scheduled time. I have certainly done something wrong and need some direction. Below is my code:
import java.util.Calendar;
import java.util.Timer;
import java.util.TimerTask;
public class ScheduledTask extends TimerTask {
String hostname = null;
//The default no - arg constructor
//=============================
public ScheduledTask(){
}
/**
* Another overloaded constructor
* that accepts an object
* @param scanHostObj
*/
public ScheduledTask(ScanHost scanHostObj){
hostname = scanHostObj.getHostname();
}
/**
* Another overloaded constructor
* that accepts a string parameter
* @param nodename
*/
public ScheduledTask(String nodename){
hostname = nodename;
}
/**
* The run method that executes
* the scheduled task
*/
public void run() {
new ScanUtility().performHostScan(hostname);
}
public void executeScheduledScanJob(ScanHost hostObj, Scheduler schedulerObj){
/**
* Get the various schedule
* data, convert to appropriate data type
* and feed to calender class
*/
String nodename = hostObj.getHostname();
String dayOfScan = schedulerObj.getDayOfScan();
String hourOfScan = schedulerObj.getHourOfScan();
String minuteOfScan = schedulerObj.getMinuteofScan();
//Convert String values to integers
//===================================
final int THE_DAY = Integer.parseInt(dayOfScan);
final int THE_HOUR = Integer.parseInt(hourOfScan);
final int REAL_HOUR = THE_HOUR + 12;
final int THE_MINUTE = Integer.parseInt(minuteOfScan);
/**
* Feed these time values to the Calendar class.
* Since Calendar takes a 24 - hour format for hours
* it is better to add 12 to any integer value for
* hours to bring the real hourly format to the 24
* format required by the Calendar class
*/
Calendar scheduleCalendar = Calendar.getInstance();
scheduleCalendar.set(Calendar.DAY_OF_WEEK, THE_DAY);
scheduleCalendar.set(Calendar.HOUR_OF_DAY, REAL_HOUR);
scheduleCalendar.set(Calendar.MINUTE, THE_MINUTE);
/**
* Then Initialize the timer Object
* and pass in parameters to it method
* cancel out all tasks/jobs after running them
*/
Timer scheduledTimeObj = new Timer();
ScheduledTask scheduledTaskObj = new ScheduledTask(nodename);
scheduledTimeObj.schedule(scheduledTaskObj, scheduleCalendar.getTime());
scheduledTimeObj.cancel();
scheduledTaskObj.cancel();
}
}
You could try to replace
scheduleCalendar.set(Calendar.HOUR_OF_DAY, REAL_HOUR);
With this:
scheduleCalendar.set(Calendar.HOUR, THE_HOUR);
This one takes the 12-hour format.
Additionally, you can set AM_PM to AM or PM. If the time you are parsing is a PM time, use PM here.
scheduleCalendar.set(Calendar.AM_PM, Calendar.PM);
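For example, a minimal sketch combining the two calls (the 7:30 PM values are illustrative):
Calendar scheduleCalendar = Calendar.getInstance();
scheduleCalendar.set(Calendar.HOUR, 7);            // 12-hour clock hour
scheduleCalendar.set(Calendar.AM_PM, Calendar.PM); // evening
scheduleCalendar.set(Calendar.MINUTE, 30);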
I'm having a difficult time understanding the concepts of .withFilenamePolicy of TextIO.write(). The requirements for supplying a FilenamePolicy seem incredibly complex for doing something as simple as specifying a GCS bucket to write streamed files to.
At a high level, I have JSON messages being streamed to a PubSub topic, and I'd like to write those raw messages to files in GCS for permanent storage (I'll also be doing other processing on the messages). I initially started with this Pipeline, thinking it would be pretty simple:
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);
p.apply("Read From PubSub", PubsubIO.readStrings().fromTopic(topic))
.apply("Write to GCS", TextIO.write().to(gcs_bucket);
p.run();
}
I got the error about needing WindowedWrites, which I applied, and then needing a FileNamePolicy. This is where things get hairy.
I went to the Beam docs and checked out FilenamePolicy. It looks like I would need to extend this class, which then also requires extending other abstract classes to make this work. Unfortunately the documentation on Apache is a bit scant and I can't find any examples for Dataflow 2.0 doing this, except for the WordCount example, which even then implements these details in a helper class.
So I could probably make this work just by copying much of the WordCount example, but I'm trying to better understand the details of this. A few questions I have:
1) Is there any roadmap item to abstract away a lot of this complexity? It seems like I should be able to supply a GCS bucket like I would in a non-windowed write, and then just supply a few basic options like the timing and the file naming rule. I know writing streaming windowed data to files is more complex than just opening a file pointer (or object storage equivalent).
2) It looks like to make this work, I need to create a WindowedContext object, which requires supplying a BoundedWindow abstract class and a PaneInfo object class, and then some shard info. The information available for these is pretty bare and I'm having a hard time knowing what is actually needed for all of these, especially given my simple use case. Are there any good examples available that implement these? In addition, it also looks like I need to set the number of shards as part of TextIO.write, but then also supply the number of shards as part of the FilenamePolicy?
Thanks for anything that helps me understand the details behind this; I'm hoping to learn a few things!
Edit 7/20/17
So I finally got this pipeline to run by extending the FilenamePolicy. My challenge was needing to define the window of the streaming data from PubSub. Here is a pretty close representation of the code:
public class ReadData {
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);
p.apply("Read From PubSub", PubsubIO.readStrings().fromTopic(topic))
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
.apply("Write to GCS", TextIO.write().to("gcs_bucket")
.withWindowedWrites()
.withFilenamePolicy(new TestPolicy())
.withNumShards(10));
p.run();
}
}
class TestPolicy extends FileBasedSink.FilenamePolicy {
@Override
public ResourceId windowedFilename(
ResourceId outputDirectory, WindowedContext context, String extension) {
IntervalWindow window = (IntervalWindow) context.getWindow();
String filename = String.format(
"%s-%s-%s-%s-of-%s.json",
"test",
window.start().toString(),
window.end().toString(),
context.getShardNumber(),
context.getNumShards()
);
return outputDirectory.resolve(filename, ResolveOptions.StandardResolveOptions.RESOLVE_FILE);
}
@Override
public ResourceId unwindowedFilename(
ResourceId outputDirectory, Context context, String extension) {
throw new UnsupportedOperationException("Unsupported.");
}
}
In Beam 2.0, the below is an example of writing the raw messages from PubSub out into windowed files on GCS. The pipeline is fairly configurable, allowing you to specify the window duration via a parameter and a sub directory policy if you want logical subsections of your data for ease of reprocessing / archiving. Note that this has an additional dependency on Apache Commons Lang 3.
PubSubToGcs
/**
* This pipeline ingests incoming data from a Cloud Pub/Sub topic and
* outputs the raw data into windowed files at the specified output
* directory.
*/
public class PubsubToGcs {
/**
* Options supported by the pipeline.
*
* <p>Inherits standard configuration options.</p>
*/
public static interface Options extends DataflowPipelineOptions, StreamingOptions {
#Description("The Cloud Pub/Sub topic to read from.")
#Required
ValueProvider<String> getTopic();
void setTopic(ValueProvider<String> value);
#Description("The directory to output files to. Must end with a slash.")
#Required
ValueProvider<String> getOutputDirectory();
void setOutputDirectory(ValueProvider<String> value);
#Description("The filename prefix of the files to write to.")
#Default.String("output")
#Required
ValueProvider<String> getOutputFilenamePrefix();
void setOutputFilenamePrefix(ValueProvider<String> value);
#Description("The shard template of the output file. Specified as repeating sequences "
+ "of the letters 'S' or 'N' (example: SSS-NNN). These are replaced with the "
+ "shard number, or number of shards respectively")
#Default.String("")
ValueProvider<String> getShardTemplate();
void setShardTemplate(ValueProvider<String> value);
#Description("The suffix of the files to write.")
#Default.String("")
ValueProvider<String> getOutputFilenameSuffix();
void setOutputFilenameSuffix(ValueProvider<String> value);
#Description("The sub-directory policy which files will use when output per window.")
#Default.Enum("NONE")
SubDirectoryPolicy getSubDirectoryPolicy();
void setSubDirectoryPolicy(SubDirectoryPolicy value);
#Description("The window duration in which data will be written. Defaults to 5m. "
+ "Allowed formats are: "
+ "Ns (for seconds, example: 5s), "
+ "Nm (for minutes, example: 12m), "
+ "Nh (for hours, example: 2h).")
#Default.String("5m")
String getWindowDuration();
void setWindowDuration(String value);
#Description("The maximum number of output shards produced when writing.")
#Default.Integer(10)
Integer getNumShards();
void setNumShards(Integer value);
}
/**
* Main entry point for executing the pipeline.
* @param args The command-line arguments to the pipeline.
*/
public static void main(String[] args) {
Options options = PipelineOptionsFactory
.fromArgs(args)
.withValidation()
.as(Options.class);
run(options);
}
/**
* Runs the pipeline with the supplied options.
*
* @param options The execution parameters to the pipeline.
* @return The result of the pipeline execution.
*/
public static PipelineResult run(Options options) {
// Create the pipeline
Pipeline pipeline = Pipeline.create(options);
/**
* Steps:
* 1) Read string messages from PubSub
* 2) Window the messages into minute intervals specified by the executor.
* 3) Output the windowed files to GCS
*/
pipeline
.apply("Read PubSub Events",
PubsubIO
.readStrings()
.fromTopic(options.getTopic()))
.apply(options.getWindowDuration() + " Window",
Window
.into(FixedWindows.of(parseDuration(options.getWindowDuration()))))
.apply("Write File(s)",
TextIO
.write()
.withWindowedWrites()
.withNumShards(options.getNumShards())
.to(options.getOutputDirectory())
.withFilenamePolicy(
new WindowedFilenamePolicy(
options.getOutputFilenamePrefix(),
options.getShardTemplate(),
options.getOutputFilenameSuffix())
.withSubDirectoryPolicy(options.getSubDirectoryPolicy())));
// Execute the pipeline and return the result.
PipelineResult result = pipeline.run();
return result;
}
/**
* Parses a duration from a period formatted string. Values
* are accepted in the following formats:
* <p>
* Ns - Seconds. Example: 5s<br>
* Nm - Minutes. Example: 13m<br>
* Nh - Hours. Example: 2h
*
* <pre>
* parseDuration(null) = NullPointerException()
* parseDuration("") = Duration.standardSeconds(0)
* parseDuration("2s") = Duration.standardSeconds(2)
* parseDuration("5m") = Duration.standardMinutes(5)
* parseDuration("3h") = Duration.standardHours(3)
* </pre>
*
* @param value The period value to parse.
* @return The {@link Duration} parsed from the supplied period string.
*/
private static Duration parseDuration(String value) {
Preconditions.checkNotNull(value, "The specified duration must be a non-null value!");
PeriodParser parser = new PeriodFormatterBuilder()
.appendSeconds().appendSuffix("s")
.appendMinutes().appendSuffix("m")
.appendHours().appendSuffix("h")
.toParser();
MutablePeriod period = new MutablePeriod();
parser.parseInto(period, value, 0, Locale.getDefault());
Duration duration = period.toDurationFrom(new DateTime(0));
return duration;
}
}
WindowedFilenamePolicy
/**
* The {@link WindowedFilenamePolicy} class will output files
* to the specified location with a format of output-yyyyMMdd'T'HHmmssZ-001-of-100.txt.
*/
@SuppressWarnings("serial")
public class WindowedFilenamePolicy extends FilenamePolicy {
/**
* Possible sub-directory creation modes.
*/
public static enum SubDirectoryPolicy {
NONE("."),
PER_HOUR("yyyy-MM-dd/HH"),
PER_DAY("yyyy-MM-dd");
private final String subDirectoryPattern;
private SubDirectoryPolicy(String subDirectoryPattern) {
this.subDirectoryPattern = subDirectoryPattern;
}
public String getSubDirectoryPattern() {
return subDirectoryPattern;
}
public String format(Instant instant) {
DateTimeFormatter formatter = DateTimeFormat.forPattern(subDirectoryPattern);
return formatter.print(instant);
}
}
/**
* The formatter used to format the window timestamp for outputting to the filename.
*/
private static final DateTimeFormatter formatter = ISODateTimeFormat
.basicDateTimeNoMillis()
.withZone(DateTimeZone.getDefault());
/**
* The filename prefix.
*/
private final ValueProvider<String> prefix;
/**
* The filename suffix.
*/
private final ValueProvider<String> suffix;
/**
* The shard template used during file formatting.
*/
private final ValueProvider<String> shardTemplate;
/**
* The policy which dictates when or if sub-directories are created
* for the windowed file output.
*/
private ValueProvider<SubDirectoryPolicy> subDirectoryPolicy = StaticValueProvider.of(SubDirectoryPolicy.NONE);
/**
* Constructs a new {@link WindowedFilenamePolicy} with the
* supplied prefix used for output files.
*
* @param prefix The prefix to append to all files output by the policy.
* @param shardTemplate The template used to create uniquely named sharded files.
* @param suffix The suffix to append to all files output by the policy.
*/
public WindowedFilenamePolicy(String prefix, String shardTemplate, String suffix) {
this(StaticValueProvider.of(prefix),
StaticValueProvider.of(shardTemplate),
StaticValueProvider.of(suffix));
}
/**
* Constructs a new {@link WindowedFilenamePolicy} with the
* supplied prefix used for output files.
*
* @param prefix The prefix to append to all files output by the policy.
* @param shardTemplate The template used to create uniquely named sharded files.
* @param suffix The suffix to append to all files output by the policy.
*/
public WindowedFilenamePolicy(
ValueProvider<String> prefix,
ValueProvider<String> shardTemplate,
ValueProvider<String> suffix) {
this.prefix = prefix;
this.shardTemplate = shardTemplate;
this.suffix = suffix;
}
/**
* The subdirectory policy will create sub-directories on the
* filesystem based on the window which has fired.
*
* @param policy The subdirectory policy to apply.
* @return The filename policy instance.
*/
public WindowedFilenamePolicy withSubDirectoryPolicy(SubDirectoryPolicy policy) {
return withSubDirectoryPolicy(StaticValueProvider.of(policy));
}
/**
* The subdirectory policy will create sub-directories on the
* filesystem based on the window which has fired.
*
* @param policy The subdirectory policy to apply.
* @return The filename policy instance.
*/
public WindowedFilenamePolicy withSubDirectoryPolicy(ValueProvider<SubDirectoryPolicy> policy) {
this.subDirectoryPolicy = policy;
return this;
}
/**
* The windowed filename method will construct filenames per window in the
* format of output-yyyyMMdd'T'HHmmss-001-of-100.txt.
*/
@Override
public ResourceId windowedFilename(ResourceId outputDirectory, WindowedContext c, String extension) {
Instant windowInstant = c.getWindow().maxTimestamp();
String datetimeStr = formatter.print(windowInstant.toDateTime());
// Remove the prefix when it is null so we don't append the literal 'null'
// to the start of the filename
String filenamePrefix = prefix.get() == null ? datetimeStr : prefix.get() + "-" + datetimeStr;
String filename = DefaultFilenamePolicy.constructName(
filenamePrefix,
shardTemplate.get(),
StringUtils.defaultIfBlank(suffix.get(), extension), // Ignore the extension in favor of the suffix.
c.getShardNumber(),
c.getNumShards());
String subDirectory = subDirectoryPolicy.get().format(windowInstant);
return outputDirectory
.resolve(subDirectory, StandardResolveOptions.RESOLVE_DIRECTORY)
.resolve(filename, StandardResolveOptions.RESOLVE_FILE);
}
/**
* Unwindowed writes are unsupported by this filename policy so an {@link UnsupportedOperationException}
* will be thrown if invoked.
*/
@Override
public ResourceId unwindowedFilename(ResourceId outputDirectory, Context c, String extension) {
throw new UnsupportedOperationException("There is no windowed filename policy for unwindowed file"
+ " output. Please use the WindowedFilenamePolicy with windowed writes or switch filename policies.");
}
}
Currently in Beam, the DefaultFilenamePolicy supports windowed writes, so there's no need to write a custom FilenamePolicy. You can control the output filename by putting W and P placeholders (for the window and pane respectively) in the filename template. This exists at the head of the Beam repository, and will also be in the upcoming Beam 2.1 release (which is being released as we speak).
I want to test the behavior of a private method.
The method "moveDataToArchive" does 4 steps.
It's 4x: calculate a date + call a sub method.
This is my test:
@Test
public void testMoveData2Archive() throws Exception{
final long now = 123456789000L;
//Necessary to make the archivingBean runable.
Vector<LogEntry> logCollector = new Vector<LogEntry>();
Deencapsulation.setField(archivingBean, "logCollector", logCollector);
new NonStrictExpectations(archivingBean) {
{ //Let's fake the DB stuff.
invoke(archivingBean, "getConnection");result = connection;
connection.prepareStatement(anyString); result = prepStatement;
prepStatement.executeUpdate(); returns(Integer.valueOf(3), Integer.valueOf(0), Integer.valueOf(3));
}
};
new NonStrictExpectations(props) {
{ //This is important. The numbers will be used for one of each 4 submethods
props.getProperty(ArchivingHandlerBean.ARCHIVING_CREDMATURITY_OVER_IN_DAYS); result = "160";
props.getProperty(ArchivingHandlerBean.ARCHIVING_CREDHIST_AGE_IN_DAYS); result = "150";
props.getProperty(ArchivingHandlerBean.ARCHIVING_DEBTHIST_AGE_IN_DAYS); result = "140";
props.getProperty(ArchivingHandlerBean.ARCHIVING_LOG_AGE_IN_DAYS); result = "130";
}
};
new Expectations() {
{
Date expected = new Date(now - (160 * 24 * 60 * 60 * 1000));
invoke(archivingBean, "moveCreditBasic2Archive", expected);
expected = new Date(now - (150 * 24 * 60 * 60 * 1000));
invoke(archivingBean, "moveCreditHistory2Archive", expected);
expected = new Date(now - (999 * 24 * 60 * 60 * 1000));
invoke(archivingBean, "moveDebtorHistory2Archive", expected);
expected = new Date(now - (130 * 24 * 60 * 60 * 1000));
invoke(archivingBean, "moveLog2Archive", expected);
}
};
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(now);
Deencapsulation.invoke(archivingBean,"moveDataToArchive",cal, props);
}
What's the problem? Look at the third expected date: it is wrong! (999 instead of 140.)
I also changed the order of the calls. I even made those private methods public and tried it. None of those changes affected the outcome: the test is green.
What is wrong here? Why is the test green?
The test is misusing the mocking API, by mixing strict and non-strict expectations for the same mock (archivingBean). The first expectations recorded on this mock are non-strict, so JMockit regards it as a non-strict mock for the whole test.
The correct way to write the test would be to turn the strict expectation block (the one with the 4 calls to "sub methods") into a verification block at the end of the test.
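For illustration, a minimal sketch of such a verification block, reusing the invoke(...) helper already used in the question's expectation blocks and placed at the end of the test, after the call to moveDataToArchive. (Long literals are used in the date arithmetic, since 160 * 24 * 60 * 60 * 1000 overflows int.)
new Verifications() {
    {
        // Each sub-method should have been called with its expected cut-off date.
        invoke(archivingBean, "moveCreditBasic2Archive", new Date(now - 160L * 24 * 60 * 60 * 1000));
        invoke(archivingBean, "moveCreditHistory2Archive", new Date(now - 150L * 24 * 60 * 60 * 1000));
        invoke(archivingBean, "moveDebtorHistory2Archive", new Date(now - 140L * 24 * 60 * 60 * 1000));
        invoke(archivingBean, "moveLog2Archive", new Date(now - 130L * 24 * 60 * 60 * 1000));
    }
};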
(As an aside, the whole test has several problems. 1) In general, private methods should be tested indirectly, through some public method. 2) Also, private methods should not be mocked, unless there is a strong reason otherwise - in this case, I would probably write a test which verifies the actual contents of the output file. 3) Don't mock things unnecessarily, such as props - props.setProperty could be used instead, I suppose. 4) Use auto-boxing - Integer.valueOf(3) -> 3).
@Rogério:
Your assumptions do not hold completely; e.g., I don't have a setProperty(). What I tried is to use a Verifications block.
Sadly, I don't understand JMockit well enough to get it running...
I did 2 things. First I tried to mock the 4 private methods. I only want to see that they are called, but I don't want their logic to run.
I tried it by extending the first NonStrictExpectations-Block like this:
new NonStrictExpectations(archivingBean) {
{
invoke(archivingBean, "getConnection");result = connection;
connection.prepareStatement(anyString); result = prepStatement;
prepStatement.executeUpdate(); returns(Integer.valueOf(3), Integer.valueOf(0), Integer.valueOf(3));
//New part
invoke(archivingBean, "moveCreditBasic2Archive", withAny(new Date()));
invoke(archivingBean, "moveCreditHistory2Archive", withAny(new Date()));
invoke(archivingBean, "moveDebtorHistory2Archive", withAny(new Date()));
invoke(archivingBean, "moveLog2Archive", withAny(new Date()));
}
};
On the other hand, I moved the Expectations block down and made it a Verifications block. Now the JUnit test fails with a
mockit.internal.MissingInvocation: Missing invocation of:
de.lpm.ejb.archiving.ArchivingHandlerBean#moveCreditBasic2Archive(java.util.Date pOlderThan)
with arguments: Tue Feb 03 03:39:51 CET 2009
on mock instance: de.lpm.ejb.archiving.ArchivingHandlerBean#1601bde
at de.lpm.ejb.archiving.ArchivingHandlerBean.moveCreditBasic2Archive(ArchivingHandlerBean.java:175)
[...]
Caused by: Missing invocation
This is Line 170-175 in ArchivingHandlerBean.java:
170: Connection connection = getConnection();
171: SQLService service = new SQLService(connection);
172:
173: PreparedStatement prepStmtMove = null;
174:
175: Vector<HashMap<String, String>> where_clauses = new Vector<HashMap<String,String>>();
I just want to verify that the 4 private methods are executed with the right date.