Dynamically push events/values to a Flux during application run - java

I am trying to build a reactive pipeline using Java and Project Reactor, where the use case is that the application generates flow statuses (INIT, PROCESSING, SAVED, DONE) at different stages. The statuses must be emitted asynchronously to a Flux that is handled independently and separately from the main flow. I came across this link:
Spring WebFlux (Flux): how to publish dynamically
My sample flow is something like this:
public class StatusEmitterImpl implements StatusEmitter {

    private final FluxProcessor<String, String> processor;
    private final FluxSink<String> sink;

    public StatusEmitterImpl() {
        this.processor = DirectProcessor.<String>create().serialize();
        this.sink = processor.sink();
    }

    @Override
    public Flux<String> publisher() {
        return this.processor.map(x -> x);
    }

    public void publishStatus(String status) {
        sink.next(status);
    }
}
public class Try {

    public static void main(String[] args) {
        StatusEmitterImpl statusEmitter = new StatusEmitterImpl();
        Flux.fromIterable(Arrays.asList("INIT", "DONE"))
            .subscribe(x -> statusEmitter.publishStatus(x));
        statusEmitter.publisher().subscribe(x -> System.out.println(x));
    }
}
The problem is that nothing is getting printed on the console. I cannot understand what I am missing.

DirectProcessor passes values to its registered Subscribers directly, without caching the signals. If there is no Subscriber, then the value is "forgotten". If a Subscriber comes in late, then it will only receive signals emitted after it subscribed.
That's what is happening here: because fromIterable works on an in-memory collection, it has time to push all values to the DirectProcessor, which by that time doesn't have a registered Subscriber yet.
If you invert the last two lines you should see something.

DirectProcessor is a hot publisher and doesn't buffer elements, so you should produce elements after subscribing, like this:
public static void main(String[] args) {
    StatusEmitterImpl statusEmitter = new StatusEmitterImpl();
    statusEmitter.publisher().subscribe(x -> System.out.println(x));
    Flux.fromIterable(Arrays.asList("INIT", "DONE")).subscribe(x -> statusEmitter.publishStatus(x));
}
Alternatively, use EmitterProcessor or UnicastProcessor instead of DirectProcessor.
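For illustration, here is a minimal sketch of the same emitter built on EmitterProcessor, which (unlike DirectProcessor) buffers signals emitted before the first subscriber arrives. This is an untested variant of the question's class; on recent Reactor versions the Sinks API is the recommended replacement for processors:
public class StatusEmitterImpl implements StatusEmitter {

    // EmitterProcessor buffers elements (up to its internal buffer size)
    // emitted before the first subscriber, unlike DirectProcessor.
    private final EmitterProcessor<String> processor = EmitterProcessor.create();
    private final FluxSink<String> sink = processor.sink();

    @Override
    public Flux<String> publisher() {
        return this.processor;
    }

    public void publishStatus(String status) {
        sink.next(status);
    }
}
With this variant, the original ordering in main (publish first, subscribe afterwards) should print INIT and DONE as well.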

Related

How to use non-keyed state with Kafka Consumer in Flink?

I'm trying to implement (I've just started working with Java and Flink) non-keyed state in a KafkaConsumer object, since at this stage no keyBy() is called. This object is the front end and the first module that handles messages from Kafka.
SourceOutput is a proto file representing the message.
I have the KafkaConsumer object :
public class KafkaSourceFunction extends ProcessFunction<byte[], SourceOutput> implements Serializable
{
    @Override
    public void processElement(byte[] bytes, ProcessFunction<byte[], SourceOutput>.Context context,
                               Collector<SourceOutput> collector) throws Exception
    {
        // Here, I want to call the sorting method
        collector.collect(output);
    }
}
I have an object (KafkaSourceSort) that does all the sorting; it should keep the unordered messages in a PriorityQueue in state, and it is also responsible for delivering a message through the collector if it arrives in the right order.
class SessionInfo
{
    public PriorityQueue<SourceOutput> orderedMessages = null;

    public void putMessage(SourceOutput Msg)
    {
        if(orderedMessages == null)
            orderedMessages = new PriorityQueue<SourceOutput>(new SequenceComparator());
        orderedMessages.add(Msg);
    }
}

public class KafkaSourceState implements Serializable
{
    public TreeMap<String, SessionInfo> Sessions = new TreeMap<>();
}
I read that I need to use non-keyed state (ListState), which should contain a map of sessions, where each session holds a PriorityQueue of all messages related to that session.
I found an example, so I implemented this:
public class KafkaSourceSort implements SinkFunction<KafkaSourceSort>, CheckpointedFunction
{
    private transient ListState<KafkaSourceState> checkpointedState;
    private KafkaSourceState state;

    @Override
    public void snapshotState(FunctionSnapshotContext functionSnapshotContext) throws Exception
    {
        checkpointedState.clear();
        checkpointedState.add(state);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception
    {
        ListStateDescriptor<KafkaSourceState> descriptor =
            new ListStateDescriptor<KafkaSourceState>(
                "KafkaSourceState",
                TypeInformation.of(new TypeHint<KafkaSourceState>() {}));

        checkpointedState = context.getOperatorStateStore().getListState(descriptor);

        if (context.isRestored())
        {
            state = (KafkaSourceState) checkpointedState.get();
        }
    }

    @Override
    public void invoke(KafkaSourceState value, SinkFunction.Context contex) throws Exception
    {
        state = value;
        // ...
    }
}
I see that I need to implement an invoke() method, which will probably be called from processElement(), but the signature of invoke() doesn't contain the collector, and I don't understand how to do this, or even whether what I have done so far is OK.
Any help would be appreciated.
Thanks.
A SinkFunction is a terminal node in the DAG that is your job graph. It doesn't have a Collector in its interface because it cannot emit anything downstream. It is expected to connect to an external service or data store and send data there.
If you share more about what you are trying to accomplish perhaps we can offer more assistance. There may be an easier way to go about this.
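If the goal is to keep the sorting state in an operator that can still emit downstream, one possible direction (a rough sketch only, reusing SourceOutput and KafkaSourceState from the question; the parse and sortAndRelease helpers are hypothetical) is to implement CheckpointedFunction on the ProcessFunction itself, since processElement() does receive a Collector:
// Sketch: non-keyed (operator) state combined with a ProcessFunction,
// so the sorting state lives in the same operator that has the Collector.
public class SortingSourceFunction extends ProcessFunction<byte[], SourceOutput>
        implements CheckpointedFunction {

    private transient ListState<KafkaSourceState> checkpointedState;
    private KafkaSourceState state = new KafkaSourceState();

    @Override
    public void processElement(byte[] bytes, Context context, Collector<SourceOutput> collector) throws Exception {
        SourceOutput message = parse(bytes);                  // hypothetical parsing helper
        for (SourceOutput ready : sortAndRelease(message)) {  // hypothetical sorting helper
            collector.collect(ready);                         // emit messages that are now in order
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        checkpointedState.clear();
        checkpointedState.add(state);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        ListStateDescriptor<KafkaSourceState> descriptor = new ListStateDescriptor<>(
                "KafkaSourceState", TypeInformation.of(new TypeHint<KafkaSourceState>() {}));
        checkpointedState = context.getOperatorStateStore().getListState(descriptor);
        if (context.isRestored()) {
            for (KafkaSourceState restored : checkpointedState.get()) {
                state = restored; // ListState.get() returns an Iterable, not a single value
            }
        }
    }

    private SourceOutput parse(byte[] bytes) { /* ... */ return null; }
    private java.util.List<SourceOutput> sortAndRelease(SourceOutput message) { /* ... */ return java.util.Collections.emptyList(); }
}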

Spring Cloud Stream Reactive Listener Without Output

I'm using Reactive Spring Cloud Stream and I'm having trouble creating a StreamListener without an Output. The following code works as long as no malformed messages are received. When a malformed message is received, the flux closes.
@StreamListener
public void handleMessage(@Input(MessagingConfig.INPUT) Flux<String> payloads) {
    payloads.flatMap(objectToSave -> reactiveMongoTemplate.insert(objectToSave)).subscribe();
}
If I understand correctly, it is preferable to let the framework subscribe to the flux instead of subscribing to it manually. This isn't a problem when a listener has an output, because I can simply return the flux like so:
@StreamListener
@Output(MessagingConfig.OUTPUT)
public Flux<String> handleMessage(@Input(MessagingConfig.INPUT) Flux<String> payloads) {
    return payloads.flatMap(objectToSave -> reactiveMongoTemplate.insert(objectToSave));
}
The framework seems to handle bad messages in a way that doesn't close the flux when it is returned. Is there any way to let the framework handle the flux when the listener doesn't specify an output?
Consider switching to the Spring Cloud Function (SCF) programming model, which we have recently adopted.
Basically, as long as you have the latest code base (2.1.0.RC4 is the latest, and the RELEASE is a few days away) you're fine. Here is an example of your code using the SCF programming model:
@SpringBootApplication
@EnableBinding(Sink.class)
public class SampleReactiveConsumer {

    public static void main(String[] args) {
        SpringApplication.run(SampleReactiveConsumer.class,
                "--spring.cloud.stream.function.definition=consume");
    }

    @Bean
    public Consumer<Flux<String>> consume() {
        return payloads -> payloads.flatMap(objectToSave -> reactiveMongoTemplate.insert(objectToSave)).subscribe();
    }
}
You can also remove the reactive module from your classpath, as we are considering deprecating it altogether.
If you really want to avoid the SCF approach mentioned in Oleg's answer, you could try the hacky approach below.
const val IN = "input"
const val OUT = "dummy-output"

interface Channels {
    @Input(IN)
    fun input(): MessageChannel

    @Output(OUT)
    fun output(): MessageChannel
}

@EnableBinding(Channels::class)
class MsgList {

    @StreamListener
    @Output(OUT)
    fun receive(@Input(IN) messages: Flux<String>): Flux<Void> {
        return messages
            .doOnNext { if (it == "err") throw IllegalStateException("err") }
            .doOnNext { println(it) }
            .flatMap { Mono.empty<Void>() }
    }
}
The output binding will be created, but no messages will go through it. In the case of RabbitMQ that means a dummy exchange will appear, but no queue will be created.
Errors are also handled as you expect. With the above example, you can send three messages, "ok", "err", "ok2", and you will see "ok", then an exception, then "ok2" on the screen. "ok2" and any subsequent valid messages will be handled properly.

How to create some sort of event framework in java?

I don't have a GUI (my classes are part of a Minecraft mod). I want to be able to mimic the C# event model: a class declares events and lets others subscribe to them.
My first approach was to create a class called EventArgs and then do something like this:
public class EventArgs
{
    public boolean handled;
}

@FunctionalInterface
public interface IEventHandler<TEvtArgs extends EventArgs>
{
    public void handle(Object source, TEvtArgs args);
}

public class Event<TEvtArgs extends EventArgs>
{
    private final Object owner;
    private final LinkedList<IEventHandler<TEvtArgs>> handlers = new LinkedList<>();

    public Event(Object owner)
    {
        this.owner = owner;
    }

    public void subscribe(IEventHandler<TEvtArgs> handler)
    {
        handlers.add(handler);
    }

    public void unsubscribe(IEventHandler<TEvtArgs> handler)
    {
        while(handlers.remove(handler));
    }

    public void raise(TEvtArgs args)
    {
        for(IEventHandler<TEvtArgs> handler : handlers)
        {
            handler.handle(owner, args);
            if(args.handled)
                break;
        }
    }
}
Then a class would do something like this:
public class PropertyChangedEvtArgs extends EventArgs
{
    public final Object oldValue;
    public final Object newValue;

    public PropertyChangedEvtArgs(final Object oldValue, final Object newValue)
    {
        this.oldValue = oldValue;
        this.newValue = newValue;
    }
}

public class SomeEventPublisher
{
    private int property = 0;
    private final Random rnd = new Random();

    public final Event<PropertyChangedEvtArgs> PropertyChanged = new Event<>(this);

    public void raiseEventOrNot(int value)
    {
        if(rnd.nextBoolean())//just to represent the fact that the event is not always raised
        {
            int old = property;
            property = value;
            PropertyChanged.raise(new PropertyChangedEvtArgs("old(" + old + ")", "new(" + value + ")"));
        }
    }
}

public class SomeSubscriber
{
    private final SomeEventPublisher eventPublisher = new SomeEventPublisher();

    public SomeSubscriber()
    {
        eventPublisher.PropertyChanged.subscribe(this::handlePropertyAChanges);
    }

    private void handlePropertyAChanges(Object source, PropertyChangedEvtArgs args)
    {
        System.out.println("old:" + args.oldValue);
        System.out.println("new:" + args.newValue + "\n");
    }

    public void someMethod(int i)
    {
        eventPublisher.raiseEventOrNot(i);
    }
}

public class Main
{
    private static final SomeSubscriber subscriber = new SomeSubscriber();

    public static void main(String[] args)
    {
        for(int i = 0; i < 10; ++i)
        {
            subscriber.someMethod(i);
        }
    }
}
The biggest problem with this naïve approach is that it breaks proper encapsulation by exposing raise as public. I can't see a way around it, and maybe my whole pattern is wrong. I would like some ideas.
There's also a related problem: I would like the events to be raised immediately after the method raising them returns. Is there a way to synchronize this using threads or some other construct? The caller code, of course, can't be involved in the task of synchronization. It has to be completely transparent to it.
The best thing to do here is to avoid implementing your own event framework in the first place, and instead rely on some existing library. Out of the box Java provides EventListener, and at a minimum you can follow the patterns documented there. Even for non-GUI applications most of this advice applies.
Going beyond the JDK, Guava provides several possible options, depending on your exact use case.
The most likely candidate is EventBus, which:
allows publish-subscribe-style communication between components without requiring the components to explicitly register with one another (and thus be aware of each other).
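For a sense of what that looks like in practice, here is a minimal, hypothetical sketch (the event and subscriber names are made up for illustration, and Guava is assumed to be on the classpath):
import com.google.common.eventbus.EventBus;
import com.google.common.eventbus.Subscribe;

public class EventBusExample {

    // A plain value object is the event type; no EventArgs hierarchy is needed.
    static class PropertyChanged {
        final Object oldValue;
        final Object newValue;
        PropertyChanged(Object oldValue, Object newValue) {
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    static class SomeSubscriber {
        @Subscribe // EventBus dispatches any posted PropertyChanged event to this method
        public void onPropertyChanged(PropertyChanged event) {
            System.out.println("old: " + event.oldValue + ", new: " + event.newValue);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.register(new SomeSubscriber());                // subscribe
        bus.post(new PropertyChanged("old(0)", "new(1)")); // publish
    }
}
A single shared bus instance can be injected wherever publishing or subscribing is needed, instead of wiring publishers and subscribers to each other directly.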
Or ListenableFuture (and ListeningExecutorService) which:
allows you to register callbacks to be executed once [a task submitted to an Executor] is complete, or if the computation is already complete, immediately. This simple addition makes it possible to efficiently support many operations that the basic Future interface cannot support.
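Similarly, a small illustrative ListenableFuture sketch (names made up, Guava assumed on the classpath):
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.Executors;

public class ListenableFutureExample {
    public static void main(String[] args) {
        ListeningExecutorService service =
                MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());

        ListenableFuture<String> future = service.submit(() -> "computed value");

        // The callback runs when the task completes, or immediately if it already has.
        Futures.addCallback(future, new FutureCallback<String>() {
            @Override public void onSuccess(String result) { System.out.println("got: " + result); }
            @Override public void onFailure(Throwable t) { t.printStackTrace(); }
        }, MoreExecutors.directExecutor());

        service.shutdown();
    }
}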
Or the Service API which:
represents an object with an operational state, with methods to start and stop. For example, webservers, RPC servers, and timers can implement the Service interface. Managing the state of services like these, which require proper startup and shutdown management, can be nontrivial, especially if multiple threads or scheduling is involved.
This API similarly lets you register listeners to respond to state changes in your services.
Even if none of these options directly work for your use case, take a look at Guava's source code for examples of event-driven behavior and listeners you can try to emulate.

Join two streams using a count-based window

I am new to the Flink Streaming API and I want to complete the following simple (IMO) task. I have two streams and I want to join them using count-based windows. The code I have so far is the following:
public class BaselineCategoryEquiJoin {

    private static final String recordFile = "some_file.txt";

    private static class ParseRecordFunction implements MapFunction<String, Tuple2<String[], MyRecord>> {
        public Tuple2<String[], MyRecord> map(String s) throws Exception {
            MyRecord myRecord = parse(s);
            return new Tuple2<String[], MyRecord>(myRecord.attributes, myRecord);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.createLocalEnvironment();
        ExecutionConfig config = environment.getConfig();
        config.setParallelism(8);

        DataStream<Tuple2<String[], MyRecord>> dataStream = environment.readTextFile(recordFile)
                .map(new ParseRecordFunction());
        DataStream<Tuple2<String[], MyRecord>> dataStream1 = environment.readTextFile(recordFile)
                .map(new ParseRecordFunction());

        DataStreamSink<Tuple2<String[], String[]>> joinedStream = dataStream1
                .join(dataStream)
                .where(new KeySelector<Tuple2<String[], MyRecord>, String[]>() {
                    public String[] getKey(Tuple2<String[], MyRecord> recordTuple2) throws Exception {
                        return recordTuple2.f0;
                    }
                }).equalTo(new KeySelector<Tuple2<String[], MyRecord>, String[]>() {
                    public String[] getKey(Tuple2<String[], MyRecord> recordTuple2) throws Exception {
                        return recordTuple2.f0;
                    }
                }).window(TumblingProcessingTimeWindows.of(Time.seconds(1)))
                .apply(new JoinFunction<Tuple2<String[], MyRecord>, Tuple2<String[], MyRecord>, Tuple2<String[], String[]>>() {
                    public Tuple2<String[], String[]> join(Tuple2<String[], MyRecord> tuple1, Tuple2<String[], MyRecord> tuple2) throws Exception {
                        return new Tuple2<String[], String[]>(tuple1.f0, tuple1.f0);
                    }
                }).print();

        environment.execute();
    }
}
My code runs without errors, but it does not produce any results. In fact, the apply method is never called (verified by adding a breakpoint in debug mode). I think the main reason for this is that my data do not have a time attribute, so the windowing (materialized through window) is not done properly. My question, therefore, is: how can I indicate that I want the join to take place based on count windows? For instance, I want the join to materialize every 100 tuples from each stream. Is this feasible in Flink, and if so, what should I change in my code to achieve it?
I should also mention that I tried to call the countWindow() method, but for some reason it is not offered by Flink's JoinedStreams.
Thank you
Count-based joins are not supported. You could emulate count-based windows by using event-time semantics and assigning a unique sequence id as the timestamp of each record. A time window of "5" would then effectively be a count window of 5.
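As a rough illustration of that idea (a sketch only, written against the newer WatermarkStrategy API; dataStream, dataStream1, keySelector, and joinFunction stand in for the definitions in the question, and the sequence ids are assigned with parallelism 1 so that they line up across both streams):
// Assign an increasing sequence number as the "event time" of each record.
// On Flink versions before 1.12 you also need
// environment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime).
WatermarkStrategy<Tuple2<String[], MyRecord>> seqIds = WatermarkStrategy
        .<Tuple2<String[], MyRecord>>forMonotonousTimestamps()
        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String[], MyRecord>>() {
            private long seq = 0; // per-subtask counter, hence parallelism 1 below
            @Override
            public long extractTimestamp(Tuple2<String[], MyRecord> element, long recordTimestamp) {
                return seq++;
            }
        });

// A 100-"millisecond" event-time window now behaves like a count window of 100.
dataStream1.assignTimestampsAndWatermarks(seqIds).setParallelism(1)
        .join(dataStream.assignTimestampsAndWatermarks(seqIds).setParallelism(1))
        .where(keySelector)
        .equalTo(keySelector)
        .window(TumblingEventTimeWindows.of(Time.milliseconds(100)))
        .apply(joinFunction)
        .print();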

How to pass a countdown latch to an Apache Storm/Trident Filter without incurring a not-serializable exception

I'm trying to create some tests to verify data going through an Apache Storm topology (using the Trident API).
I've created this simple filter to access callbacks:
public class CallbackFilter extends BaseFilter {

    private final TupleCallback callback;

    public CallbackFilter(TupleCallback callback) {
        this.callback = callback;
    }

    @Override
    public boolean isKeep(TridentTuple tuple) {
        if (callback != null) {
            callback.callback(tuple);
        }
        return true;
    }

    public interface TupleCallback extends Serializable {
        void callback(TridentTuple tuple);
    }
}
If I try this, I get a runtime exception saying CountDownLatch is not serializable:
@Test
public void testState() throws Exception {
    CountDownLatch latch = new CountDownLatch(4);
    TridentTopology tridentTopology = new TridentTopology();
    FeederBatchSpout spout = ...
    TridentState state = ...

    // problematic code:
    CallbackFilter.TupleCallback callback = (CallbackFilter.TupleCallback & Serializable) tuple -> {
        System.out.println("tuple = " + tuple);
        latch.countDown(); //latch is not serializable - exception!
    };

    CallbackFilter latchFilter = new CallbackFilter(callback);

    tridentTopology.stuff()
        .each(new Fields("foo", "bar"), latchFilter);
    ...
So it appears Storm is serializing all of the components of a topology and then submitting them in the serialized form, probably for clustering or whatnot.
Is there any way of getting a callback from Storm to the calling test? Maybe some sort of test mode that doesn't serialize the topology? It's kinda hard to see what is going on inside the topology from a test point of view, especially at each stage of a topology.
update:
even doing something like this doesn't work!
List<TridentTuple> tupleList = new ArrayList<>();

CallbackFilter.TupleCallback callback = (CallbackFilter.TupleCallback & Serializable) tuple -> {
    tupleList.add(tuple);
};
I can see tupleList being added to in the debugger, but in the test itself the list stays empty. It's as if the topology is running in its own JVM.
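One workaround sometimes used when the topology runs in-process with LocalCluster is to collect tuples into a static, thread-safe structure rather than a captured local variable: static fields are shared within the JVM, so they survive the filter being serialized and deserialized. A rough, unverified sketch (imports assume a recent org.apache.storm package layout):
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.tuple.TridentTuple;

// Sketch only: records tuples into a static queue so a test running in the same
// JVM as LocalCluster can inspect them. This does not work on a real, multi-JVM
// cluster, where every worker process gets its own copy of the static field.
public class RecordingFilter extends BaseFilter {

    public static final Queue<TridentTuple> RECORDED = new ConcurrentLinkedQueue<>();

    @Override
    public boolean isKeep(TridentTuple tuple) {
        RECORDED.add(tuple);
        return true;
    }
}
The test can then poll RecordingFilter.RECORDED (or await a given size) instead of passing a CountDownLatch through the topology.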
