Reactor choosing a sink/processor - java

I have the following use case:
In an application I am consuming messages with X threads, and I have a Consumer implementation defined like this:
public interface Consumer {
    void onMessage(Object message);
}
The problem is that Consumer is not a different instance per thread but a single shared instance, as it is a Spring bean, and we also expect each single call of onMessage to be free of side effects.
However, what I want to build is a duplicate message detection mechanism, which kind of looks like this:
public static <T> Flux<OcurrenceCache<T>> getExceedingRates(Flux<T> values, int maxHits, int bufferSize, Duration bufferTimeout) {
    return values.bufferTimeout(bufferSize, bufferTimeout)
            .map(vals -> {
                OcurrenceCache<T> occurrenceCache = new OcurrenceCache<>(maxHits);
                for (T value : vals) {
                    occurrenceCache.incrementNrOccurrences(value);
                }
                return occurrenceCache;
            });
}
Basically, from a Flux of values I return an occurrence cache containing the elements that were encountered more than the maximum desired number of hits.
Naively, I can implement things like that:
public class MyConsumer implements Consumer {

    private final EmitterProcessor<Object> emitterProcessor;

    public MyConsumer(Integer maxHits, Integer bufferSize, Long timeoutMillis) {
        this.emitterProcessor = EmitterProcessor.create();
        this.emitterProcessor
                .bufferTimeout(bufferSize, Duration.ofMillis(timeoutMillis))
                .subscribe(integers -> {
                    getExceedingRates(Flux.fromIterable(integers), maxHits, bufferSize, Duration.ofMillis(timeoutMillis))
                            .subscribe(integerOcurrenceCache -> {
                                System.out.println(integerOcurrenceCache.getExceedingValues());
                            });
                });
    }

    @Override
    public void onMessage(Object message) {
        emitterProcessor.onNext(message);
    }
}
However, this is far from optimal, because I know that my messages from a specific thread will NEVER contain any of the messages that came from another thread (they are pre-grouped, as we use JMS grouping and Kinesis sharding). So, in a way, I'd like to use a Processor that will:
use the very same thread on which onMessage was called, isolating the flux in such a way that its values are never mixed up with the values pushed from another thread.

You can use thread local processors:
private final ThreadLocal<EmitterProcessor<Object>> emitterProcessorHolder = ThreadLocal.withInitial(() -> {
    EmitterProcessor<Object> processor = ...
    return processor;
});
...
@Override
public void onMessage(Object message) {
    emitterProcessorHolder.get().onNext(message);
}
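For completeness, a minimal sketch of how each per-thread processor could be wired straight into the getExceedingRates pipeline from the question (assuming the maxHits, bufferSize, and timeoutMillis constructor parameters are in scope as fields); since getExceedingRates already does its own bufferTimeout, the processor can be passed to it directly:

private final ThreadLocal<EmitterProcessor<Object>> emitterProcessorHolder = ThreadLocal.withInitial(() -> {
    // every consuming thread lazily gets its own processor, so values are never mixed across threads
    EmitterProcessor<Object> processor = EmitterProcessor.create();
    getExceedingRates(processor, maxHits, bufferSize, Duration.ofMillis(timeoutMillis))
            .subscribe(cache -> System.out.println(cache.getExceedingValues()));
    return processor;
});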

Related

How to create pool of clients which can handle just one task at once

My application starts a couple of clients which communicate with Steam. There are two types of tasks I can ask clients to perform. For the first type I don't care about blocking, for example asking a client about its friends. But for the second type I can submit only one task per client, and I need to wait until the client has finished it asynchronously. I am not sure if there is already a design pattern for this, but you can see what I have tried so far. When I ask for the second kind of task, I remove the client from the queue and return it here after the task is done. But I don't know if this is a good solution, because I can 'lose' clients if I do something wrong.
@Component
public class SteamClientWrapper {

    private Queue<DotaClientImpl> clients = new LinkedList<>();
    private final Object clientLock = new Object();

    public SteamClientWrapper() {
    }

    @PostConstruct
    public void init() {
        // starting clients here clients.add();
    }

    public DotaClientImpl getClient() {
        return getClient(false);
    }

    public DotaClientImpl getClient(boolean freeLast) {
        synchronized (clients) {
            if (!clients.isEmpty()) {
                return freeLast ? clients.poll() : clients.peek();
            }
        }
        return null;
    }

    public void postClient(DotaClientImpl client) {
        if (client == null) {
            return;
        }
        synchronized (clientLock) {
            clients.offer(client);
            clientLock.notify();
        }
    }

    public void doSomethingBlocking() {
        DotaClientImpl client = getClient(true);
        client.doSomething();
    }
}
Sounds like you could use Spring's ThreadPoolTaskExecutor to do that.
An Executor is basically what you tried to do - store tasks in a queue and process the next one as soon as the previous one has finished.
Often this is used to run tasks in parallel, but it can also reduce overhead for serial processing.
A sample doing it this way can be found at
https://dzone.com/articles/spring-and-threads-taskexecutor
To ensure only one client task runs at a time, simply set the configuration to
executor.setCorePoolSize(1);
executor.setMaxPoolSize(1);
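A minimal sketch of such a configuration (the class and bean names are placeholders); with a pool of exactly one thread, every submitted client task waits in the executor's internal queue until the previous one has finished:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class SteamExecutorConfig {

    @Bean
    public ThreadPoolTaskExecutor steamTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(1); // exactly one worker thread...
        executor.setMaxPoolSize(1);  // ...so blocking tasks run strictly one at a time
        executor.initialize();       // pending tasks queue up (unbounded queue by default)
        return executor;
    }
}

Submitting the blocking work then becomes something like:

steamTaskExecutor.execute(() -> getClient(true).doSomething());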

Kafka SpringBoot StreamListener - how to consume multiple topics in order?

I have multiple StreamListener-annotated methods consuming from different topics. But some of these topics need to be read from the "earliest" offset to populate an in-memory map (something like a state machine); only then should we consume from the other topics, which might contain commands that should be executed against the "latest" state machine.
Current code looks something like:
@Component
@AllArgsConstructor
@EnableBinding({InputChannel.class, OutputChannel.class})
@Slf4j
public class KafkaListener {

    @StreamListener(target = InputChannel.EVENTS)
    public void event(Event event) {
        // do something with the event
    }

    @StreamListener(target = InputChannel.COMMANDS)
    public void command(Command command) {
        // do something with the command only after all events have been processed
    }
}
I tried to add some horrible code that gets the Kafka topic offset metadata from the incoming event messages and then uses a semaphore to block the command until a certain percentage of the total offset has been reached by the events. It kind of works, but it makes me sad, and it will be awful to maintain once we have 20 or so topics that all depend on one another.
Does SpringBoot / Spring Streams have any built-in mechanism to do this, or is there some common pattern that people use that I'm not aware of?
TL;DR: How do I process all messages from topic A before consuming any from topic B, without doing something dirty like sticking a Thread.sleep(60000) in the consumer for topic B?
See the Kafka consumer binding properties resetOffsets and startOffset:

resetOffsets
    Whether to reset offsets on the consumer to the value provided by startOffset. Must be false if a KafkaRebalanceListener is provided; see Using a KafkaRebalanceListener.
    Default: false.

startOffset
    The starting offset for new groups. Allowed values: earliest and latest. If the consumer group is set explicitly for the consumer 'binding' (through spring.cloud.stream.bindings.<channelName>.group), 'startOffset' is set to earliest. Otherwise, it is set to latest for the anonymous consumer group. Also see resetOffsets (earlier in this list).
    Default: null (equivalent to earliest).
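For example, assuming a binding named input and an explicit group, the consumer binding could be configured so the group always rewinds to the beginning on startup:

spring.cloud.stream.bindings.input.group=myGroup
spring.cloud.stream.kafka.bindings.input.consumer.resetOffsets=true
spring.cloud.stream.kafka.bindings.input.consumer.startOffset=earliest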
You can also add a KafkaBindingRebalanceListener and perform seeks on the consumer.
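A sketch of what such a listener bean could look like (the binding name "input" is an assumption, and remember that resetOffsets must then be false):

import java.util.Collection;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;
import org.springframework.cloud.stream.binder.kafka.KafkaBindingRebalanceListener;
import org.springframework.stereotype.Component;

@Component
public class SeekToBeginningListener implements KafkaBindingRebalanceListener {

    @Override
    public void onPartitionsAssigned(String bindingName, Consumer<?, ?> consumer,
            Collection<TopicPartition> partitions, boolean initial) {
        // on the first assignment after startup, rewind this binding to the beginning
        if (initial && "input".equals(bindingName)) {
            consumer.seekToBeginning(partitions);
        }
    }
}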
EDIT
You can also set autoStartup to false on the second listener, and start the binding when you are ready. Here's an example:
@SpringBootApplication
@EnableBinding(Sink.class)
public class Gitter55Application {

    public static void main(String[] args) {
        SpringApplication.run(Gitter55Application.class, args);
    }

    @Bean
    public ConsumerEndpointCustomizer<KafkaMessageDrivenChannelAdapter<?, ?>> customizer() {
        return (endpoint, dest, group) -> {
            endpoint.setOnPartitionsAssignedSeekCallback((assignments, callback) -> {
                assignments.keySet().forEach(tp -> callback.seekToBeginning(tp.topic(), tp.partition()));
            });
        };
    }

    @StreamListener(Sink.INPUT)
    public void listen(String value, @Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) byte[] key) {
        System.out.println(new String(key) + ":" + value);
    }

    @Bean
    public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template, BindingsEndpoint bindings) {
        return args -> {
            while (true) {
                template.send("gitter55", "foo".getBytes(), "bar".getBytes());
                System.out.println("Hit enter to start");
                System.in.read();
                bindings.changeState("input", State.STARTED);
            }
        };
    }
}
spring.cloud.stream.bindings.input.group=gitter55
spring.cloud.stream.bindings.input.destination=gitter55
spring.cloud.stream.bindings.input.content-type=text/plain
spring.cloud.stream.bindings.input.consumer.auto-startup=false

flink SourceFunction<> is being replaced in StreamExecutionEnvironment.addSource()?

I ran into this problem while trying to create a custom event source. It contains a queue that allows my other process to add items into it. My CEP pattern is then expected to print some debug messages when there is a match.
But there is no match, no matter what I add to the queue. Then I noticed that the queue inside mySource.run() is always empty, which means the queue I used to create the mySource instance is not the same as the one inside StreamExecutionEnvironment. If I change the queue to static, forcing all instances to share the same queue, everything works as expected.
DummySource.java
public class DummySource implements SourceFunction<String> {

    private static final long serialVersionUID = 3978123556403297086L;

    // private static Queue<String> queue = new LinkedBlockingQueue<String>();
    private Queue<String> queue;
    private boolean cancel = false;

    public void setQueue(Queue<String> q) {
        queue = q;
    }

    @Override
    public void run(org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<String> ctx)
            throws Exception {
        System.out.println("run");
        synchronized (queue) {
            while (!cancel) {
                if (queue.peek() != null) {
                    String e = queue.poll();
                    if (e.equals("exit")) {
                        cancel();
                    }
                    System.out.println("collect " + e);
                    ctx.collectWithTimestamp(e, System.currentTimeMillis());
                }
            }
        }
    }

    @Override
    public void cancel() {
        System.out.println("canceled");
        cancel = true;
    }
}
So I dug into the source code of StreamExecutionEnvironment. Inside the addSource() method there is a clean() method which looks like it replaces the instance with a new one.
Returns a "closure-cleaned" version of the given function.
Why is that? And why does it need to be serialized?
I've also tried to turn off closure cleaning using getConfig(). The result is still the same: my queue instance is not the same one that env is using.
How do I solve this problem?
The clean() method used on functions in Flink is mainly there to ensure the function (like SourceFunction, MapFunction) is serialisable. Flink serialises those functions and distributes them onto task nodes to execute them.
For simple variables in your Flink main code, like int, you can simply reference them in your function. But for large or non-serialisable ones, it is better to use broadcast and rich source functions. Please refer to https://cwiki.apache.org/confluence/display/FLINK/Variables+Closures+vs.+Broadcast+Variables
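To see why the driver-side instance and the task-side instance end up being different objects, here is a minimal sketch using plain JDK serialization as a stand-in for what Flink does when it ships a SourceFunction to a task node (class names are illustrative):

import java.io.*;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

public class SerializationDemo {

    static class MySource implements Serializable {
        Queue<String> queue = new LinkedBlockingQueue<>();
    }

    public static void main(String[] args) throws Exception {
        MySource original = new MySource();

        // serialize and deserialize, as Flink does when distributing a function
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(original);
        MySource copy = (MySource) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();

        original.queue.offer("hello");          // goes into the driver-side queue
        System.out.println(copy.queue.poll());  // null: the copy has its own queue
    }
}

This is also why the static queue from the question works in a local environment: a static field belongs to the class in the JVM, not to the serialized instance, so both copies see the same queue as long as they run in the same JVM.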

When to use Akka and when not to?

I'm currently in the situation that I'm actually making things more complicated by using Actors than when I don't. I need to execute a lot of HTTP requests without blocking the main thread. Since this is concurrency and I wanted to try something different than locks, I decided to go with Akka. Now I'm in the situation that I'm doubting between two approaches.
Approach one (Create new Actors when needed):

public class Main {
    public void start() {
        ActorSystem system = ActorSystem.create();
        // Create 5 Manager Actors (currently the same Actor for all, but this differs in actual practice)
        ActorRef managers = system.actorOf(new BroadcastPool(5).props(Props.create(Actor.class)));
        managers.tell(new Message(), ActorRef.noSender());
    }
}

public class Actor extends UntypedActor {
    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof Message) {
            ActorRef ref = getContext().actorOf(new SmallestMailboxPool(10).props(Props.create(Actor.class)));
            // Repeat the below 10 times
            ref.tell(new Message2(), getSelf());
        } else if (message instanceof Message2) {
            // Execute long-running HTTP request
        }
    }
}

public final class Message {
    public Message() {
    }
}

public final class Message2 {
    public Message2() {
    }
}
Approach two (Create a whole lot of actors beforehand and hope it's enough):

public class Main {
    public void start() {
        ActorSystem system = ActorSystem.create();
        ActorRef actors = system.actorOf(new SmallestMailboxPool(100).props(Props.create(Actor.class)));
        ActorRef managers = system.actorOf(new BroadcastPool(5).props(Props.create(() -> new Manager(actors))));
        managers.tell(new Message(), ActorRef.noSender());
    }
}

public class Manager extends UntypedActor {
    private ActorRef actors;

    public Manager(ActorRef actors) {
        this.actors = actors;
    }

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof Message) {
            // Repeat 10 times
            actors.tell(new Message2(), getSelf());
        }
    }
}

public class Actor extends UntypedActor {
    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof Message2) {
            // HTTP request
        }
    }
}

public final class Message {
    public Message() {
    }
}

public final class Message2 {
    public Message2() {
    }
}
So both approaches have their upsides and downsides. The first makes sure new requests can always be handled immediately; they never have to wait. But it leaves behind a lot of Actors that are never going to be used. The second, on the other hand, reuses Actors, but with the downside that it might not have enough of them at some point in the future and will have to queue the messages.
What is the best approach of solving this and what is most common way people deal with this?
If you think I could be doing this sort of stuff a lot better (with or without Akka) please tell me! I'm pretty new to Akka and would love to learn more about it.
Based on the given information, it looks like a typical example for task-based concurrency -- not for actor-based concurrency. Imagine you have a method for doing the HTTP request. The method fetches the given URL and returns an object without causing any data races on shared memory:
private static Page loadPage(String url) {
    // ...
}
You can easily fetch the pages concurrently with an Executor. There are different kinds of Executors, e.g. you can use one with a fixed number of threads.
public static void main(String... args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(5);
    List<Future<Page>> futures = new ArrayList<>();
    // submit tasks
    for (String url : args) {
        futures.add(executor.submit(() -> loadPage(url)));
    }
    // access result of tasks (or wait until it is available)
    for (Future<Page> future : futures) {
        Page page = future.get();
        // ...
    }
    executor.shutdown();
}
There is no further synchronization required. The Executor framework takes care of that.
I'd use a mixed approach: create a relatively small pool of actors beforehand, increase it when needed, but keep the pool's size limited (deny requests when there are too many connections, to avoid a crash due to running out of memory).
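A sketch of that mixed approach using Akka's built-in pool resizing (the bounds 5 and 100 are illustrative): the pool starts small and Akka grows it under load, up to a hard upper limit:

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.routing.DefaultResizer;
import akka.routing.SmallestMailboxPool;

public class Main {
    public void start() {
        ActorSystem system = ActorSystem.create();
        // start with 5 routees; the resizer may grow the pool up to 100 under pressure
        DefaultResizer resizer = new DefaultResizer(5, 100);
        ActorRef actors = system.actorOf(
                new SmallestMailboxPool(5).withResizer(resizer).props(Props.create(Actor.class)));
        actors.tell(new Message2(), ActorRef.noSender());
    }
}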

Handling blocking operations in Play 2.1

I am trying to create a way to handle blocking operations in a specific way in Play. First I describe my aim, followed by what I have managed so far. Can you please tell me if I am on the right track - and if so, could you help me understand how to complete the code? If it is not the right way to do it, could you suggest a better alternative?
Thanks a lot for all your help
Aim:
I would like to have all blocking operations sent to a single separate thread to be handled asynchronously. New requests that come in should not take up more threads; instead they should be placed in a queue (or anything similar) to be handled by that single thread. For each item that is processed asynchronously by the extra thread, some text must be gathered and returned to the browser.
So after reading docs and SO questions it appears that actors must be used. I like the concept of actors but have never used them before so am still learning. This is what I have:
package models;

import java.io.*;
import play.mvc.*;
import play.libs.*;
import play.libs.F.*;
import akka.actor.*;

public class ActorTest extends UntypedActor {

    static BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            getSender().tell(
                    "You sent me " + ((String) message)
                            + " and the console replied with "
                            + reader.readLine(), getSelf());
        } else {
            unhandled(message);
        }
    }
}
As you can see, the blocking operation is readLine() - just a way of testing.
Is this how it should be done? If so, I had assumed that from the controller I somehow create an async result or something using promises. [ Handling asynchronous results ]
A couple of issues: how do I send a message to the Actor and get the reply? I mean, can I get the result from a tell() call?
How do I make sure that more threads don't get taken up and that all operations go into a queue - or is this already handled by the actor?
Could you please provide an example controller action that could do this?
Your help is greatly appreciated.
PS/FYI: I am really new to all this, so just to get to this stage I have found these docs useful to read - the Akka actor pages, Play of course, and some wiki pages on actors.
[edit]
sorry, I said a single thread, but it could be a thread pool - just as long as only the assigned thread / thread pool is used to handle the blocking IO, not any others.
You can send a message to the Akka actor using ask (instead of tell). It will return a Future, which you can then map to a Promise<Result>.
However, you don't really need to use Akka if you don't have to. You can simply use Futures/Promises to run your blocking operation in the background.
In either approach, you end up with a Future from which you can complete the request when the future finishes.
Example of Using Promise in Play 2.2.x
...
import play.libs.F.*;
public static Promise<Result> index() {
    Promise<Integer> promiseOfInt = Promise.promise(
            new Function0<Integer>() {
                public Integer apply() {
                    // long-running operation (will run in separate thread)
                    return 42;
                }
            });
    return promiseOfInt.map(
            new Function<Integer, Result>() {
                public Result apply(Integer i) {
                    // 'i' is the result after Promise is complete
                    return ok("Got result: " + i);
                }
            });
}
If you're using Akka, you need to convert the Future returned from ask to Play's Promise as follows:
public static Promise<Result> index() {
    ActorRef myActor = Akka.system().actorFor("user/my-actor");
    return Promise.wrap(ask(myActor, "hello", 1000)).map(
            new Function<Object, Result>() {
                public Result apply(Object response) {
                    return ok(response.toString());
                }
            });
}
Example of Using Promise in Play 2.1.x
...
import play.libs.F.*;
public static Result index() {
    Promise<Integer> promiseOfInt = play.libs.Akka.future(
            new Callable<Integer>() {
                public Integer call() {
                    // long-running operation (will run in separate thread)
                    return 42;
                }
            });
    return async(
            promiseOfInt.map(
                    new Function<Integer, Result>() {
                        public Result apply(Integer i) {
                            // 'i' is the result after Promise is complete
                            return ok("Got result: " + i);
                        }
                    }));
}
If you're using Akka, you need to convert the Future returned from ask to Play's Promise as follows:
public static Result index() {
    ActorRef myActor = Akka.system().actorFor("user/my-actor");
    return async(
            Akka.asPromise(ask(myActor, "hello", 1000)).map(
                    new Function<Object, Result>() {
                        public Result apply(Object response) {
                            return ok(response.toString());
                        }
                    }));
}
