Create never ending hot stream in spring flux - java

I have a consumer that serves data. The data is consumed and processed in a non-reactive way. Then I take this data and send it to:
Sinks.many().multicast().onBackpressureBuffer(Queues.SMALL_BUFFER_SIZE, false);
I am using
sink.emitNext(message, retryOn(Sinks.EmitFailureHandler.FAIL_FAST, message));
as suggested in other Stack Overflow posts, where I take care of Sinks.EmitResult.FAIL_NON_SERIALIZED.
I am still stuck on reactor.core.Exceptions$OverflowException: Backpressure overflow during Sinks.Many#emitNext.
On the front end the subscriber is an EventSource. My GET endpoint (simplified; I have tried many things here):
@GetMapping(path = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<Message>> streamData() {
    return service
        .getSink()
        .asFlux()
        .map(e -> ServerSentEvent.builder(e).event(e.getType().getId()).build());
}
I have tried the suggestions in other posts, but I am obviously not able to achieve what I need: a never-ending hot event stream in which a subscriber, once connected, never misses a message.
Code is welcome, but opinions matter more.
Recap: a never-ending hot event source where no message is lost to the subscriber.
EDIT: Why the suggested posts don't work for me: I am missing the part between the consumer (of the queue), the sink, and the GET endpoint. I still end up with reactor.core.Exceptions$OverflowException: Backpressure overflow during Sinks.Many#emitNext.

A consumer receives data from a queue. The data is processed non-reactively. The result data is then sent onward and should be received by an EventSource on the front end.
I used a sink to send the processed data:
sink = Sinks.many().multicast().directAllOrNothing();
sink.emitNext(
    ServerSentEvent.builder(message).event(message.getType().getId()).build(),
    retryOnNonSerialized(Sinks.EmitFailureHandler.FAIL_FAST, message));
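retryOnNonSerialized is not part of Reactor but a small custom EmitFailureHandler helper. A minimal sketch of what such a handler can look like (the brief park-and-retry policy is just one possible choice, not the exact code I use):

import java.util.concurrent.locks.LockSupport;
import reactor.core.publisher.Sinks;

// Retry the emit only when concurrent emitters collide (FAIL_NON_SERIALIZED);
// any other failure is delegated to the fallback handler (FAIL_FAST here).
private static Sinks.EmitFailureHandler retryOnNonSerialized(Sinks.EmitFailureHandler fallback, Object message) {
    return (signalType, emitResult) -> {
        if (emitResult == Sinks.EmitResult.FAIL_NON_SERIALIZED) {
            LockSupport.parkNanos(10); // brief back-off before emitNext retries
            return true;               // returning true makes emitNext try again
        }
        return fallback.onEmitFailure(signalType, emitResult);
    };
}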
Controller:
@PostConstruct
private void loadFlux() {
    log.info("Constructing Flux....");
    flux = service
        .getSink()
        .asFlux()
        .publishOn(Schedulers.boundedElastic())
        .onBackpressureBuffer()
        .onBackpressureDrop(message -> log.debug("[STREAM] Backpressure message drop: {}", message))
        .share();
}
@GetMapping(path = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<Message>> streamData() {
    return flux;
}
This works for me as a never-ending hot stream.

Related

Ack Pub/Sub message outside of the MessageReceiver

I am using async pull to pull messages from a Pub/Sub topic, do some processing, and send the messages to an ActiveMQ topic.
With the current configuration of Pub/Sub I have to ack() the messages upon receipt. This, however, does not suit my use case, as I need to ack() messages ONLY after they are successfully processed and sent to the other topic. This means (per my understanding) ack()ing the messages outside the MessageReceiver.
I tried to save each message and its AckReplyConsumer so that I can call it later and ack() the messages. This, however, does not work as expected, and not all messages are correctly ack()ed.
So I want to know if this is possible at all, and if yes, how.
My subscriber config:
public Subscriber getSubscriber(CompositeConfigurationElement compositeConfigurationElement, Queue<CustomPupSubMessage> messages) throws IOException {
    ProjectSubscriptionName subscriptionName = ProjectSubscriptionName.of(compositeConfigurationElement.getPubsub().getProjectid(),
            compositeConfigurationElement.getSubscriber().getSubscriptionId());
    ExecutorProvider executorProvider =
            InstantiatingExecutorProvider.newBuilder().setExecutorThreadCount(2).build();

    // Instantiate an asynchronous message receiver.
    MessageReceiver receiver =
            (PubsubMessage message, AckReplyConsumer consumer) -> {
                messages.add(CustomPupSubMessage.builder().message(message).consumer(consumer).build());
            };

    // The subscriber will pause the message stream and stop receiving more messages from the
    // server if any one of the conditions is met.
    FlowControlSettings flowControlSettings =
            FlowControlSettings.newBuilder()
                    // 1,000 outstanding messages. Must be >0. It controls the maximum number of messages
                    // the subscriber receives before pausing the message stream.
                    .setMaxOutstandingElementCount(compositeConfigurationElement.getSubscriber().getOutstandingElementCount())
                    // 100 MiB. Must be >0. It controls the maximum size of messages the subscriber
                    // receives before pausing the message stream.
                    .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
                    .build();

    // Read credentials.
    InputStream input = new FileInputStream(compositeConfigurationElement.getPubsub().getSecret());
    CredentialsProvider credentialsProvider = FixedCredentialsProvider.create(ServiceAccountCredentials.fromStream(input));

    Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
            .setParallelPullCount(compositeConfigurationElement.getSubscriber().getSubscriptionParallelThreads())
            .setFlowControlSettings(flowControlSettings)
            .setCredentialsProvider(credentialsProvider)
            .setExecutorProvider(executorProvider)
            .build();
    return subscriber;
}
My processing part:
jmsConnection.start();
for (int i = 0; i < patchSize; i++) {
    var message = messages.poll();
    if (message != null) {
        byte[] payload = message.getMessage().getData().toByteArray();
        jmsMessage = jmsSession.createBytesMessage();
        jmsMessage.writeBytes(payload);
        jmsMessage.setJMSMessageID(message.getMessage().getMessageId());
        producer.send(jmsMessage);
        list.add(message.getConsumer());
    } else break;
}
jmsSession.commit();
jmsSession.close();
jmsConnection.close();

// if the upload is successful then ack the messages
log.info("sent " + list.size() + " in direction " + dest);
list.forEach(consumer -> consumer.ack());
There is nothing that requires messages to be acked within the MessageReceiver callback and you should be able to acknowledge messages asynchronously. There are a few things to keep in mind and look for:
Check to ensure that you are calling ack before the ack deadline expires. By default, the Java client library does extend the ack deadline for up to 1 hour, so if you are taking less time than that to process, you should be okay. (A sketch of how to raise this limit follows after this list.)
If your subscriber is often flow controlled, consider reducing the value you pass into setParallelPullCount to 1. The flow control settings you pass in are passed to each stream, not divided among them, so if each stream is able to receive the full value passed in and your processing is slow enough, you could be exceeding the 1-hour deadline in the client library without having even received the message yet, causing the duplicate delivery. You really only need to use setParallelPullCount to a larger value if you are able to process messages much faster than a single stream can deliver them.
Ensure that your client library version is at least 1.109.0. There were some improvements made to the way flow control was done in that version.
Note that Pub/Sub has at-least-once delivery semantics, meaning messages can be redelivered, even if ack is called properly. Note that not acknowledging or nacking a single message could result in the redelivery of all messages that were published together in a single batch. See the "Message Redelivery & Duplication Rate" section of "Fine-tuning Pub/Sub performance with batch and flow control settings."
If all of that still doesn't fix the issue, then it would be best to try to create a small, self-contained example that reproduces the issue and open up a bug in the GitHub repo.
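As a rough sketch of the ack-deadline point above: the maximum extension period is configurable on the subscriber builder. This is not your original code; it assumes a client version where Subscriber.Builder exposes setMaxAckExtensionPeriod (the Duration type is org.threeten.bp.Duration in the versions I have used, so check yours), and the 2-hour value is only an example:

import org.threeten.bp.Duration;

Subscriber subscriber =
        Subscriber.newBuilder(subscriptionName, receiver)
                // Let the client library keep extending the ack deadline for up to
                // 2 hours while a message is still being processed asynchronously.
                .setMaxAckExtensionPeriod(Duration.ofHours(2))
                .setFlowControlSettings(flowControlSettings)
                .build();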

Message transfer between two topics in Google Cloud Pub/Sub

We have a use case where, on any action from the UI, we need to read messages from Google Pub/Sub Topic A synchronously and move those messages to Topic B.
Below is the code that has been written to handle this behavior; it follows the Google Pub/Sub docs for accessing a topic synchronously.
public static int subscribeSync(String projectId, String subscriptionId, Integer numOfMessages, int count, String acknowledgementTopic) throws IOException {
    SubscriberStubSettings subscriberStubSettings =
            SubscriberStubSettings.newBuilder()
                    .setTransportChannelProvider(
                            SubscriberStubSettings.defaultGrpcTransportProviderBuilder()
                                    .setMaxInboundMessageSize(20 * 1024 * 1024) // 20MB (maximum message size).
                                    .build())
                    .build();

    try (SubscriberStub subscriber = GrpcSubscriberStub.create(subscriberStubSettings)) {
        String subscriptionName = ProjectSubscriptionName.format(projectId, subscriptionId);
        PullRequest pullRequest =
                PullRequest.newBuilder()
                        .setMaxMessages(numOfMessages)
                        .setSubscription(subscriptionName)
                        .build();

        // Use pullCallable().futureCall to asynchronously perform this operation.
        PullResponse pullResponse = subscriber.pullCallable().call(pullRequest);
        List<String> ackIds = new ArrayList<>();
        for (ReceivedMessage message : pullResponse.getReceivedMessagesList()) {
            // START - CODE TO PUBLISH MESSAGE TO TOPIC B
            publishMessage(message.getMessage(), acknowledgementTopic, projectId);
            // END - CODE TO PUBLISH MESSAGE TO TOPIC B
            ackIds.add(message.getAckId());
        }

        // Acknowledge received messages.
        AcknowledgeRequest acknowledgeRequest =
                AcknowledgeRequest.newBuilder()
                        .setSubscription(subscriptionName)
                        .addAllAckIds(ackIds)
                        .build();
        // Use acknowledgeCallable().futureCall to asynchronously perform this operation.
        subscriber.acknowledgeCallable().call(acknowledgeRequest);
        count = pullResponse.getReceivedMessagesList().size();
    } catch (Exception e) {
        log.error(e.getMessage());
    }
    return count;
}
Below is the sample code to publish messages to Topic B
public static void publishMessage(PubsubMessage pubsubMessage, String Topic, String projectId) {
    Publisher publisher = null;
    ProjectTopicName topicName = ProjectTopicName.newBuilder().setProject(projectId).setTopic(Topic).build();
    try {
        // Publish the messages to normal topic.
        publisher = Publisher.newBuilder(topicName).build();
    } catch (IOException e) {
        log.error(e.getMessage());
    }
    publisher.publish(pubsubMessage);
}
Is this the right way of handling this use case, or can it be handled in some other way? We do not want to use Cloud Dataflow. Can someone let us know if this is fine or if there is an issue?
The code works, but sometimes messages stay on Topic A even after they are consumed synchronously.
Thanks.
There are some issues with the code as presented.
You should really only use synchronous pull if there are specific reasons why you need to do so. In general, it is much better to use asynchronous pull via the client libraries. It will be more efficient and reduce the latency of moving messages from one topic to the other. You do not show how you call subscribeSync, but in order to process messages efficiently and ensure that you actually process all messages, you'd need to be calling it many times in parallel continuously. If you are going to stick with synchronous pull, then you should reuse the SubscriberStub object as recreating it for every call will be inefficient.
You don't reuse your Publisher object. As a result, you are not able to take advantage of the batching that the publisher client can do. You should create the Publisher once and reuse it across your calls for publishes to the same topic. If the passed-in topic can differ across messages, then keep a map from topic to publisher and retrieve the right one from the map.
You don't wait for the result of the call to publish. It is possible that this call fails, but you do not handle that failure. As a result, you could acknowledge the message on the first topic without it having actually been published, resulting in message loss.
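To make those last two points concrete, here is a rough sketch (not your original code) of the pull loop reworked so that a single Publisher is reused and the ack only happens after every publish future has completed; error handling is deliberately simplified:

// Assumes com.google.api.core.ApiFuture / ApiFutures on the classpath,
// and that `publisher` is created once (per topic) and reused across calls.
List<ApiFuture<String>> publishFutures = new ArrayList<>();
List<String> ackIds = new ArrayList<>();
for (ReceivedMessage message : pullResponse.getReceivedMessagesList()) {
    publishFutures.add(publisher.publish(message.getMessage()));
    ackIds.add(message.getAckId());
}

try {
    // Wait for every publish to succeed; this throws if any of them failed.
    ApiFutures.allAsList(publishFutures).get();

    // Only now is it safe to acknowledge the messages pulled from Topic A.
    AcknowledgeRequest acknowledgeRequest =
            AcknowledgeRequest.newBuilder()
                    .setSubscription(subscriptionName)
                    .addAllAckIds(ackIds)
                    .build();
    subscriber.acknowledgeCallable().call(acknowledgeRequest);
} catch (Exception e) {
    // Do not ack: the unpublished messages will be redelivered by Pub/Sub.
    log.error("Publishing to Topic B failed", e);
}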
With regard to your question about duplicates, Pub/Sub offers at-least-once delivery guarantees, so even with proper acking, it is still possible to receive messages again (typical duplicate rates are around 0.1%). There can be many different reasons for duplicates. In your case, since you are processing messages sequentially and recreating a publisher for every call, it could be that later messages are not acked before the ack deadline expires, which results in redelivery.

Spring Webflux endpoint working as a topic

I have a Flux endpoint that I provide to clients (subscribers) to receive updated prices. I'm testing it by accessing the URL (http://localhost:8080/prices) through the browser and it works fine. The problem I'm facing (I may be missing some concepts here) is that when I open this URL in many browsers, I expect to receive the notification in all of them, but just one receives it. It is working as a queue instead of a topic (like in message brokers). Is that the correct behavior?
@GetMapping(value = "prices", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<Collection<Price>>> prices() {
    return Flux.interval(Duration.ofSeconds(5))
        .map(sec -> pricesQueue.get())
        .filter(prices -> !prices.isEmpty())
        .map(prices -> ServerSentEvent.<Collection<Price>>builder()
            .event("status-changed")
            .data(prices)
            .build());
}
get isn't a standard queue operation, but this is almost certainly because your pricesQueue.get() method isn't idempotent. With every request (with every browser window you open in this case), you'll get a new flux that calls pricesQueue.get() every 5 seconds. Now if pricesQueue.get() just retrieves the latest item in the queue and does nothing with it, all is good - all your subscribers receive the same item, and the same item is displayed. But if it acts more like a poll() where it removes the item in the queue after it's retrieved it, then only the first flux will get that value - the rest won't, as by that point it will have been removed.
You really have two main options here:
Change your get() implementation (or implement a new method) so that it doesn't mutate the queue, only retrieves a value.
Turn the flux into a hot flux. Store Flux.interval(Duration.ofSeconds(5)).map(sec -> pricesQueue.get()).publish().autoConnect() somewhere as a field (let's say as queueFlux), then just return queueFlux.filter(prices -> !prices.isEmpty()).map(...) in your controller method.
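A rough sketch of option 2 (the field and method names here are just placeholders):

// One shared hot flux: every subscriber sees the same 5-second polls of the queue.
private Flux<Collection<Price>> queueFlux;

@PostConstruct
private void initQueueFlux() {
    queueFlux = Flux.interval(Duration.ofSeconds(5))
            .map(sec -> pricesQueue.get())
            .publish()       // make the flux connectable (hot)
            .autoConnect();  // start on the first subscriber and keep running
}

@GetMapping(value = "prices", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<Collection<Price>>> prices() {
    return queueFlux
            .filter(prices -> !prices.isEmpty())
            .map(prices -> ServerSentEvent.<Collection<Price>>builder()
                    .event("status-changed")
                    .data(prices)
                    .build());
}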

I have a complicated set of tasks using Java Web-Client requests that need to run in parallel and finally block to return a single response

I am new to the Web Client reactive library.
Here is my problem:
It starts with a user submitting a request to post a packet of documents. They wait for a response.
A service consuming this request needs to run several tasks in parallel. Some of the sub-tasks within each task have to finish first (two different GET requests) before attempting the last sub-task, which is the main POST request. Then I want to wait for all of the tasks' POST sub-tasks to finish (representing the packet), and then collect and reconcile the responses.
I need to reconcile at the end and make sure the entire parallel process succeeds (sending the packet), and then respond to the server (user) indicating whether the process was successful or not.
My pseudo flow:
Create a set of documents to post to a server one at a time. A packet can contain up to 10 documents (a List of DocumentAndMetaData). Initially each document would contain some pre-filled known values like file path and document name.
For each document in a packet (run in parallel):
I need to do file I/O and create a metadata object - call it getDocumentAndMetadata. To create the metadata object I must do some steps first within getDocumentAndMetadata:
Do a GET request to get Key A - call it getKeyA(requestA).
Do a GET request to get Key B - call it getKeyB(requestB).
Merge the Key A and Key B requests and use the responses from those requests to update the metadata object.
Then read the file to get a byte array - call it getFile.
Then pass the byte array (document) and metadata object to a function that:
Does an HTTP POST to a server, sending the byte array and metadata object in the POST request.
Accumulate the responses for each POST, which are strings.
Then block until all the documents are sent.
Finally, evaluate all the string responses that are returned from the POST requests and make sure the number of responses matches the number of documents posted to the server. Track any errors. If any GET or POST request fails, log the error.
I figured out how to do all these steps by running block() on each sub-task GET request and then block() on the main POST request, but I am afraid performance will suffer with this approach.
I need help with how to generate the flow using Web-Client and reactive non blocking parallel processes.
Thanks for any help.
'I am afraid the performance will suffer using this approach.' - You are right. After all, the whole purpose of using WebFlux is to create a non-blocking application.
I have tried to mock most of the logic. I hope you can correlate the solution with your use-case.
@RestController
public class MyController {

    @Autowired
    private WebClient webClient;

    @PostMapping(value = "/postPacketOfDocs")
    public Mono<ResponseEntity<String>> upload(@RequestBody Flux<String> documentAndMetaDataList) {
        return documentAndMetaDataList
                .flatMap(documentAndMetaData -> {
                    // do file I/O
                    return getDocumentAndMetadata(documentAndMetaData);
                })
                .map(String::getBytes) // read file to get a byte array
                .flatMap(fileBytes -> {
                    return webClient.post().uri("/send/byte/and/metadata")
                            .retrieve().bodyToMono(String.class);
                })
                .collectList()
                .flatMap(allResponsesFromEachPOST -> {
                    // Do some validation.
                    boolean allValidationsSuccessful = true;
                    if (allValidationsSuccessful) {
                        return Mono.just("Success");
                    } else {
                        return Mono.error(new RuntimeException()); // some custom exception which can be handled by @ExceptionHandler
                    }
                })
                .flatMap(msg -> Mono.just(ResponseEntity.ok().body(msg)));
    }

    private Mono<String> getDocumentAndMetadata(String documentAndMetaData) {
        String metadata = ""; // get metadata object from documentAndMetaData
        Mono<String> keyAResponse = webClient.get().uri("/get/keyA").retrieve().bodyToMono(String.class);
        Mono<String> keyBResponse = webClient.get().uri("/get/keyB").retrieve().bodyToMono(String.class);
        return keyAResponse.concatWith(keyBResponse)
                .collectList()
                .flatMap(responses -> updateMetadata(responses, metadata));
    }

    private Mono<String> updateMetadata(List<String> responses, String metadata) {
        String newMedataData = metadata + responses.get(0) + responses.get(1); // some update logic
        return Mono.just(newMedataData);
    }
}

Invoking non-blocking operations sequentially while consuming from a Flux including retries

So my use-case is to consume messages from Kafka in a Spring Webflux application while programming in the reactive style using Project Reactor, and to perform a non-blocking operation for each message in the same order as the messages were received from Kafka. The system should also be able to recover on its own.
Here is the code snippet that is set up to consume from Kafka:
Flux<ReceiverRecord<Integer, DataDocument>> messages = Flux.defer(() -> {
    KafkaReceiver<Integer, DataDocument> receiver = KafkaReceiver.create(options);
    return receiver.receive();
});

messages.map(this::transformToOutputFormat)
    .map(this::performAction)
    .flatMapSequential(receiverRecordMono -> receiverRecordMono)
    .doOnNext(record -> record.receiverOffset().acknowledge())
    .doOnError(error -> logger.error("Error receiving record", error))
    .retryBackoff(100, Duration.ofSeconds(5), Duration.ofMinutes(5))
    .subscribe();
As you can see, what I do is: take the message from Kafka, transform it into an object intended for a new destination, then send it to the destination, and then acknowledge the offset to mark the message as consumed and processed. It is critical to acknowledge the offset in the same order as the messages being consumed from Kafka so that we don't move the offset beyond messages that were not fully processed (including sending some data to the destination). Hence I'm using a flatMapSequential to ensure this.
For simplicity let's assume the transformToOutputFormat() method is an identity transform.
public ReceiverRecord<Integer, DataDocument> transformToOutputFormat(ReceiverRecord<Integer, DataDocument> record) {
    return record;
}
The performAction() method needs to do something over the network, say call an HTTP REST API. So the appropriate APIs return a Mono, which means the chain needs to be subscribed to. Also, I need the ReceiverRecord to be returned by this method so that the offset can be acknowledged in the flatMapSequential() operator above. Because I need the Mono subscribed to, I'm using flatMapSequential above. If not, I could have used a map instead.
public Mono<ReceiverRecord<Integer, DataDocument>> performAction(ReceiverRecord<Integer, DataDocument> record) {
    return Mono.just(record)
        .flatMap(receiverRecord ->
            HttpClient.create()
                .port(3000)
                .get()
                .uri("/makeCall?data=" + receiverRecord.value().getData())
                .responseContent()
                .aggregate()
                .asString()
        )
        .retryBackoff(100, Duration.ofSeconds(5), Duration.ofMinutes(5))
        .then(Mono.just(record));
}
I have two conflicting needs in this method:
1. Subscribe to the chain that makes the HTTP call
2. Return the ReceiverRecord
Using a flatMap() means my return type changes to a Mono. Using doOnNext() in the same place would retain the ReceiverRecord in the chain, but would not allow the HttpClient response to be subscribed to automatically.
I can't add .subscribe() after asString(), because I want to wait till the HTTP response is completely received before the offset is acknowledged.
I can't use .block() either since it runs on a parallel thread.
As a result, I need to cheat and return the record object from the method scope.
The other thing is that on a retry inside performAction it switches threads. Since flatMapSequential() eagerly subscribes to each Mono in the outer flux, this means that while acknowledgement of offsets can be guaranteed in order, we can't guarantee that the HTTP call in performAction will be performed in the same order.
So I have two questions.
Is it possible to return record in a natural way rather than returning the method scope object?
Is it possible to ensure that both the HTTP call as well as the offset acknowledgement are performed in the same order as the messages for which these operations are occurring?
Here is the solution I have come up with.
Flux<ReceiverRecord<Integer, DataDocument>> messages = Flux.defer(() -> {
    KafkaReceiver<Integer, DataDocument> receiver = KafkaReceiver.create(options);
    return receiver.receive();
});

messages.map(this::transformToOutputFormat)
    .delayUntil(this::performAction)
    .doOnNext(record -> record.receiverOffset().acknowledge())
    .doOnError(error -> logger.error("Error receiving record", error))
    .retryBackoff(100, Duration.ofSeconds(5), Duration.ofMinutes(5))
    .subscribe();
Instead of using flatMapSequential to subscribe to the performAction Mono and preserve sequence, what I've done instead is delayed the request for more messages from the Kafka receiver until the action is performed. This enables the one-at-a-time processing that I need.
As a result, performAction doesn't need to return a Mono of ReceiverRecord. I also simplified it to the following:
public Mono<String> performAction(ReceiverRecord<Integer, DataDocument> record) {
    return HttpClient.create()
        .port(3000)
        .get()
        .uri("/makeCall?data=" + record.value().getData())
        .responseContent()
        .aggregate()
        .asString()
        .retryBackoff(100, Duration.ofSeconds(5), Duration.ofMinutes(5));
}
