I'm using the spring-kafka non-blocking retry mechanism. I've noticed that the following headers accumulate on the next retry topics:
retry_topic-attempts
retry_topic-backoff-timestamp
retry_topic-original-timestamp
This means that on retry_0 there are 3 headers, on retry_1 they are doubled, and finally on the DLT they are tripled.
As far as I understand, they are essential to conduct the retry process, but I don't think the DLT needs information regarding retry_0.
Is it possible to avoid this, since the messages become heavier and heavier with each retry?
Kafka headers allow multiple values; starting with version 2.9.5, you can configure the framework to replace headers instead of adding values to them.
https://github.com/spring-projects/spring-kafka/pull/2529
https://docs.spring.io/spring-kafka/docs/2.9.5/reference/html/#retry-headers
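A minimal sketch of that configuration, assuming version 2.9.5+ and the retainAllRetryHeaderValues option described in the linked docs (class name is illustrative):

import java.util.function.Consumer;

import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.retrytopic.DeadLetterPublishingRecovererFactory;
import org.springframework.kafka.retrytopic.RetryTopicConfigurationSupport;

@EnableKafka
@Configuration
public class RetryHeaderConfig extends RetryTopicConfigurationSupport {

    // Keep only the current value for each retry header instead of
    // accumulating one value per retry topic (per the 2.9.5 docs above).
    @Override
    protected Consumer<DeadLetterPublishingRecovererFactory> configureDeadLetterPublishingContainerFactory() {
        return factory -> factory.setRetainAllRetryHeaderValues(false);
    }
}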
Please open an issue on GitHub; we should add an option similar to stripOriginalExceptionHeaders (which is true by default) for these other headers.
From release notes (https://spring.io/blog/2017/11/29/spring-integration-5-0-ga-available):
Reactive Streams support via FluxMessageChannel,
ReactiveStreamsConsumer and direct org.reactivestreams.Subscriber
implementation in the AbstractMessageHandler;
My understanding of the Reactor support was that you can, e.g., return a Mono/Flux from a transformer/handler, and Spring Integration will automatically transform it into Messages while respecting back pressure. Unfortunately, I cannot make it work like that; e.g.:
IntegrationFlows.from("input")
        .handle((p, h) -> Flux.just(1, 2, 3))
        .log("l1")
        .channel("output")
        .get();
still logs one Message with a FluxArray-typed payload instead of three Messages with Integer payloads.
2017-12-18 17:12:33.262 INFO 97471 --- [nio-8080-exec-1] l1 : GenericMessage [payload=FluxArray, headers={id=a9701681-9945-f953-8b72-df369c2982a3, timestamp=1513613553262}]
Also, there is nothing in the docs regarding this behaviour or the new FluxMessageChannel, ReactiveStreamsConsumer and direct org.reactivestreams.Subscriber implementation in the AbstractMessageHandler.
So my question is: do I understand the implemented Reactor support correctly, and where can I find any info on this topic?
Since we are in messaging here, it really doesn't matter to the framework what kind of payload you return from your service; everything is just wrapped into the Message as-is. You need a special component to understand this payload. One of them is the Splitter: it determines that your payload is a Reactive Streams Publisher and iterates over it as a Flux.
Another component is the WebFluxInboundEndpoint, which supports this kind of payload natively.
Your custom Service Activator might expect a Flux as an argument to deal with.
But nothing happens automatically. Spring Integration supports Reactive types, but doesn't process them without end-user preferences.
BTW, the splitter should be supplied with a FluxMessageChannel as an output to process the split Flux in a back-pressure manner.
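For example, a sketch of that arrangement (channel names follow the question; c.flux() is the Java DSL factory for a FluxMessageChannel):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import reactor.core.publisher.Flux;

@Configuration
public class ReactiveSplitConfig {

    @Bean
    public IntegrationFlow reactiveFlow() {
        return IntegrationFlows.from("input")
                .handle((p, h) -> Flux.just(1, 2, 3))
                .split()                        // detects the Publisher payload and iterates it as a Flux
                .log("l1")                      // now logs three Messages with Integer payloads
                .channel(c -> c.flux("output")) // FluxMessageChannel, so the Flux is consumed with back-pressure
                .get();
    }
}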
Feel free to raise a JIRA about documenting FluxMessageChannel. Indeed, we have missed that. The ReactiveStreamsConsumer needs more love as well, and we have some plans for 5.1 to improve the Reactive Streams model; we'll try to make it more flexible, or even add an option to turn it on by default. I can't promise anything as of today, though.
I'm trying to get the message with the latest offset in Kafka. Can KafkaIdempotentRepository be used to get that?
If not, what's the use of it?
The Javadoc says the following, but it's not clear what its real use is.
Camel Idempotent Repository implementations are used on the consumer side to filter out duplicate messages, and KafkaIdempotentRepository is one of the many implementations Camel provides (e.g., others are MemoryIdempotentRepository, FileIdempotentRepository, HazelcastIdempotentRepository, JCacheIdempotentRepository, InfinispanIdempotentRepository, etc.).
For more detailed reading please refer to below links:
https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Fuse/6.2/html/Apache_Camel_Development_Guide/MsgEnd-Idempotent.html
http://people.apache.org/~dkulp/camel/idempotent-consumer.html
Coming back to your questions:
I'm trying to get the msg with the latest offset in kafka. Can this be used to get that? 'KafkaIdempotentRepository' If not what's the use of it?
In my personal opinion, I don't think KafkaIdempotentRepository is meant to serve this use case.
Kafka does guarantee ordering within a partition, which means the last message served from a partition has the latest committed offset.
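For completeness, a minimal sketch of the intended use, Camel's Idempotent Consumer EIP (topic, bootstrap servers, endpoints, and header name are illustrative):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.idempotent.kafka.KafkaIdempotentRepository;

public class DeduplicationRoute extends RouteBuilder {

    @Override
    public void configure() {
        // The repository persists already-seen ids in a Kafka topic of its own.
        KafkaIdempotentRepository repository =
                new KafkaIdempotentRepository("idempotent-ids", "localhost:9092");

        from("direct:in")
            .idempotentConsumer(header("messageId"), repository) // drops messages whose id was already seen
            .to("mock:out");
    }
}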
First of all, I'm gathering information about this question so that I can implement this feature in a more elegant way.
Let's look at the components:
The target server (green circle)
This is an API server that I use to fetch some data.
Features:
Only HTTPS connections
Responses in JSON format
Accepts GET requests like [ https://api.server.com/user=1&option&api_key=? ]
Proxy controller (blue square)
It's a simple server that stores a list of proxies, and sends and receives some data. I also want to talk about the software that I will run on top of it.
Features:
Proxy list
API keys list
I think it should be a hashmap that stores an ip => token list, or a database table if I want to scale my application.
Workers
They just analyze the JSON response and pass the data to the DB.
Let's take a closer look at the proxy controller server.
The first idea:
Create a fixed thread pool via Executors.newFixedThreadPool
Pass the url/token to a worker: executor.submit(new Worker(url, token, proxy))
The worker analyzes the data and passes it to the DB (see the sketch below).
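A hypothetical sketch of that first idea (the Worker body, pool size, and arguments are purely illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ProxyControllerMain {

    // Hypothetical worker: fetch the URL through the given proxy with the
    // given API key, parse the JSON response, and persist the result.
    static class Worker implements Runnable {
        private final String url, token, proxy;

        Worker(String url, String token, String proxy) {
            this.url = url;
            this.token = token;
            this.proxy = proxy;
        }

        @Override
        public void run() {
            // fetch -> parse JSON -> write to DB (omitted)
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        pool.submit(new Worker("https://host/user=1&option=1", "api-key", "proxy-host:3128"));
        pool.shutdown();
    }
}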
But in my opinion this solution is quite big and hard to maintain; I want to add an endpoint that gathers stats, kills or spawns new workers, and so on.
The second idea:
A worker generates a request like https://host/user=1&option=1
Passes it to the proxy controller
The proxy controller assigns an API key and a proxy server to the request
Executes the request
Accepts the response
Passes it back to a worker (I think the best idea is to put a load balancer between the workers and the proxy controller).
This solution seems quite hacky to me. For example, if a worker is dead, the proxy server sends a bunch of requests to the dead worker, which could lead to data loss.
The third idea:
The same as the second, but instead of sending data directly to the worker, the proxy controller passes it to some bus. I found some information about Apache Camel that would allow me to organize this solution. In this case a dead worker is just a dead worker, and data loss is zero (maybe).
Of course, none of the three cases handle errors. Some errors can be solved by resending the request with additional data; some can be solved by re-spawning the workers.
So in your opinion, what is the best solution in this case? Am I missing some hidden problems that could appear later? Which tools should I use?
Thanks
What are you trying to achieve?
Maybe you should consider using this architecture:
NGINX (proxy + load balance) -> WORKER SERVERS -> DB SERVER (maybe use some NoSQL like Cassandra)
I have several similar systems which are authoritative for different parts of my data, but there's no way I can tell just from my "keys" which system owns which entities.
I'm working to build this system on top of AMQP (RabbitMQ), and it seems like the best way to handle this would be:
Create a Fanout exchange, called thingInfo, and have all of my other systems bind their own anonymous queues to that exchange.
Send a message out to the exchange: {"thingId": "123abc"}, and set a reply_to queue.
Wait for a single one of the remote hosts to reply to my message, or for some timeout to occur (see the sketch below).
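Here is a sketch of those three steps with the RabbitMQ Java client (exchange name and payload as above; the timeout and other details are illustrative):

import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.BuiltinExchangeType;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ThingInfoQuery {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        try (Connection connection = factory.newConnection()) {
            Channel channel = connection.createChannel();

            // 1. Fanout exchange; each owning system binds its own anonymous queue.
            channel.exchangeDeclare("thingInfo", BuiltinExchangeType.FANOUT);

            // 2. Publish the query with a reply_to queue.
            String replyQueue = channel.queueDeclare().getQueue(); // server-named, exclusive
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .replyTo(replyQueue)
                    .build();
            channel.basicPublish("thingInfo", "", props,
                    "{\"thingId\": \"123abc\"}".getBytes(StandardCharsets.UTF_8));

            // 3. Wait for exactly one reply, or time out.
            BlockingQueue<String> replies = new ArrayBlockingQueue<>(1);
            channel.basicConsume(replyQueue, true,
                    (tag, delivery) -> replies.offer(new String(delivery.getBody(), StandardCharsets.UTF_8)),
                    tag -> { });
            String reply = replies.poll(5, TimeUnit.SECONDS);
            System.out.println(reply != null ? reply : "nobody owns it; go fish");
        }
    }
}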
Is this the best way to go about solving this sort of problem? Or is there a better way to structure what I'm looking for? This feels mostly like the RPC example from the RabbitMQ docs, except I feel like using a broadcast exchange complicates things.
I think I'm basically trying to emulate the model described for MCollective's Message Flow, but, while I think MCollective generally expects more than one response, in this case, I would expect/require precisely one or, preferably, a clear "nope, don't have it, go fish" response from "everyone" (if it's really possible to even know that in this sort of architecture?).
Perhaps another model that mostly fits is "Scatter-Gather"? It seems there's support for this in Spring Integration.
It's a reasonable architecture (have the uninterested consumers simply ignore the message).
If there's some way to extract the pertinent data that the consumers use to decide interest into headers, then you can gain some efficiency by using a topic exchange instead of a fanout.
In either case, it gets tricky if more than one consumer might reply.
As you say, you can use a timeout if zero consumers reply. But if you think that might be frequent, you may be better off using arbitrary two-way messaging and doing the reply correlation in your code, rather than using request/reply and tying up a thread waiting for a reply that will never come and then timing out.
This could also deal with the multi-reply case.
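Continuing the client sketch from the question, the topic-exchange variant might look like this (the routing-key scheme is purely illustrative):

// Producer side: route by a key derived from the message instead of broadcasting.
channel.exchangeDeclare("thingInfo", BuiltinExchangeType.TOPIC);
channel.basicPublish("thingInfo", "things.system-a.123abc", props,
        "{\"thingId\": \"123abc\"}".getBytes(StandardCharsets.UTF_8));

// Consumer side (e.g. system-a) binds only to the key space it owns,
// so irrelevant queries never reach it.
String ownQueue = channel.queueDeclare().getQueue();
channel.queueBind(ownQueue, "thingInfo", "things.system-a.*");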
I'm designing a system using comet where there is a common channel on which data gets published. I need to filter the data using some conditions based on client subscription details. Can anyone tell me how I can do this? I thought I could do this using a DataFilter.
Channel.addDataFilter(DataFilter filter);
Is this the correct way? If so, is there any sample code to achieve this, please?
There is no Channel.addDataFilter(DataFilter) method, but you can achieve the same results in a different way.
First, have a look at the available DataFilter implementations already available.
Then it's enough to add a DataFilterMessageListener to the channel you want to filter data on, and specify one or more DataFilters to the DataFilterMessageListener.
You can find an example of this in the CometD demos shipped with the CometD distribution.
The right way to add the DataFilterMessageListener is during channel initialization, as it is done in that example through a @Configure annotation, or equivalently via a ServerChannel.Initializer.
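For example, a sketch using an annotated service (channel name and filter choice are illustrative; NoMarkupFilter is one of the filters shipped with CometD):

import org.cometd.annotation.Configure;
import org.cometd.annotation.Service;
import org.cometd.bayeux.server.ConfigurableServerChannel;
import org.cometd.server.filter.DataFilterMessageListener;
import org.cometd.server.filter.NoMarkupFilter;

@Service
public class FilteringService {

    // Attach the filter while the channel is being initialized, so no message
    // can be published to it before the filter is in place.
    @Configure("/data/common")
    public void configureDataChannel(ConfigurableServerChannel channel) {
        channel.addListener(new DataFilterMessageListener(new NoMarkupFilter()));
    }
}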
Finally, have a look at how messages are processed on the server from the documentation: http://docs.cometd.org/reference/#concepts_message_processing.
It is important to understand that modifications made by DataFilter are seen by all subscribers.