Count number of successfully processed messages on Camel parallel split - java

I'm digging into Camel in order to process a lot of records in parallel. I have something like this:
from(CAMEL_START_ROUTE_CTE)
.multicast().parallelProcessing()
.to(CAMEL_PROCESS_DOMAINS_ROUTE)
.to(CAMEL_PROCESS_OTHERS_ROUTE)
.end()
.onCompletion()
.to(EndCamelRouteBuilder.CAMEL_ROUTE);
Where CAMEL_START_ROUTE_CTE is
timer:foo?delay=100&repeatCount=1
And then, CAMEL_PROCESS_DOMAINS_ROUTE looks like this:
from(CAMEL_PROCESS_DOMAINS_ROUTE)
.setHeader("domains").constant(config.getDomains())
.split(header("domains"))
.parallelProcessing()
.to(ProcessDomainCamelRoute.CAMEL_ROUTE)
.end()
.end();
Simplifying, domains is a JSON list.
What I'm trying to achieve is a way to count the number of processed messages and to read that count in the EndCamelRoute route.
I've tried exchange.setProperty in a processor (using CamelSplitSize), setHeader, etc., but I always get null when reading.
Does anybody know a way to achieve something like this? Some kind of reporting (number of failed and successful messages) that can be consumed in a different route.

I was fighting a similar issue while using split. I ended up with a very simple bean holding a HashMap where I store my counters.
I was looking into the Micrometer component, but it is producer only.
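A minimal sketch of that counter-bean approach, assuming the bean is registered in the Camel registry under a name like "messageCounters" (the class name and counter keys are made up for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Shared singleton bean holding per-key counters; thread-safe so it can be
// updated from the parallel split branches.
public class MessageCounters {

    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    public void increment(String key) {
        counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    public long get(String key) {
        AtomicLong value = counters.get(key);
        return value == null ? 0 : value.get();
    }
}

In the split branches you could then call .bean("messageCounters", "increment('success')") after each record (and, say, "increment('failed')" from an onException block), and in EndCamelRouteBuilder.CAMEL_ROUTE read the totals with .bean("messageCounters", "get('success')"). Because the bean lives outside the exchange, the counts survive the parallel split and are visible from the other route, which is exactly what exchange properties and headers cannot give you here.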

Related

Camel maximumRedeliveries is ignored

I'm trying to use Camel to deliver files from one folder to a REST call, and on error the message should be redelivered twice and then moved to an error folder if the second redelivery fails as well. My code in the RouteBuilder's configure method looks like this:
errorHandler(deadLetterChannel("file:///home/camelerror").useOriginalMessage());
from("file:///home/camelefiles")
.onException(RetryableException.class)
.log("RetryableException handled")
.maximumRedeliveries(2)
.end()
.routeId(port.id())
.throwException(new RetryableException());
I get the "RetryableException handled" logs, so I guess the exception is handled correctly, but the message is redelivered an infinite number of times.
What am I doing wrong, and how can I achieve that the message is only redelivered twice and then the deadLetterChannel is used?
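For comparison, the redelivery limit is often configured on the dead letter channel builder itself rather than on the onException clause; a minimal sketch of that variant, reusing the folders from the question (the delay value is made up):

// Redelivery policy lives on the dead letter channel: after two failed
// redeliveries the original message is moved to the error folder.
errorHandler(deadLetterChannel("file:///home/camelerror")
        .maximumRedeliveries(2)
        .redeliveryDelay(1000)
        .useOriginalMessage());

from("file:///home/camelefiles")
        .routeId(port.id())
        .throwException(new RetryableException());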

Batch consumer camel kafka

I am unable to read in batches with the Camel Kafka consumer, despite following an example posted here. Are there changes I need to make to my producer, or is the problem most likely with my consumer configuration?
The application in question uses the camel-kafka component to ingest messages from a REST endpoint, validate them, and place them on a topic. I then have a separate service that consumes them from the topic and persists them in a time-series database.
The messages were being produced and consumed one at a time, but the database expects the messages to be consumed and committed in batches for optimal performance. Without touching the producer, I tried adjusting the consumer to match the example in the answer to this question:
How to transactionally poll Kafka from Camel?
I wasn't sure how the messages would appear, so for now I'm just logging them:
from(kafkaReadingConsumerEndpoint)
    .routeId("rawReadingsConsumer")
    .process(exchange -> {
        // simple approach to generating errors
        String body = exchange.getIn().getBody(String.class);
        if (body.startsWith("error")) {
            throw new RuntimeException("can't handle the message");
        }
        log.info("BODY:{}", body);
    })
    .process(kafkaOffsetManager);
But the messages still appear to be coming across one at a time with no batch read.
My consumer config is this:
kafka:
  host: myhost
  port: myport
  consumer:
    seekTo: beginning
    maxPartitionFetchBytes: 55000
    maxPollRecords: 50
    consumerCount: 1
    autoOffsetReset: earliest
    autoCommitEnable: false
    allowManualCommit: true
    breakOnFirstError: true
Does my config need work, or are there changes I need to make to the producer to have this work correctly?
At the lowest layer, KafkaConsumer#poll is going to return a ConsumerRecords collection that you iterate over one ConsumerRecord at a time; there's no way around that.
I don't have in-depth experience with Camel, but to get a "batch" of records you'll need some intermediate collection to "queue" the data that you eventually want to send downstream to some "collection consumer" process. Then you will need some "switch" processor that says "wait, process this batch" or "keep filling this batch".
As far as databases go, that process is exactly what the Kafka Connect JDBC Sink does with its batch.size config.
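Outside of Camel, the "intermediate collection plus switch" idea looks roughly like this with the plain Kafka client; writeBatchToDatabase and the poll timeout are hypothetical placeholders:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

static void consumeInBatches(KafkaConsumer<String, String> consumer, int batchSize) {
    List<String> buffer = new ArrayList<>();
    while (true) {
        // poll() always hands back individual records ...
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            buffer.add(record.value());
        }
        // ... so "batching" is just buffering them and flushing once the buffer is full
        if (buffer.size() >= batchSize) {
            writeBatchToDatabase(buffer); // hypothetical bulk-insert helper
            consumer.commitSync();        // commit only after the batch is persisted
            buffer.clear();
        }
    }
}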
We solved a similar requirement by using the Aggregation [1] capability provided by Camel.
A rough code snippet:
@Override
public void configure() throws Exception {
    // 1. Define your aggregation strategy
    AggregationStrategy agg = AggregationStrategies.flexible(String.class)
            .accumulateInCollection(ArrayList.class)
            .pick(body());

    from("kafka:your-topic?and-other-params")
        // 2. Define your aggregation parameters
        .aggregate(constant(true), agg)
            .completionInterval(1000)
            .completionSize(100)
            .parallelProcessing(true)
        // 3. Generate bulk insert statement
        .process(exchange -> {
            List<String> body = (List<String>) exchange.getIn().getBody();
            String query = generateBulkInsertQueryStatement("target-table", body);
            exchange.getMessage().setBody(query);
        })
        .to("jdbc:dataSource");
}
There are a variety of strategies that you can implement, but we chose this particular one because it allows you to build a List of Strings out of the message contents that we need to ingest into the db. [2]
We set a variety of different params such as completionInterval & completionSize. The most important one for us was parallelProcessing(true) [3]; without it our throughput wasn't nearly what we required.
Once the aggregation has either collected 100 messages or 1000 ms have passed, the processor generates a bulk insert statement, which then gets sent to the db.
[1] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html
[2] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html#_aggregating_into_a_list
[3] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html#_worker_pools

Zipkin, using existing libraries to handle tracing in microservices connected with Apache Kafka

I would like to implement tracing in my microservices architecture. I am using Apache Kafka as the message broker and I am not using the Spring Framework. Tracing is a new concept for me. At first I wanted to create my own implementation, but now I would like to use an existing library. Brave looks like the one I will want to use. I would like to know if there are guides, examples or docs on how to do this. The documentation on the GitHub page is minimal, and I find it hard to start using Brave. Or maybe there is a better library with proper documentation that is easier to use. I will also be looking at Apache HTrace because it looks promising. Some getting-started guides would be nice.
There are a bunch of ways to answer this, but I'll answer it from the "one-way" perspective. The short answer though, is I think you have to roll your own right now!
While Kafka can be used in many ways, it can be used as a transport for unidirectional single producer single consumer messages. This action is similar to normal one-way RPC, where you have a request, but no response.
In Zipkin, an RPC span is usually request-response. For example, you see timing of the client sending to the server, and also the way back to the client. One-way is where you leave out the other side. The span starts with a "cs" (client send) and ends with a "sr" (server received).
Mapping this to Kafka, you would mark client sent when you produce the message and server received when the consumer receives it.
The trick to Kafka is that there is no nice place to stuff the trace context. That's because unlike a lot of messaging systems, there are no headers in a Kafka message. Without a trace context, you don't know which trace (or span for that matter) you are completing!
The "hack" approach is to stuff trace identifiers as the message key. A less hacky way would be to coordinate a body wrapper which you can nest the trace context into.
Here's an example of the former:
https://gist.github.com/adriancole/76d94054b77e3be338bd75424ca8ba30
I met the same problem too. Here is my solution, the less hacky way mentioned above.
ServerSpan serverSpan = brave.serverSpanThreadBinder().getCurrentServerSpan();
TraceHeader traceHeader = convert(serverSpan);

// in the Kafka producer, use KafkaTemplate to send the wrapped message
String wrapMsg = "wrap traceHeader with originMsg";
kafkaTemplate.send(topic, wrapMsg).get(10, TimeUnit.SECONDS); // use synchronization

// then in the Kafka consumer
ConsumerRecords<String, String> records = consumer.poll(5000);
for (ConsumerRecord<String, String> record : records) {
    String topic = record.topic();
    int partition = record.partition();
    long offset = record.offset();
    String val = record.value();

    // parse val as JSON; WrapMessage is your own wrapper class that carries
    // the trace header alongside the original message
    WrapMessage wrapObj = JSON.parseObject(val, WrapMessage.class);
    TraceHeader traceHeader = wrapObj.getTraceHeader();

    // then you can do something like this
    MyRequest myRequest = new MyRequest(traceHeader, "/esb/consumer", "POST");
    brave.serverRequestInterceptor().handle(
            new HttpServerRequestAdapter(new MyHttpServerRequest(myRequest), new DefaultSpanNameProvider()));

    // then some HTTP request within the brave-apache-http-interceptors
    // http.post(url, content)
}
You must implement MyHttpServerRequest and MyRequest. It is easy: you just return what a span needs, such as the URI, headers, and method.
This is a rough and ugly code example, just to offer an idea.

continue behavior in camel route execution

I want to have a continue-like behaviour in my route. My route looks like the following:
from("file:D:\\?fileName=abc.csv&noop=true").split().unmarshal().csv()
.to("direct:insertToDb").end();
from("direct:insertToDb")
.to("direct:getDataId")
.to("direct:getDataParameters")
.to("direct:insertDataInDb");
from("direct:getDataId")
.to("sql:SELECT id FROM data WHERE name = :#name)
.choice()
.when(header("id").isGreaterThan(0) )
.setProperty("id", header("id"))
.otherwise()
.log("Error for")
.endChoice().end();
I want that, if direct:getDataId does not find any record, the execution of the route for the current CSV record is skipped and the program moves on to the next one; it would be the equivalent of a continue keyword.
How can I achieve this in an Apache Camel route?
You can modify your routes like this:
from("file:D:\\?fileName=abc.csv&noop=true").split().unmarshal().csv()
.to("sql:SELECT id FROM data WHERE name = :#name?outputHeader=id&outputType=SelectOne)
.choice().when(header("id").isGreaterThan(0))
.to("direct:getDataParameters")
.to("direct:insertDataInDb")
.end();
Have you got a test for this? I suggest you try it with CamelTestSupport, because what you want is how Camel will execute by default.
From the Camel Split docs:
stopOnException
default: false
description: Whether or not to stop processing immediately when an exception occurs. If disabled, Camel continues splitting and processes the sub-messages regardless of whether one of them failed. You can deal with exceptions in the AggregationStrategy class, where you have full control over how to handle that.
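In other words, the splitter already behaves like continue out of the box and you only opt in to aborting the whole split. A minimal sketch of that default, reusing the endpoints from the question (restructured to unmarshal before splitting):

// Default splitter behaviour: a row that throws does not stop the split,
// the remaining CSV rows are still sent to direct:insertToDb.
from("file:D:\\?fileName=abc.csv&noop=true")
    .unmarshal().csv()
    .split(body())
        // add .stopOnException() to the split() above only if you *do*
        // want one bad row to abort the whole file
        .to("direct:insertToDb")
    .end();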

Apache camel multicast FreeMarker

I need to send two different XMLs (by FreeMarker) to two different endpoints.
i.e.
.to("freemarker:templates/xml1.ftl").to("file://C:\\testXmls1")
and
.to("freemarker:templates/xml2.ftl").to("file://C:\\testXmls2")
I had a look at the multicast() function, but I don't know how to apply it when each branch has two .to() calls.
Could anyone please help me?
Yes, you can specify multiple endpoints in the same .to(uri1, uri2, ...); it then behaves as a single EIP within the multicast.
multicast()
.to(uri1a, uri1b)
.to(uri2a, uri2b)
.end() // to end multicast
Otherwise you would have to enclose each pair using the pipeline EIP.
multicast()
.pipeline().to(uri1a).to(uri1b).end() // to end this pipeline
.pipeline().to(uri2a).to(uri2b).end() // to end this pipeline
.end() // to end multicast
