I have a Camel application where I am reading files from an FTP source.
Then the file goes through multiple routes: one route stores the data in Cassandra, another processes the data and pushes pivoted data to a Kafka topic, and so on.
I want to mark the file as processed once it has gone through all routes and reached the end. This way I can build a "processing completed" log based on the file name.
One way that I can think of is to implement an aggregator: each route would send a completion notification in an exchange header, and based on the completion criteria logic in the aggregator, I would mark that file as processed.
How would I write such an aggregator in Java?
You could try using multicast.
from("direct:start")
.multicast()
.to("direct:a","direct:b")
.end()
// Won't run until the sub routes are complete
.process(new MarkFileAsCompletedProcessor())
.log("Finished multicast");
from("direct:a")
.log("Processing a")
.to("mock:endOfA");
from("direct:b")
.log("Processing b")
.to("mock:endOfB");
I am unable to read in batches with the Kafka Camel consumer, despite following an example posted here. Are there changes I need to make to my producer, or is the problem most likely in my consumer configuration?
The application in question uses the Kafka Camel component to ingest messages from a REST endpoint, validate them, and place them on a topic. I then have a separate service that consumes them from the topic and persists them in a time-series database.
The messages were being produced and consumed one at a time, but the database expects the messages to be consumed and committed in batches for optimal performance. Without touching the producer, I tried adjusting the consumer to match the example in the answer to this question:
How to transactionally poll Kafka from Camel?
I wasn't sure how the messages would appear, so for now I'm just logging them:
from(kafkaReadingConsumerEndpoint).routeId("rawReadingsConsumer").process(exchange -> {
    // simple approach to generating errors
    String body = exchange.getIn().getBody(String.class);
    if (body.startsWith("error")) {
        throw new RuntimeException("can't handle the message");
    }
    log.info("BODY:{}", body);
}).process(kafkaOffsetManager);
But the messages still appear to be coming across one at a time with no batch read.
My consumer config is this:
kafka:
  host: myhost
  port: myport
  consumer:
    seekTo: beginning
    maxPartitionFetchBytes: 55000
    maxPollRecords: 50
    consumerCount: 1
    autoOffsetReset: earliest
    autoCommitEnable: false
    allowManualCommit: true
    breakOnFirstError: true
Does my config need work, or are there changes I need to make to the producer to have this work correctly?
At the lowest layer, the KafkaConsumer#poll method returns a ConsumerRecords collection that you iterate one ConsumerRecord at a time; there's no way around that.
I don't have in-depth experience with Camel, but in order to get a "batch" of records, you'll need some intermediate collection to "queue" the data that you want to eventually send downstream to some "collection consumer" process. You'll also need some "switch" processor that decides "stop, process this batch now" or "keep filling this batch".
As far as databases go, that process is exactly what the Kafka Connect JDBC Sink does with its batch.size config.
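A minimal sketch of that idea with the plain Kafka client (not Camel); consumer is an already-subscribed KafkaConsumer and flushBatch(...) is a hypothetical stand-in for the bulk database write:

List<ConsumerRecord<String, String>> batch = new ArrayList<>();
int batchSize = 50;

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        batch.add(record);
    }
    // the "switch": flush only once enough records have been queued
    if (batch.size() >= batchSize) {
        flushBatch(batch);     // hypothetical bulk write to the database
        consumer.commitSync(); // commit offsets only after the batch is persisted
        batch.clear();
    }
}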
We solved a similar requirement by using the Aggregation [1] capability provided by Camel.
A rough code snippet:
@Override
public void configure() throws Exception {
    // 1. Define your aggregation strategy
    AggregationStrategy agg = AggregationStrategies.flexible(String.class)
            .accumulateInCollection(ArrayList.class)
            .pick(body());

    from("kafka:your-topic?and-other-params")
        // 2. Define your aggregation parameters
        .aggregate(constant(true), agg)
            .completionInterval(1000)
            .completionSize(100)
            .parallelProcessing(true)
        // 3. Generate the bulk insert statement
        .process(exchange -> {
            List<String> body = (List<String>) exchange.getIn().getBody();
            String query = generateBulkInsertQueryStatement("target-table", body);
            exchange.getMessage().setBody(query);
        })
        .to("jdbc:dataSource");
}
There are a variety of strategies that you can implement, but we chose this particular one because it allows you to build a List of strings from the message bodies that we need to ingest into the db. [2]
We set a variety of different params such as completionInterval & completionSize. The most important one for us was parallelProcessing(true) [3]; without it we weren't getting anywhere near the required throughput.
Once the aggregation has either collected 100 messages or 1000 ms has passed, the processor generates a bulk insert statement, which then gets sent to the db.
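For illustration only, a rough sketch of what generateBulkInsertQueryStatement could look like, assuming a single text column named payload; a real implementation should use bound parameters instead of string concatenation:

private String generateBulkInsertQueryStatement(String table, List<String> values) {
    // assumption: one text column named "payload"; escape single quotes as a minimum
    String rows = values.stream()
            .map(v -> "('" + v.replace("'", "''") + "')")
            .collect(Collectors.joining(", "));
    return "INSERT INTO " + table + " (payload) VALUES " + rows;
}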
[1] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html
[2] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html#_aggregating_into_a_list
[3] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html#_worker_pools
I have a pretty simple route where I pick up files from a directory and send them to a bean:
from("file:/mydir?delete=true").bean(MyProcessor.class);
It can happen that an exception occurs in MyProcessor.class, so I want to delay processing that file again. How can I set up redelivery for that? I already tried different things with
onException().redeliveryDelay(10000);
but it didn't work, and right after the exception the same file gets processed again.
Did you define onException() before your route processing?
Example:
errorHandler(defaultErrorHandler()
    .maximumRedeliveries(2)
    .redeliveryDelay(5000)
    .retryAttemptedLogLevel(LoggingLevel.WARN));

// exception handler for specific exceptions
onException(IOException.class).maximumRedeliveries(1).redeliveryDelay(5000);

// only the failed record is written to the error folder
onException(CsvRecordException.class)
    .to("file:/app/dev/dataland/error");

onCompletion()
    .log("global thread: ${threadName}")
    .to("file:/app/dev/dataland/archive");

from("file:/path?noop=true&delay=3000")
    .startupOrder(1)
    .log("start to process file: ${header.CamelFileName}")
    .bean(CsvFilePreLoadChecker.class, "validateMetaData")
    .end()
    .log("Done processing file: ${header.CamelFileName}");
When an error occurs in MyProcessor.class the route processing fails and therefore the file consumer does not delete the file.
On the next poll, since the file is still present, the file consumer simply reads it again.
If you want to move files with processing errors out of your way, you can use the moveFailed option of the file consumer. You would then have to move them back periodically to retry.
If you want to decouple file reading from MyProcessor.class you need to split the route into two routes: one that reads the files and sends their contents to a queue or similar, and another that consumes from that queue and processes the messages, as sketched below.
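A minimal sketch of that split, using seda as the intermediate queue (a persistent JMS queue would survive restarts); redelivery then retries only the processing route, not the file pickup:

onException(Exception.class)
    .maximumRedeliveries(3)
    .redeliveryDelay(10000);

// route 1: read files and hand the content off to the queue
from("file:/mydir?delete=true")
    .convertBodyTo(String.class)
    .to("seda:files");

// route 2: process the messages; failures here are redelivered with the delay above
from("seda:files")
    .bean(MyProcessor.class);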
I have a simple Camel route which consumes from a Kafka topic, does some processing, and writes back to another Kafka topic.
I needed to do some processing in between, so I used seda in the route so that the Kafka consumer doesn't get blocked on the processing.
But after processing, Camel routes the message back to the source Kafka endpoint and not to the destination endpoint.
from("kafka:<source endpoint details>")
.routeId("FromKafka")
.log("########: ${body}")
.to("seda:myseda?waitForTaskToComplete=Never");`
from("seda:myseda")
.routeId("sedaRoute")
.process(myprocessor)
.to("kafka:<destination endpoint details>"
The output payload is once again put on the source Kafka topic. If I just replace seda with direct, it works fine.
from("kafka:<source endpoint details>")
.routeId("FromKafka")
.log("########: ${body}")
.to("direct:mydirect");`
from("direct:mydirect")
.routeId("sedaRoute")
.process(myprocessor)
.to("kafka:<destination endpoint details>"
I suspected the Kafka exchange might be request-reply, with the response being given back to the source endpoint, hence I tried adding "waitForTaskToComplete=Never" to seda. But no success.
Any help will be much appreciated.
I think you need to set the exchange pattern to "in only".
Like this:
.to(ExchangePattern.InOnly,"seda:myseda")
In case your Kafka consumer and producer topics are different: Apache Camel by default propagates the consumer's headers to the producer as well. To avoid this you have to override the Kafka topic header. This can be done using the bridgeEndpoint option on the producer; if the option is true, the KafkaProducer will ignore the KafkaConstants.TOPIC header of the inbound message. Alternatively, you can directly set the KafkaConstants.TOPIC header for the producer.
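A short sketch of those two options (broker and topic names are placeholders):

// option A: let the producer ignore the inherited KafkaConstants.TOPIC header
from("seda:myseda")
    .process(myprocessor)
    .to("kafka:destination-topic?brokers=<brokers>&bridgeEndpoint=true");

// option B: or overwrite the topic header explicitly before the producer
//     .setHeader(KafkaConstants.TOPIC, constant("destination-topic"))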
This happens only with seda routes; it works perfectly fine with a direct route.
I use Apache Camel to process files received on an FTP channel. My application is deployed in a cluster (4 nodes), so I use RedisIdempotentRepository to ensure that a single node processes each file. My problem is that I want to delete the file after processing. If I use delete=true, node A, which processed the file, will delete it when it finishes, but node B has already deleted it, because node B does not go through the filter and therefore goes straight to the deletion.
I would like to know how to allow only node A to delete the file.
from("sftp://host:port/folder?delete=true)
.idempotentConsumer(simple("${file:onlyname}"),
RedisIdempotentRepository.redisIdempotentRepository(redisTemplate, "camel-repo"))
.bean("orderTrackingFileProcessor");
I worked around this using pollEnrich, adding a delete step at the end of the processing:
.pollEnrich(remoteLocation + "?delete=true&fileName=${file:name}");
Full Example Route:
String remoteLocation = "sftp://host:port/folder";

from(remoteLocation)
    .idempotentConsumer(simple("${file:onlyname}"),
        RedisIdempotentRepository.redisIdempotentRepository(redisTemplate, "camel-repo"))
    .bean("orderTrackingFileProcessor")
    .pollEnrich(remoteLocation + "?delete=true&fileName=${file:name}");
Configure the FTP endpoint to use the Redis idempotent repository directly, instead of the Idempotent Consumer EIP later in the route. That ensures only one FTP consumer processes the same file.
If you have the Camel in Action, 2nd edition book, this is covered in the second half of the transaction chapter.
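A minimal sketch of that approach, assuming the Redis repository is bound in the registry under the name redisRepo (the file/FTP consumer supports the idempotent and idempotentRepository options):

from("sftp://host:port/folder?delete=true"
        + "&idempotent=true"
        + "&idempotentRepository=#redisRepo")
    .bean("orderTrackingFileProcessor");

The consumer then skips files already registered by another node, so only the node that actually processed a file deletes it.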
I want to implement "continue" behaviour in a route. My route is like the following:
from("file:D:\\?fileName=abc.csv&noop=true").split().unmarshal().csv()
.to("direct:insertToDb").end();
from("direct:insertToDb")
.to("direct:getDataId")
.to("direct:getDataParameters")
.to("direct:insertDataInDb");
from("direct:getDataId")
.to("sql:SELECT id FROM data WHERE name = :#name)
.choice()
.when(header("id").isGreaterThan(0) )
.setProperty("id", header("id"))
.otherwise()
.log("Error for")
.endChoice().end();
I want that if direct:getDataId doesn't find any record, the execution of the route for the current record from the CSV is skipped and the program moves on to the next record; it would be the equivalent of the continue keyword.
How can I achieve this in an Apache Camel route?
You can modify your routes like this:
from("file:D:\\?fileName=abc.csv&noop=true").split().unmarshal().csv()
.to("sql:SELECT id FROM data WHERE name = :#name?outputHeader=id&outputType=SelectOne)
.choice().when(header("id").isGreaterThan(0))
.to("direct:getDataParameters")
.to("direct:insertDataInDb")
.end();
Have you got a test for this? I suggest you try using CamelTestSupport, because what you want is how Camel executes by default.
From Camel Split Docs:
stopOnException
default:false
description: Whether or not to stop continue processing immediately when an exception occurred. If disable, then Camel continue splitting and process the sub-messages regardless if one of them failed. You can deal with exceptions in the AggregationStrategy class where you have full control how to handle that.