I use Apache Camel to process files received on an FTP channel. My application is deployed in a cluster (4 nodes), so I use RedisIdempotentRepository to ensure that only a single node processes each file. My problem is deleting the file after processing: with delete=true, by the time node A (the node that processed the file) finishes and tries to delete it, node B has already deleted it, because node B does not pass the idempotent filter and therefore goes straight to the deletion.
I would like to know how to only allow node A to delete the file?
from("sftp://host:port/folder?delete=true)
.idempotentConsumer(simple("${file:onlyname}"),
RedisIdempotentRepository.redisIdempotentRepository(redisTemplate, "camel-repo"))
.bean("orderTrackingFileProcessor");
I worked around this using pollEnrich, adding a delete step at the end of processing:
.pollEnrich(remoteLocation + "?delete=true&fileName=${file:name}");
Full Example Route:
String remoteLocation = "sftp://host:port/folder";
from(remoteLocation)
    .idempotentConsumer(simple("${file:onlyname}"),
        RedisIdempotentRepository.redisIdempotentRepository(redisTemplate, "camel-repo"))
    .bean("orderTrackingFileProcessor")
    .pollEnrich(remoteLocation + "?delete=true&fileName=${file:name}");
Configure the FTP endpoint to use the Redis idempotent repository directly, rather than the idempotent consumer EIP later in the route. That ensures only one FTP consumer processes the same file.
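A minimal sketch of that approach, assuming the RedisIdempotentRepository is registered in the Camel registry under the illustrative name redisRepo; the idempotent check is then done by the SFTP consumer itself, so a node that is not the winner never picks the file up and therefore never deletes it:
// endpoint-level idempotency: the consumer skips (and does not delete) files
// whose key is already in the shared Redis repository
from("sftp://host:port/folder?delete=true"
        + "&idempotent=true"
        + "&idempotentRepository=#redisRepo"
        + "&idempotentKey=${file:onlyname}")
    .bean("orderTrackingFileProcessor");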
If you have the Camel in Action, 2nd edition book, this is covered in the second half of the transaction chapter.
Related
I am unable to read in batch with the kafka camel consumer, despite following an example posted here. Are there changes I need to make to my producer, or is the problem most likely with my consumer configuration?
The application in question uses the Camel Kafka component: it ingests messages from a REST endpoint, validates them, and places them on a topic. I then have a separate service that consumes them from the topic and persists them in a time-series database.
The messages were being produced and consumed one at a time, but the database expects the messages to be consumed and committed in batch for optimal performance. Without touching the producer, I tried adjusting the consumer to match the example in the answer to this question:
How to transactionally poll Kafka from Camel?
I wasn't sure how the messages would appear, so for now I'm just logging them:
from(kafkaReadingConsumerEndpoint)
    .routeId("rawReadingsConsumer")
    .process(exchange -> {
        // simple approach to generating errors
        String body = exchange.getIn().getBody(String.class);
        if (body.startsWith("error")) {
            throw new RuntimeException("can't handle the message");
        }
        log.info("BODY:{}", body);
    })
    .process(kafkaOffsetManager);
But the messages still appear to be coming across one at a time with no batch read.
My consumer config is this:
kafka:
  host: myhost
  port: myport
  consumer:
    seekTo: beginning
    maxPartitionFetchBytes: 55000
    maxPollRecords: 50
    consumerCount: 1
    autoOffsetReset: earliest
    autoCommitEnable: false
    allowManualCommit: true
    breakOnFirstError: true
Does my config need work, or are there changes I need to make to the producer to have this work correctly?
At the lowest layer, KafkaConsumer#poll returns a ConsumerRecords collection that you iterate one ConsumerRecord at a time; there's no way around that.
I don't have in-depth experience with Camel, but in order to get a "batch" of records, you'll need some intermediate collection to "queue" the data that you want to eventually send downstream to some "collection consumer" process. Then you will need some "switch" processor that says "wait, process this batch" or "continue filling this batch".
As far as databases go, that process is exactly what Kafka Connect JDBC Sink does with batch.size config.
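To make that buffer-and-flush idea concrete, here is a rough, non-Camel sketch; it assumes consumer is an already-configured KafkaConsumer with auto-commit disabled, writeBatchToDatabase is a placeholder for your persistence step, and the usual org.apache.kafka.clients.consumer and java.time imports are in place:
List<String> batch = new ArrayList<>();
while (true) {
    // poll returns a ConsumerRecords collection, iterated record by record
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        batch.add(record.value());
    }
    // the "switch": flush only once the batch is large enough
    if (batch.size() >= 50) {
        writeBatchToDatabase(batch);  // the downstream "collection consumer"
        consumer.commitSync();        // commit offsets only after the batch succeeds
        batch.clear();
    }
}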
We solved a similar requirement by using the Aggregation [1] capability provided by Camel.
A rough code snippet:
@Override
public void configure() throws Exception {
    // 1. Define your aggregation strategy
    AggregationStrategy agg = AggregationStrategies.flexible(String.class)
            .accumulateInCollection(ArrayList.class)
            .pick(body());

    from("kafka:your-topic?and-other-params")
        // 2. Define your aggregation parameters
        .aggregate(constant(true), agg)
            .completionInterval(1000)
            .completionSize(100)
            .parallelProcessing(true)
        // 3. Generate the bulk insert statement
        .process(exchange -> {
            List<String> body = (List<String>) exchange.getIn().getBody();
            String query = generateBulkInsertQueryStatement("target-table", body);
            exchange.getMessage().setBody(query);
        })
        .to("jdbc:dataSource");
}
There are a variety of strategies you can implement, but we chose this particular one because it allows you to create a List of strings from the message contents that we need to ingest into the db. [2]
We set a variety of different params such as completionInterval and completionSize. The most important one for us was parallelProcessing(true) [3]; without it our throughput was nowhere near what we required.
Once the aggregation has either collected 100 messages or 1000 ms has passed, the processor generates a bulk insert statement, which is then sent to the db.
[1] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html
[2] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html#_aggregating_into_a_list
[3] https://camel.apache.org/components/3.18.x/eips/aggregate-eip.html#_worker_pools
I have a pretty easy route where I pickup files from a directory and send it to a bean:
from("file:/mydir?delete=true").bean(MyProcessor.class);
It can happen that an exception occurs in MyProcessor.class, in which case I want to delay processing that file again. How can I set up redelivery for this? I have already tried different things with
onException().redeliveryDelay(10000);
but it didn't work; right after the exception the same file gets processed again.
Did you define onException() before the processing?
Example:
errorHandler(defaultErrorHandler()
    .maximumRedeliveries(2)
    .redeliveryDelay(5000)
    .retryAttemptedLogLevel(LoggingLevel.WARN));

// exception handler for specific exceptions
onException(IOException.class)
    .maximumRedeliveries(1)
    .redeliveryDelay(5000);

// only the failed record is written to the error folder
onException(CsvRecordException.class)
    .to("file:/app/dev/dataland/error");

onCompletion()
    .log("global thread: ${threadName}")
    .to("file:/app/dev/dataland/archive");

from("file:/path?noop=true&delay=3000")
    .startupOrder(1)
    .log("start to process file: ${header.CamelFileName}")
    .bean(CsvFilePreLoadChecker.class, "validateMetaData")
    .end()
    .log("Done processing file: ${header.CamelFileName}");
When an error occurs in MyProcessor.class, the route processing fails and therefore the file consumer does not delete the file.
Since the route processing is nevertheless finished, the file consumer simply reads the (still present) file again on its next poll.
If you want to move files with processing errors out of your way, you can use the moveFailed option of the file consumer. You would then have to move them back periodically to retry.
If you want to decouple file reading from MyProcessor.class, you need to split the route in two: one route reads the files and sends their contents to a queue or similar, and the other consumes that queue and processes the messages, as in the sketch below.
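A hedged sketch of that split (the seda queue name and redelivery settings are illustrative):
// Route 1: only picks up the file and hands its content to an in-memory queue.
// Converting the body detaches it from the underlying file, so the file consumer
// can delete the file immediately.
from("file:/mydir?delete=true")
    .convertBodyTo(String.class)
    .to("seda:incomingFiles");

// Route 2: does the actual work; a failure here is retried with a delay,
// without the file consumer ever re-reading the file.
from("seda:incomingFiles")
    .errorHandler(defaultErrorHandler()
        .maximumRedeliveries(3)
        .redeliveryDelay(10000))
    .bean(MyProcessor.class);
Note that seda is in-memory only; if messages must survive a restart, use a persistent queue (JMS, Kafka, etc.) between the two routes instead.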
I have two Spring Boot services, named A and B.
Service A receives a multipart file (which is huge: 100 MB to 1 GB in size).
Service A needs to transmit the file to Service B (problem area).
Service B performs some operation on the data and returns the result to Service A.
Calls from A to B are sent using RestTemplate.
Problem Area
Service A should not load the whole file into memory, but read from the multipart InputStream in chunks.
Chunks are sent to B rather than one complete, buffered payload.
Is it possible to pass the complete multipart InputStream on to Service B, so that Service A neither loads the whole file nor has to chunk it itself, over HTTP or any other way?
(One way I solved it was to use a FIFO on the underlying Linux platform: Service A writes continuously to a pipe and Service B reads from it.) I was looking for a way to achieve this with HTTP if possible.
Background: I want Service B, not Service A, to be in control of how much it reads at a time. This is because of the operation Service B performs: Service B adds some additional bytes, so that when the reverse operation is performed on the next data, it can see those additional bytes and understand how much more should be read.
UPDATE (GitHub project)
I am not sure where I am getting it wrong. The complete project is at https://github.com/robin-carry/large-data-transfer
The main files on the client/server side, where all the logic lives, are listed below; the README.md explains everything I am trying (and failing) to do:
client/src/main/java/com/lockdown/lazy/client/controller/UploadController.java
server/src/main/java/com/lockdown/lazy/server/controller/OpServiceController.java
The Apache HttpClient documentation talks about request/response entity streaming; see if that helps solve your scenario.
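For illustration, a hedged sketch of request entity streaming with Apache HttpClient 4.x (the URL and class names are made up): the multipart InputStream is wrapped in an InputStreamEntity with chunked transfer encoding, so Service A never buffers the whole file.
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.InputStreamEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.springframework.web.multipart.MultipartFile;

public class StreamingForwarder {

    public String forward(MultipartFile file) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpPost post = new HttpPost("http://service-b/process");
            // length -1 forces chunked transfer encoding: bytes are sent as they are read
            InputStreamEntity entity = new InputStreamEntity(
                    file.getInputStream(), -1, ContentType.APPLICATION_OCTET_STREAM);
            entity.setChunked(true);
            post.setEntity(entity);
            try (CloseableHttpResponse response = client.execute(post)) {
                // Service B reads the stream at its own pace and answers when done
                return EntityUtils.toString(response.getEntity());
            }
        }
    }
}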
You can design an interface with org.springframework.web.multipart.MultipartFile as an argument, then use it as a Feign client on the sender side and implement it in a controller on the receiver side.
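A hedged sketch of that interface-sharing idea (it assumes Spring Cloud OpenFeign with a form encoder on the classpath; all names are illustrative). Note that a MultipartFile sent through Feign is generally buffered, so this trades the streaming requirement for simplicity.
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestPart;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

// Shared contract between the two services
public interface FileOperationApi {
    @PostMapping(value = "/operate", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    String operate(@RequestPart("file") MultipartFile file);
}

// Sender side (Service A): the same contract becomes a Feign client
@FeignClient(name = "service-b", url = "${service-b.url}")
interface ServiceBClient extends FileOperationApi {
}

// Receiver side (Service B): the same contract implemented by a controller
@RestController
class OpController implements FileOperationApi {
    @Override
    public String operate(MultipartFile file) {
        // perform Service B's operation on the uploaded content here
        return "ok";
    }
}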
I have a Camel application where I read files from an FTP source.
The file then goes through multiple routes: one route goes to Cassandra for storage, another processes the data and pushes pivoted data to a Kafka topic, etc.
I want to mark the file as processed once it has gone through all routes and reached the end, so that I can build a processing-completed log based on the file name.
One way I can think of is to implement an aggregator: each route sends a completion notification in an exchange header, and based on the completion-criteria logic in the aggregator, I mark the file as processed.
How would I write such an aggregator in Java?
You could try using multicast.
from("direct:start")
.multicast()
.to("direct:a","direct:b")
.end()
// Won't run until the sub routes are complete
.process(new MarkFileAsCompletedProcessor())
.log("Finished multicast");
from("direct:a")
.log("Processing a")
.to("mock:endOfA");
from("direct:b")
.log("Processing b")
.to("mock:endOfB");
I'm trying to build a custom MQ exit to archive messages that hit a queue. I have the following code:
class MyMqExits implements WMQSendExit, WMQReceiveExit {

    @Override
    public ByteBuffer channelReceiveExit(MQCXP arg0, MQCD arg1, ByteBuffer arg2) {
        // dump the raw received buffer as a String
        if (arg2) {
            def _bytes = arg2.array()
            def results = new String(_bytes)
            println results
        }
        return arg2
    }
...
The content of the message (header/body) is in the byte buffer, along with some unreadable binary information. How can I parse the message (including the body and the queue name) from arg2? We've gone through IBM's documentation, but haven't found an object or anything that makes this easy.
Assuming the following two points:
1) Your sender application has not hard coded the queue name where it puts messages. So you can change the application configuration to send messages to a different object.
2) MessageId of the archived message is not important, only message body is important.
Then one alternative I can think of is to create an Alias queue that resolves to a Topic and use two subscribers to receive messages.
1) Subscriber 1: An administratively defined durable subscriber with a queue provided to receive messages. Provide the same queue name from which your existing consumer application is receiving messages.
2) Subscriber 2: Another administratively defined durable subscriber with queue provided. You can write a simple java application to get messages from this queue and archive.
3) Both subscribers subscribe to the same topic.
Here are steps:
// Create a topic
define topic(ANY.TOPIC) TOPICSTR('/ANY_TOPIC')
// Create an alias queue that points to above created topic
define qalias(QA.APP) target(ANY.TOPIC) targtype(TOPIC)
// Create a queue for your application that does business logic. If one is available already then no need to create.
define ql(Q.BUSLOGIC)
// Create a durable subscription with destination queue as created in previous step.
define sub(SB.BUSLOGIC) topicstr('/ANY_TOPIC') dest(Q.BUSLOGIC)
// Create a queue for application that archives messages.
define ql(Q.ARCHIVE)
// Create another subscription with destination queue as created in previous step.
define sub(SB.ARCHIVE) topicstr('/ANY_TOPIC') dest(Q.ARCHIVE)
Write a simple MQ Java/JMS application to get messages from Q.ARCHIVE and archive messages.
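A minimal JMS sketch of such an archiver (connection details are placeholders and the archive step is a stub), assuming the IBM MQ classes for JMS are on the classpath:
import javax.jms.*;

import com.ibm.mq.jms.MQQueueConnectionFactory;
import com.ibm.msg.client.wmq.WMQConstants;

public class ArchiveConsumer {

    public static void main(String[] args) throws JMSException {
        MQQueueConnectionFactory cf = new MQQueueConnectionFactory();
        cf.setHostName("mqhost");
        cf.setPort(1414);
        cf.setQueueManager("QM1");
        cf.setChannel("APP.SVRCONN");
        cf.setTransportType(WMQConstants.WMQ_CM_CLIENT);

        try (QueueConnection conn = cf.createQueueConnection();
             QueueSession session = conn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE)) {
            conn.start();
            MessageConsumer consumer = session.createConsumer(session.createQueue("Q.ARCHIVE"));
            Message msg;
            // drain the archive queue, waiting up to 5 seconds for each message
            while ((msg = consumer.receive(5000)) != null) {
                String body = (msg instanceof TextMessage)
                        ? ((TextMessage) msg).getText()
                        : msg.toString();
                archive(body);  // replace with your real archiving logic
            }
        }
    }

    private static void archive(String body) {
        System.out.println("archived: " + body);
    }
}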
A receive exit is not going to give you the whole message. Send and receive exits operate on the transmission buffers sent/received by channels. These will contain various protocol flows which are not documented because the protocol is not public, and part of those protocol flows will be chunks of the messages broken down to fit into 32Kb chunks.
You don't give enough information in your question for me to know what type of channel you are using, but I'm guessing it's on the client side since you are writing it in Java and that is the only environment where that is applicable.
Writing the exit at the client side, you'll need to be careful you deal with the cases where the message is not successfully put to the target queue, and you'll need to manage syncpoints etc.
If you were using QMgr-QMgr channels, you should use a message exit to capture the MQXR_MSG invocations where the whole message is given to you. If you put any further messages in a channel message exit, the messages you put are included in the channel's Syncpoint and so committed if the original messages were committed.
Since you are using client-QMgr channels, you could look at an API Exit on the QMgr end (currently client side API Exits are only supported for C clients) and catch all the MQPUT calls. This exit would also give you the MQPUT return codes so you could code your exit to look out for, and deal with failed puts.
Of course, writing an exit is a complicated task, so it may be worth finding out if there are any pre-written tools that could do this for you instead of starting from scratch.
I fully agree with Morag and Shashi: this is the wrong approach. There is an open source project called Message Multiplexer (MMX) that will get a message from a queue and output it to one or more queues. Context information is maintained across the message puts. For more info on MMX go to: http://www.capitalware.com/mmx_overview.html
If you cannot change the source or target queues to insert MMX into the mix then an API Exit may do the trick. Here is a blog posting about message replication via an API Exit: http://www.capitalware.com/rl_blog/?p=3304
This is quite an old question, but it's worth replying with an update that is relevant to MQ 9.2.3 or later. There is a new feature called Streaming Queues (see https://www.ibm.com/docs/en/ibm-mq/9.2?topic=scenarios-streaming-queues), and one of the use cases it is designed to support is putting a copy of every message sent to a given queue onto an alternative queue. Another application can then consume the duplicate messages and archive them separately from the application that is processing the original messages.
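For example, a hedged MQSC sketch (queue names follow the earlier examples and are purely illustrative): the business queue streams a copy of every message it receives to an archive queue.
// Queue that will receive the duplicated messages
define ql(Q.ARCHIVE.COPY)
// Stream a copy of every message put to the business queue (MQ 9.2.3 or later)
alter ql(Q.BUSLOGIC) streamq(Q.ARCHIVE.COPY) strmqos(BESTEF)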