Camel: use a splitter without an aggregator - Java

I'm new to Camel and I'd like to use it to read an XML file from an FTP server and asynchronously process every NODE element of the XML.
To do that, I'll use a splitter to process every node (with streaming, because the XML file is big).
from("ftp://user@host:port/...")
    .split().tokenizeXML("node").streaming()
        .to("seda:processNode")
    .end();
Then the route to the node processor:
from("seda:processNode")
.bean(lookup(MyNodeProcessor.class))
.end();
I was wondering: is it OK to use a splitter without an aggregator? In my case, I don't need to aggregate the outcome of all the processed nodes.
Is it a problem in Camel to have many split exchanges going to a "dead end" instead of being aggregated?
The examples provided by Camel show a splitter without an aggregator, but they still provide an aggregationStrategy with the splitter. Is it mandatory?

No, this is perfectly fine. You can use the splitter without an aggregation strategy, which is the normal splitter EIP: http://camel.apache.org/splitter
If you use an aggregation strategy, then it's more like the composed message processor EIP: http://camel.apache.org/composed-message-processor.html, which can be done with the splitter alone in Camel.
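For contrast, a minimal sketch of the splitter combined with an aggregation strategy, i.e. the composed message processor variant. MyMergeStrategy is a hypothetical AggregationStrategy (not from the original post), and direct: is used instead of seda: so each node's result can flow back to be merged:
from("ftp://user@host:port/...")
    .split().tokenizeXML("node").streaming()
        .aggregationStrategy(new MyMergeStrategy()) // merges each node's result into one outgoing exchange
        .to("direct:processNode")
    .end()
    .log("continuing with the aggregated exchange");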

Related

Splitting, aggregating, then stream-writing to one big file using Apache Camel

I have a large database from which I load huge numbers of records. I process them in batch mode using the splitter and aggregator patterns.
The step where I'm stuck is streaming each batch into one JSON file where I want them all to be stored. Here are the steps:
Fetch records from the DB
Process them in batches of N
Write each processed batch to the same big JSON file (the missing step)
I tested this with the Append option of the File2 component, but it writes multiple arrays inside an array. I could flatten this JSON, but it brings me to one question.
How do I stop the route from running, given that I have two requirements:
After running a batch, the size at the start is not necessarily the same as at the end.
I tried to work with completionFromConsumer, but it does not work with Quartz consumers.
I have this route :
from(endpointsURL)
    .log(LoggingLevel.INFO, LOGGER, "Start fetching records")
    .bean(DatabaseFetch, "fetch")
    .split().method(InspectionSplittingStrategy.class, "splitItems")
    .aggregate(constant(true), batchAggregationStrategy())
        .completionPredicate(batchSizePredicate())
        .completionTimeout(BATCH_TIME_OUT)
    .log(LoggingLevel.INFO, LOGGER, "Start processing items")
    .bean(ItemProcessor, "process")
    .marshal()
    .json(JsonLibrary.Jackson, true)
    .setHeader(Exchange.FILE_NAME, constant("extract.json"))
    .to("file:/json?doneFileName=${file:name}.done")
    .log(LoggingLevel.INFO, LOGGER, "Processing done");
The problem here is, as I suspected, that my extract.json gets overwritten with every processed batch. I want each batch appended after the previous one.
I have no clue how to design this, or which pattern to use to make it possible. Stream and File have good features, but in which fashion can I use them?
You need to tell Camel to append to the file if it exists: add fileExists=Append as an option to your file endpoint.
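Applied to the file endpoint from the route above, that would be:
.to("file:/json?doneFileName=${file:name}.done&fileExists=Append")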
I changed the route to use only a splitting strategy:
from(endpointsURLs.get(START_AGENT))
    .bean(databaseFetch, "fetch")
    .split().method(SplittingStrategy.class, "splitItems")
        .parallelProcessing()
        .bean(databaseBatchExtractor, "launch")
    .end()
    .to("seda:generateExportFiles");

from("seda:generateExportFiles")
    .bean(databaseFetch, "fetchPublications")
    .multicast()
        .parallelProcessing()
        .to("direct:generateJson", "direct:generateCsv");

from("direct:generateJson")
    .log("generate JSON file")
    .marshal()
    .json(JsonLibrary.Jackson, true)
    .setHeader(Exchange.FILE_NAME, constant("extract.json"))
    .to("file:/json?doneFileName=${file:name}.done")
    .to("direct:notify");

from("direct:generateCsv")
    .log("generate CSV file")
    .bean(databaseFetch, "exportCsv")
    .to("direct:notify");

from("direct:notify")
    .log("generation done");
The important class is SplittingStrategy:
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;

public class SplittingStrategy {

    private static final int BATCH_SIZE = 500;
    private final AtomicInteger counter = new AtomicInteger();

    public Collection<List<Pair<Integer, Set<Integer>>>> splitItems(Map<Integer, Set<Integer>> itemsByID) {
        List<Pair<Integer, Set<Integer>>> rawList = itemsByID.entrySet().stream()
                .map(inspUA -> new ImmutablePair<>(inspUA.getKey(), inspUA.getValue()))
                .collect(Collectors.toList());
        return rawList.parallelStream()
                .collect(Collectors.groupingBy(pair -> counter.getAndIncrement() / BATCH_SIZE))
                .values();
    }
}
With this strategy, instead of using aggregate to re-assemble the items, I embedded the aggregation as part of the splitting:
Transform my HashMap into an Iterable of List<Pair<Integer, Set<Integer>>> to be returned by the split method (cf. Splitter with POJO).
Split items into batches of 500 with a groupingBy over the initial list's stream, as in the sketch below.
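A minimal, hypothetical check of the batching behaviour (not from the original post), assuming the SplittingStrategy class above is on the classpath:
import java.util.*;
import org.apache.commons.lang3.tuple.Pair;

public class SplittingStrategyDemo {
    public static void main(String[] args) {
        Map<Integer, Set<Integer>> items = new HashMap<>();
        for (int i = 0; i < 1200; i++) {
            items.put(i, Collections.singleton(i));
        }
        // 1200 items with BATCH_SIZE = 500 -> 3 batches (500 + 500 + 200)
        Collection<List<Pair<Integer, Set<Integer>>>> batches =
                new SplittingStrategy().splitItems(items);
        System.out.println(batches.size()); // prints 3
    }
}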
Comments or opinions about it are welcome!

How to log the content of a CSV in Apache Camel?

I have the following code:
DataFormat bindy = new BindyCsvDataFormat(Employee.class);
from("file:src/main/resources/csv2?noop=true").routeId("route3").unmarshal(bindy).to("mock:result").log("${body[0].name}");
I am trying to log every line of the CSV file; currently I can only hardcode which line to print.
Do I have to use a Loop even though I don't know the number of lines in the CSV? Or do I have to use a processor? What's the easiest way to achieve what I want?
The unmarshalling step produces an exchange whose body is a list of objects, one per CSV row. For that reason, you can simply use the Camel splitter to slice the original exchange into 1..N sub-exchanges (one per line/item of the list) and then log each of these lines:
from("file:src/main/resources/csv2?noop=true")
.unmarshal(bindy)
.split().body()
.log("${name}");
If you do not want to alter the original message, you can use the wiretap pattern in order to log a copy of the exchange:
from("file:src/main/resources/csv2?noop=true")
.unmarshal(bindy)
.wireTap("direct:logBody")
.to("mock:result");
from("direct:logBody")
.split().body()
.log("Row# ${exchangeProperty.CamelSplitIndex} : ${name}");

Efficient Camel Content-Based Router: route XML messages to the correct recipient based on a contained tag with the Java DSL

The problem:
I need to process different huge XML files. Each file contains a certain node which I can use to identify the incoming XML message. Based on that node/tag, the message should be sent to a dedicated recipient.
The XML message should not be converted to a String and then checked with contains, as this would be really inefficient. Rather, XPath should be used to "probe" the message for the occurrence of the expected node.
The solution should be based on Camel's Java DSL. The code:
from("queue:foo")
.choice().xpath("//foo")).to("queue:bar")
.otherwise().to("queue:others");
suggested in Camel's docs does not compile. I am using Apache Camel 2.19.0.
This compiles:
from("queue:foo")
.choice().when(xpath("//foo"))
.to("queue:bar")
.otherwise()
.to("queue:others");
You need the .when() to test predicate expressions when building a content-based router.

No processor found in splitter after validation.

I have a Camel route that needs to receive an XML file from FTP as a stream, validate it, and split it.
Everything works fine up to the validation, but then the split doesn't work as expected. When debugging, I found that the split process doesn't find any processor when the original message is a stream. It looks very much like a bug to me.
from("direct:start")
.pollEnrich("ftp://user#host:21?fileName=file.xml&streamDownload=true&password=xxxx&fastExistsCheck=true&soTimeout=300000&disconnect=true")
.to("validator:myXsd.xsd")
.split().tokenizeXML("myTag")
.to(to)
.end();
In this case I can see the exchange getting into the splitter, but no processor is found and the split does nothing. The behavior is different if I remove the validation:
from("direct:start")
.pollEnrich("ftp://user#host:21?fileName=file.xml&streamDownload=true&password=xxxx&fastExistsCheck=true&soTimeout=300000&disconnect=true")
.split().tokenizeXML("myTag")
.to(to)
.end();
In this case, the splitter works fine.
Also, if the XML file doesn't come from a stream, then everything is fine.
from("file:file.xml")
.to("validator:myXsd.xsd")
.split().tokenizeXML("myTag")
.to(to)
.end();
I updated my Camel version to 2.15.2 but I still get the same error.
I don't know how the validator works internally, but if it is changing the message body, try storing the body in a header or property before the validator, for example .setHeader("headerName", simple("${body}")), and restore it afterwards with .setBody(simple("${header.headerName}")).
The problem was that I was trying to pass a body that was a stream (streamDownload=true). The validator reads the stream and validates the content; no problem there.
But by the time the split runs, the stream has already been read and closed, so the split can't do anything with it.
I worked around the problem by not using a stream, but I guess stream caching would also work if a stream is necessary.
See http://camel.apache.org/why-is-my-message-body-empty.html
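If a stream is required, a minimal sketch of the stream-caching variant, based on the question's route (untested):
from("direct:start")
    .streamCaching() // cache the stream so it can be re-read after the validator consumes it
    .pollEnrich("ftp://user@host:21?fileName=file.xml&streamDownload=true&password=xxxx&fastExistsCheck=true&soTimeout=300000&disconnect=true")
    .to("validator:myXsd.xsd")
    .split().tokenizeXML("myTag")
        .to(to)
    .end();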

Apache Camel from FTP to database

Is it possible to solve the following scenario with Apache Camel:
Read from FTP periodically, retrieve a zip file which contains XML, and store this XML in a database.
The main question is: which features exist in Camel for this, and which functionality do I need to write on my own?
Yes, your route could look something like this (off the top of my head):
JaxbDataFormat jaxb = new JaxbDataFormat("com.example.foobar");

from("ftp://user:pass@server:21/inbox")
    .unmarshal().zip()
    .split(xpath("//foo"))
    .unmarshal(jaxb)
    .to("jpa:com.example.foobar.Foo");
This will poll an FTP server, unzip the files, split the content into XML fragments, transform these into JPA entities, and finally persist those objects in a database. There are many variations possible; depending on your use case you can omit the splitter EIP or, for example, choose another persistence mechanism (MyBatis, Spring-JDBC, etc.).
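For the "periodically" part, the FTP consumer is a scheduled polling consumer, so the poll interval can be set on the endpoint; a sketch (the 60-second delay is an assumption, not from the original answer):
from("ftp://user:pass@server:21/inbox?consumer.delay=60000") // poll every 60 seconds
    .unmarshal().zip()
    .split(xpath("//foo"))
    .unmarshal(jaxb)
    .to("jpa:com.example.foobar.Foo");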
