Using Camel to aggregate messages with the same header - Java

I have multiple clients that send files to a server. For one set of data there are two files that contain information about that data, each with the same name. When a file is received, the server sends a message to my queue containing the file path, file name, client ID, and the "type" of file it is (all files have the same extension, but there are two "types"; call them A and B).
The two files for one set of data have the same file name. As soon as the server has received both of the files I need to start a program that combines the two. Currently I have something that looks like this:
from("jms:queue.name").aggregate(header("CamelFileName")).completionSize(2).to("exec://FILEPATH?args=");
Where I am stuck is the header("CamelFileName") part, and more specifically how the aggregator works.
With completionSize set to 2, does it just suck up all the messages and store them in some data structure until a second message that matches the first comes through? Also, does header() expect a specific value? I have multiple clients, so I was thinking of having the client ID and the file name in the header, but then again I don't know if I have to give a specific value. I also don't know if I can use a regex or not.
Any ideas or tips would be super helpful.
Thanks
EDIT:
Here is some code I have now. Based on my description of the problem here and in the comments on the selected answer, does it seem accurate (aside from closing brackets that I didn't copy over)?
public static void main(String args[]) throws Exception {
    CamelContext c = new DefaultCamelContext();
    c.addComponent("activemq", activeMQComponent("vm://localhost?broker.persistent=false"));
    //ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
    //c.addComponent("jms", JmsComponent.jmsComponentAutoAcknowledge(connectionFactory));

    c.addRoutes(new RouteBuilder() {
        public void configure() {
            from("activemq:queue:analytics.camelqueue")
                .aggregate(new MyAggregationStrategy()).header("subject")
                .completionSize(2)
                .to("activemq:queue:analytics.success");
        }
    });
    c.start();

    while (true) {
        System.out.println("Waiting on messages to come through for camel");
        Thread.sleep(2 * 1000);
    }
    //c.stop();
}

private static class MyAggregationStrategy implements AggregationStrategy {
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        if (oldExchange == null)
            return newExchange;

        // and here is where combo stuff goes
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        boolean oldSet = oldBody.contains("set");
        boolean newSet = newBody.contains("set");
        boolean oldFlow = oldBody.contains("flow");
        boolean newFlow = newBody.contains("flow");

        if ((oldSet && newFlow) || (oldFlow && newSet)) {
            // they match so return new exchange with info so extractor can be started with exec
            String combined = oldBody + "\n" + newBody + "\n";
            newExchange.getIn().setBody(combined);
            return newExchange;
        } else {
            // no match so do something....
            return null;
        }
    }
}

You must supply an AggregationStrategy to define how you want to combine Exchanges...
If you are only interested in the fileName and receive exactly 2 Exchanges, then you can just use the UseLatestAggregationStrategy to pass the newest Exchange through once 2 have been 'aggregated'...
That said, it sounds like you need to retain both Exchanges (one for each clientId) so you can pass that info on to the 'exec' step... If so, you can combine the Exchanges into a GroupedExchange holder using the built-in aggregation strategy enabled via the groupExchanges option, or specify a custom AggregationStrategy to combine them however you'd like (a rough sketch follows the links below). Just keep in mind that your 'exec' step needs to handle whatever aggregated structure you decide to use...
see these unit tests for examples:
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/processor/aggregator/AggregatorTest.java
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/processor/aggregator/AggregateGroupedExchangeTest.java
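For illustration, here is a sketch of the custom-strategy variant. It is only a starting point built on assumptions from the question: the class name, the correlation header, the comma-joined body, and the exec URI are placeholders, not a definitive implementation.

import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.AggregationStrategy;

// hypothetical route class for the scenario described in the question
public class PairFilesRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jms:queue.name")
            // correlate the two files that belong together via the shared file name
            .aggregate(header("CamelFileName"), new AggregationStrategy() {
                public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
                    if (oldExchange == null) {
                        return newExchange; // first file of the pair
                    }
                    // keep both bodies so the exec step can see both file paths
                    String first = oldExchange.getIn().getBody(String.class);
                    String second = newExchange.getIn().getBody(String.class);
                    newExchange.getIn().setBody(first + "," + second);
                    return newExchange;
                }
            })
            .completionSize(2)
            .to("exec://FILEPATH?args=");
    }
}

Alternatively, if I recall the 2.x DSL correctly, .aggregate(header("CamelFileName")).groupExchanges().completionSize(2) keeps both Exchanges in a list on the aggregated Exchange, which the exec step then has to unpack itself.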

Related

IBKR TWS API - How to tell when reqOptionsMktData is complete for all strikes?

I am just getting started with the IBKR API in Java. I am following the API sample code, specifically the options chain example, to figure out how to get options chains for specific stocks.
The example works well for this, but I have one question: how do I know when ALL data has been loaded? The sample code can tell when each individual row has been loaded, but there doesn't seem to be a way to tell when ALL strikes have been successfully loaded.
I thought that using tickSnapshotEnd() would be beneficial, but it doesn't seem to work as I would expect. I would expect it to be called once for every request that completes. For example, if I do a query for a stock like SOFI on the 2022/03/18 expiry, I see that there are 35 strikes, but tickSnapshotEnd() is called 40+ times, with some strikes repeated more than once.
Note that I am doing requests for snapshot data, not live/streaming data
reqOptionsMktData is obviously a method in the sample code you are using. I'm not sure which particular code you're using, so this is a general response.
Firstly, you are correct: there is no way to tell via the API; this must be done by the client. Of course, the API provides the requestId that was used when the request was made. The client needs to remember what each requestId was for and decide how to process that information when it is received in the callbacks.
This can be done via a dictionary or hashtable: upon receiving data in the callback, check whether the chain is complete.
Message delivery from the API often has unexpected results; receiving extra messages is common and needs to be taken into account by the client. Consider the API stateless, and track everything in the client.
It seems you are referring to Regulatory Snapshots; I would encourage you to look at the cost, which could quite quickly add up to the price of streaming live data. On top of that, the 1/sec limit will make a chain take a long time to load. I wouldn't even recommend using snapshots with live data; cancelling the request yourself is trivial and much faster.
Something like this (obviously incomplete C#, just a starting point):
class OptionData
{
    public int ReqId { get; }
    public double Strike { get; }
    public string Expiry { get; }
    public double? Bid { get; set; } = null;
    public double? Ask { get; set; } = null;

    public bool IsComplete()
    {
        return Bid != null && Ask != null;
    }

    public OptionData(int reqId, double strike, ....
    { ...
    }
    ...

class MyData
{
    // Create somewhere to store our data, indexed by reqId.
    Dictionary<int, OptionData> optChain = new();

    public MyData()
    {
        // We would want to call reqSecDefOptParams to get a list of strikes etc.
        // Choose which part of the chain you want; likely you'll want to
        // get the current price of the underlying to decide.
        int reqId = 1;
        ...
        optChain.Add(++reqId, new OptionData(reqId, strike, expiry));
        ...

        // Request data for each contract.
        // Note the 50 msg/sec limit https://interactivebrokers.github.io/tws-api/introduction.html#fifty_messages
        // Only 1/sec for Reg snapshot.
        foreach (OptionData opt in optChain.Values)
        {
            Contract con = new()
            {
                Symbol = "SPY",
                Currency = "USD",
                Exchange = "SMART",
                Right = "C",
                SecType = "OPT",
                Strike = opt.Strike,
                Expiry = opt.Expiry
            };
            ibClient.ClientSocket.reqMktData(opt.ReqId, con, "", false, true, new List<TagValue>());
        }
    }
    ...

    private void Recv_TickPrice(TickPriceMessage msg)
    {
        if (optChain.ContainsKey(msg.RequestId))
        {
            if (msg.Field == 2) optChain[msg.RequestId].Ask = msg.Price;
            if (msg.Field == 1) optChain[msg.RequestId].Bid = msg.Price;
            // You may want other tick types as well,
            // see https://interactivebrokers.github.io/tws-api/tick_types.html
            if (optChain[msg.RequestId].IsComplete())
            {
                // This won't apply for a reg snapshot.
                ibClient.ClientSocket.cancelMktData(msg.RequestId);
                // You have the data, and have cancelled the request.
                // Maybe request more data or update the display etc...

                // Check if the whole chain is complete.
                bool complete = true;
                foreach (OptionData opt in optChain.Values)
                    if (!opt.IsComplete()) complete = false;
                if (complete)
                {
                    // do whatever
                }
            }
        }
    }

How does one go about chaining several ChannelFuture objects with Netty?

I have a hashmap that contains ip/port information and a message that has to be sent to the whole list.
So I decided to create a small method that accepts the hashmap and the message and does this. It looks something like this:
public static ChannelFuture sendMessageTo(Map<JsonElement, JsonObject> list, String message) {
    Set<JsonElement> keys = list.keySet();
    for (JsonElement key : keys) { // iterate through the map
        ChannelInboundHandler[] handlers = {
                new MessageCompletenessHandler(),
                new MessageRequest(message),
        };
        JsonObject identity = list.get(key);
        ChannelFuture f = connectWithHandler(identity.get("ip").getAsString(), identity.get("port").getAsInt(), handlers); // to the following ip/port send message and return ChannelFuture
    }
    return result; // here result should be a ChannelFuture that, when .addListener is added, is called only when ALL the ChannelFuture-s from the for loop have finished (a.k.a. all messages have been sent)
}
The comments should explain the situation clearly enough.
The question is how do I implement this ChannelFuture result.
I know I can .sync() the ChannelFuture-s, but this defeats the purpose of async networking.
P.S.: I essentially want to have the functionality described here https://twistedmatrix.com/documents/16.2.0/api/twisted.internet.defer.DeferredList.html but am failing to find an equivalent.
In general, what you're trying to achieve is not really the correct async approach. However, Netty has a utility class for that kind of task: DefaultChannelGroupFuture.java. It is package-private and used only by DefaultChannelGroup.java, which indeed does what you described in your code. So you could easily copy this DefaultChannelGroupFuture and use it. To be more specific:
Collection<ChannelFuture> futures = new ArrayList<ChannelFuture>();
...
//here you add your futures
ChannelFuture f = connectWithHandler(identity.get("ip").getAsString(), identity.get("port").getAsInt(), handlers);
futures.add(f);
...
DefaultChannelGroupFuture groupOfFutures = new DefaultChannelGroupFuture(futures, executor);
if (groupOfFutures.sync().isSuccess()) {
    // all connects/sends have completed successfully
}
Keep in mind you'll need to adapt DefaultChannelGroupFuture for your needs.
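Another option, sketched below, is to count completions yourself and complete a single aggregate promise once every future has finished. The Futures class and whenAllComplete name are made up for this sketch; recent Netty 4.1 also ships a public io.netty.util.concurrent.PromiseCombiner that does much the same thing.

import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;

import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.util.concurrent.GlobalEventExecutor;
import io.netty.util.concurrent.Promise;

// hypothetical helper class
public final class Futures {

    /** Returns a promise that completes once every future in the collection has completed. */
    public static Promise<Void> whenAllComplete(Collection<ChannelFuture> futures) {
        final Promise<Void> allDone = GlobalEventExecutor.INSTANCE.newPromise();
        if (futures.isEmpty()) {
            allDone.setSuccess(null);
            return allDone;
        }
        final AtomicInteger remaining = new AtomicInteger(futures.size());
        for (ChannelFuture f : futures) {
            f.addListener(new ChannelFutureListener() {
                @Override
                public void operationComplete(ChannelFuture future) {
                    // fires for success and failure alike; we only care that it finished
                    if (remaining.decrementAndGet() == 0) {
                        allDone.setSuccess(null);
                    }
                }
            });
        }
        return allDone;
    }
}

sendMessageTo could then return Futures.whenAllComplete(futures), and the caller attaches a listener to that promise instead of calling sync().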

Camel Split InputStream by length not by token

I have an input file like this
1234AA11BB4321BS33XY...
and I want to split it into single messages like this
Message 1: 1234AA11BB
Message 2: 4321BS33XY
Then I want to transform the records into Java objects, marshal them to XML with JAXB, and aggregate about 1000 records into the outgoing message.
Transformation and marshalling are no problem, but I can't split the String above.
There is no delimiter, only the length: every record is exactly 10 characters long.
I was wondering if there is an out of the box solution like
split(body().tokenizeBySize(10)).streaming()
Since in reality each record consists of 300 characters and there may be 500,000 records in a file, I want to split an InputStream.
In other examples I saw custom iterators used for splitting, but all of them were token- or XML-based.
Any idea?
By the way, we are bound to Java 6 and Camel 2.13.4.
Thanks
Nick
The easiest way would be to split by the empty string - .split().tokenize("", 10).streaming() - meaning that the tokenizer takes each character, groups 10 tokens (characters) together, and then aggregates them into a single group, e.g.
@Override
public void configure() throws Exception {
    from("file:src/data?delay=3000&noop=true")
        .split().tokenize("", 10).streaming()
        .aggregate().constant(true) // all messages have the same correlator
            .aggregationStrategy(new GroupedMessageAggregationStrategy())
            .completionSize(1000)
            .completionTimeout(5000) // use a timeout or a predicate
                                     // to know when to stop
        .process(new Processor() { // process the aggregate
            @Override
            public void process(final Exchange e) throws Exception {
                final List<Message> aggregatedMessages =
                        (List<Message>) e.getIn().getBody();
                StringBuilder builder = new StringBuilder();
                for (Message message : aggregatedMessages) {
                    builder.append(message.getBody()).append("-");
                }
                e.getIn().setBody(builder.toString());
            }
        })
        .log("Got ${body}")
        .delay(2000);
}
EDIT
Here's my memory consumption in streaming mode with 2s delay for a 100MB file:
Why not let a normal Java class do the splitting and refer to it? See here:
http://camel.apache.org/splitter.html
Code example taken from the documentation.
The Java DSL below uses "method" to call the split method defined in a separate class.
from("direct:body")
// here we use a POJO bean mySplitterBean to do the split of the payload
.split().method("mySplitterBean", "splitBody")
Below you define your splitter and return each split message.
public class MySplitterBean {

    /**
     * The split body method returns something that is iterable such as a java.util.List.
     *
     * @param body the payload of the incoming message
     * @return a list containing each part of the split
     */
    public List<String> splitBody(String body) {
        // since this is based on a unit test you can of course
        // use different logic for splitting, as Camel has out
        // of the box support for splitting a String based on comma;
        // but this is for show and tell - since this is Java code
        // you have the full power to split your messages however you like
        List<String> answer = new ArrayList<String>();
        String[] parts = body.split(",");
        for (String part : parts) {
            answer.add(part);
        }
        return answer;
    }
}
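That documentation example splits on commas, but the same bean approach can be adapted to the fixed-length records in the question by returning an Iterator that reads the InputStream chunk by chunk, so the 500,000-record file is never held in memory. A rough, Java 6 compatible sketch; the bean name, record length, and charset are assumptions:

import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;

// hypothetical splitter bean for fixed-length records
public class FixedLengthSplitterBean {

    // Returns an Iterator so the Camel splitter can stream the records
    // instead of materialising the whole file in memory.
    public Iterator<String> splitBody(final InputStream body) {
        final int recordLength = 10; // 300 for the real file
        return new Iterator<String>() {
            private String next = readRecord();

            private String readRecord() {
                byte[] buffer = new byte[recordLength];
                int offset = 0;
                try {
                    while (offset < recordLength) {
                        int read = body.read(buffer, offset, recordLength - offset);
                        if (read == -1) {
                            // end of stream; return a short last record if any bytes were read
                            return offset == 0 ? null : new String(buffer, 0, offset, "ISO-8859-1");
                        }
                        offset += read;
                    }
                    return new String(buffer, "ISO-8859-1");
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }

            public boolean hasNext() {
                return next != null;
            }

            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                next = readRecord();
                return current;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}

The route would then use something like .split().method("fixedLengthSplitterBean", "splitBody").streaming(), with the bean registered under that (made-up) name.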

Java Akka Actors - Message throttling and priority

Newbie here..
Using akka version: akka-actor_2.11(2.4.8) via the Java API.
I'm trying to develop an actor for generating PDF documents. These PDF documents can be large, so obviously I want to throttle the rate at which the actor processes requests. As a side requirement, I also need a "prioritizable" inbox so that PDF generation requests can be processed based on priority by the underlying actors.
In my application startup, I create a global props like this:
Props.create(PdfGeneratorActor.class).withDispatcher("prio-dispatcher").withRouter(new RoundRobinPool(1))
Then I create actor per pdf request like this:
actorSystem.actorOf(propsObjShownAbove, actorType.getCanonicalName() + "_" + UUID.randomUUID());
My application.conf looks like this:
prio-dispatcher {
mailbox-type = "com.x.y.config.PriorityMailbox"
}
My PriorityMailbox looks like this:
public class PriorityMailbox extends UnboundedPriorityMailbox {

    // needed for reflective instantiation
    public PriorityMailbox(final ActorSystem.Settings settings, final Config config) {
        super(new PriorityGenerator() {
            @Override
            public int gen(final Object message) {
                System.out.println("Here is my message to be prioritized: " + message);
                if (message.equals(PoisonPill.getInstance())) {
                    return 3; // PoisonPill when nothing else is left
                } else if (message instanceof Prioritizable) {
                    Prioritizable prioritizable = (Prioritizable) message;
                    if (prioritizable.getReportPriorityType() == ReportPriorityType.HIGH) {
                        return 0;
                    } else if (prioritizable.getReportPriorityType() == ReportPriorityType.LOW) {
                        return 2;
                    } else {
                        return 1;
                    }
                } else {
                    // Default priority for any other messages.
                    return 1;
                }
            }
        });
    }
}
Is this the right configuration to achieve what I wanted? I'm not sure if I'm missing something. Firstly, I can't see any System.out.prints on my mailbox implementation. I would imagine it should come there to compare the priority.
Secondly, I would expect the PdfGenerationActor to be executing sequentially (one by one) because it is essentially a single instance across the system. But I don't see that happening. I see multiple actors processing the requests concurrently.
I think I'm missing something fundamental here.
I think what happens in your case is that each actor you create has its own router, but otherwise they are independent - so they execute in parallel.
If you want your requests to be executed sequentially, the idea would be to have one router with one "worker"/routee that executes the requests one by one (of course, you could configure the number of requests you want to execute in parallel).
So you would have something like this:
in the conf:
mypriority-mailbox {
    mailbox-type = "com.x.y.config.PriorityMailbox"
    mailbox-capacity = 500            # optional - check what capacity you actually want here
    mailbox-push-timeout-time = 100s  # optional - check if this makes sense for you
}
akka.actor.deployment {
    /pdfRouter {
        router = round-robin-pool
        nr-of-instances = 1
        mailbox = mypriority-mailbox
    }
}
in the code:
system.actorOf(
    FromConfig.getInstance().props(Props.create(PdfGeneratorActor.class)),
    "pdfRouter");
Check also the documentation for mailboxes and routers.
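To make the contrast with the per-request actorOf in the question concrete, here is a minimal usage sketch. PdfRequest and its constructor arguments are made up for illustration; it is assumed to implement Prioritizable from the question.

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.routing.FromConfig;

public class PdfRouterExample {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("pdf-system");

        // Create the router once at startup; its single routee uses the priority mailbox
        // configured above, so requests are ordered by priority and handled one at a time.
        ActorRef pdfRouter = system.actorOf(
                FromConfig.getInstance().props(Props.create(PdfGeneratorActor.class)),
                "pdfRouter");

        // Every request is sent to the same ActorRef instead of spawning a new actor per PDF.
        // PdfRequest is a hypothetical message type implementing Prioritizable.
        pdfRouter.tell(new PdfRequest(ReportPriorityType.HIGH, "invoice-123.pdf"), ActorRef.noSender());
        pdfRouter.tell(new PdfRequest(ReportPriorityType.LOW, "monthly-report.pdf"), ActorRef.noSender());
    }
}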

How do I aggregate file content correctly with Apache Camel?

I am writing a tool to parse some very big files, and I am implementing it using Camel. I have used Camel for other things before and it has served me well.
I am doing an initial Proof of Concept on processing files in streaming mode, because if I try to run a file that is too big without it, I get a java.lang.OutOfMemoryError.
Here is my route configuration:
@Override
public void configure() throws Exception {
    from("file:" + from)
        .split(body().tokenize("\n")).streaming()
        .bean(new LineProcessor())
        .aggregate(header(Exchange.FILE_NAME_ONLY), new SimpleStringAggregator())
            .completionTimeout(150000)
        .to("file://" + to)
        .end();
}
from points to the directory where my test file is.
to points to the directory where I want the file to go after processing.
With that approach I could parse files that had up to hundreds of thousands of lines, so it's good enough for what I need. But I'm not sure the file is being aggregated correctly.
If I run cat /path_to_input/file I get this:
Line 1
Line 2
Line 3
Line 4
Line 5
Now on the output directory cat /path_to_output/file I get this:
Line 1
Line 2
Line 3
Line 4
Line 5%
I think this might be a pretty simple thing, although I don't know how to solve it. Both files have slightly different byte sizes as well.
Here is my LineProcessor class:
public class LineProcessor implements Processor {

    @Override
    public void process(Exchange exchange) throws Exception {
        String line = exchange.getIn().getBody(String.class);
        System.out.println(line);
    }
}
And my SimpleStringAggregator class:
public class SimpleStringAggregator implements AggregationStrategy {
#Override
public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
if(oldExchange == null) {
return newExchange;
}
String oldBody = oldExchange.getIn().getBody(String.class);
String newBody = newExchange.getIn().getBody(String.class);
String body = oldBody + "\n" + newBody;
oldExchange.getIn().setBody(body);
return oldExchange;
}
}
Maybe I shouldn't even worry about this, but I would just like to have it working perfectly since this is just a POC before I get to the real implementation.
It looks like your input file's last character is a line break. You split the file on \n and add it back in the aggregator, but there is nothing after the last line, so the final line terminator \n is lost. One solution might be appending the \n to each line as it is aggregated, rather than only putting it between lines:
String body = oldBody + newBody + "\n";
(with the oldExchange == null branch also setting newBody + "\n" as the body, so the first line gets its terminator too; a full sketch follows).
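Here is a minimal sketch of the aggregator under that diagnosis: the same class as above, just with the terminator appended to every incoming line (imports shown for Camel 2.x).

import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class SimpleStringAggregator implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // Append the terminator to every incoming line, including the first,
        // so the aggregated body ends with a newline just like the input file.
        String newBody = newExchange.getIn().getBody(String.class) + "\n";
        if (oldExchange == null) {
            newExchange.getIn().setBody(newBody);
            return newExchange;
        }
        String oldBody = oldExchange.getIn().getBody(String.class);
        oldExchange.getIn().setBody(oldBody + newBody);
        return oldExchange;
    }
}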
The answer from 0X00me is probably correct; however, you are probably doing unneeded work.
I assume you are using a version of Camel higher than 2.3, in which case you can drop the aggregation implementation completely, as according to the Camel documentation:
Camel 2.3 and newer:
The Splitter will by default return the original input message.
Change your route to something like this (I can't test it):
@Override
public void configure() throws Exception {
    from("file:" + from)
        .split(body().tokenize("\n")).streaming()
            .bean(new LineProcessor())
        .end()
        // the splitter returns the original input message, which is then written out
        .to("file://" + to);
}
If you need to do custom aggregation then you need to implement the aggregator. I process files this way daily and always end with exactly what I started with.
