Java Akka Actors - Message throttling and priority

Newbie here..
Using Akka version akka-actor_2.11 (2.4.8) via the Java API.
I'm trying to develop an actor for generating PDF documents. These PDF documents can be large, so obviously I want to throttle the rate at which the actor processes requests. As a side requirement, I also need a "prioritizable" inbox so that PDF generation requests can be processed based on priority by the underlying actors.
In my application startup, I create a global props like this:
Props.create(PdfGeneratorActor.class).withDispatcher("prio-dispatcher").withRouter(new RoundRobinPool(1))
Then I create actor per pdf request like this:
actorSystem.actorOf(propsObjShownAbove, actorType.getCanonicalName() + "_" + UUID.randomUUID());
My application.conf looks like this:
prio-dispatcher {
    mailbox-type = "com.x.y.config.PriorityMailbox"
}
My PriorityMailbox looks like this:
public class PriorityMailbox extends UnboundedPriorityMailbox {

    // needed for reflective instantiation
    public PriorityMailbox(final ActorSystem.Settings settings, final Config config) {
        super(new PriorityGenerator() {
            @Override
            public int gen(final Object message) {
                System.out.println("Here is my message to be prioritized: " + message);
                if (message.equals(PoisonPill.getInstance())) {
                    return 3; // PoisonPill only when nothing else is left
                } else if (message instanceof Prioritizable) {
                    Prioritizable prioritizable = (Prioritizable) message;
                    if (prioritizable.getReportPriorityType() == ReportPriorityType.HIGH) {
                        return 0;
                    } else if (prioritizable.getReportPriorityType() == ReportPriorityType.LOW) {
                        return 2;
                    } else {
                        return 1;
                    }
                } else {
                    // Default priority for any other messages.
                    return 1;
                }
            }
        });
    }
}
Is this the right configuration to achieve what I want? I'm not sure if I'm missing something. Firstly, I don't see any of the System.out.println output from my mailbox implementation, even though I would expect it to be hit when messages are compared for priority.
Secondly, I would expect the PdfGeneratorActor to execute sequentially (one by one), because it is essentially a single instance across the system. But I don't see that happening: I see multiple actors processing requests concurrently.
I think I'm missing something fundamental here.

I think what happens in your case is that each actor you create has its own router, but otherwise they are independent - so they execute in parallel.
If you want your requests to be executed sequentially, the idea would be to have one router with one "worker"/routee that executes each request one by one (of course, you could configure the number of requests you want to execute in parallel).
So you would have something like this:
in the conf:
mypriority-mailbox {
    mailbox-type = "com.x.y.config.PriorityMailbox"
    mailbox-capacity = 500            # check whether you need this and which value makes sense for you
    mailbox-push-timeout-time = 100s  # check whether this makes sense for you
}

akka.actor.deployment {
    /pdfRouter {
        router = round-robin-pool
        nr-of-instances = 1
        mailbox = mypriority-mailbox
    }
}
in the code:
system.actorOf(
    FromConfig.getInstance().props(PdfGeneratorActor.class),
    "pdfRouter");
Check also the documentation for mailboxes and routers.
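For completeness, here is a minimal usage sketch (the GeneratePdfRequest message class and its fields are hypothetical placeholders; only the router setup comes from the config above). All requests go to the single "pdfRouter" actor, so its one routee processes them one at a time, ordered by the priority mailbox:
ActorRef pdfRouter = system.actorOf(
    FromConfig.getInstance().props(PdfGeneratorActor.class),
    "pdfRouter");

// every PDF request goes to the same router; the priority mailbox decides the processing order
pdfRouter.tell(new GeneratePdfRequest(ReportPriorityType.HIGH, reportData), ActorRef.noSender());
pdfRouter.tell(new GeneratePdfRequest(ReportPriorityType.LOW, otherReportData), ActorRef.noSender());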


RabbitMQ events received in quick succession are written twice to MongoDB, causing microservice test to fail

Service testing a reactive microservice yields wrong results because RabbitMQ events are persisted twice in MongoDB. If two events arrive in quick succession, the second event should not be persisted as a new document: the first event should already be there, a check should find it, and the second event should then be processed differently (by incrementing a counter in the existing document).
This presumably happens because the events are published in very quick succession (and maybe something with Mongo's concurrency model or Java threading?).
At the RabbitMQ endpoint, it works when using this blocking code (.await().indefinitely()):
historicalDataHandler
    .handleDischargedVerifiedEvent(dischargeVerifiedEvent)
    .await()
    .indefinitely();
But this code does not (reactive, .subscribe().with()):
historicalDataHandler
    .handleDischargedVerifiedEvent(dischargeVerifiedEvent)
    .subscribe()
    .with(
        dbResult -> {
            log.infov(
                EVENT_HANDLED_LOG_STRING + ", {dischargedVerifiedEvent}, {dbResult}",
                dischargeVerifiedEvent,
                dbResult);
        },
        throwable ->
            log.errorv(throwable, "Handling discharge verified event fails {event}", dischargeVerifiedEvent));
I know the reason: the first version blocks the thread until the Uni is resolved, while the second just starts the operation and frees the thread for other tasks.
I prefer the latter solution because we want our code to be reactive / non-blocking.
Here is the code from the Panache MongoDB repository:
@Startup
@ApplicationScoped
public class FlightAllocationBagCountRepository
    implements ReactivePanacheMongoRepositoryBase<FlightAllocationBagCount, String> {

  @Inject Logger log;

  public Uni<UpdateResult> updateFlightAllocationBagCount(
      FlightAllocationBagCount flightAllocationBagCount) {
    var allocation = flightAllocationBagCount.getAllocations().get(0);
    return this.mongoCollection()
        .updateOne(
            and(
                eq(COMPOSITE_ID, flightAllocationBagCount.getCompositeId()),
                eq(
                    ALLOCATIONS + "." + AllocationPropertyConstants.COMPOSITE_ID,
                    allocation.getCompositeId())),
            inc(ALLOCATIONS + ".$." + BAG_COUNT, 1))
        .onItem()
        .transformToUni(
            incResult -> {
              // If the increment was done, the event is now stored
              if (incResult.getModifiedCount() > 0) {
                log.infov(
                    "Incremented bag count, {flight}, {allocation}",
                    flightAllocationBagCount,
                    allocation);
                return Uni.createFrom().item(incResult);
              }
              // The flight and allocation combination did not exist - create them
              var upsertResult =
                  this.mongoCollection()
                      .updateOne(
                          and(eq(COMPOSITE_ID, flightAllocationBagCount.getCompositeId())),
                          combine(generateBsonUpdates(flightAllocationBagCount)),
                          new UpdateOptions().upsert(true));
              log.infov(
                  "Flight document created / updated with allocation, {flight}, {allocation}",
                  flightAllocationBagCount,
                  allocation);
              return upsertResult;
            });
  }
}
Here is the test that gives the wrong result. What we do in the test is:
Publish some events on the RabbitMQ queue
Wait to see in the logs that those events have been processed
Send a request to the gRPC interface and assert
@Test
void getAverageBagCountBySubAllocation() throws Exception {
  // Given two items to same sub allocation on same date
  WaitingConsumer consumer = getValidItemEventWaitingConsumer();
  publish(getDefaultFlightEvent());
  publish(getDefaultItemEvent("BARCODE1"));
  publish(getDefaultItemEvent("BARCODE2"));

  // and that the events are successfully handled
  consumer.waitUntil(
      frame -> frame.getUtf8String().contains(EVENT_HANDLED_LOG_STRING), 30, SECONDS);

  // And given a request of a specific sub allocation
  var allocation =
      HistoricalData.Allocation.newBuilder()
          .setBagClass(DEFAULT_BAG_CLASS)
          .setDestination(DEFAULT_BAG_DESTINATION)
          .setBagExceptionType(DEFAULT_BAG_EXCEPTION_TYPE.toGrpcEnum())
          .setTransfer(DEFAULT_TRANSFER)
          .build();
  var request =
      HistoricalData.GetAverageBagCountRequest.newBuilder()
          .setAirline(DEFAULT_AIRLINE)
          .setFlightNumber(DEFAULT_FLIGHT_NUMBER)
          .setFlightSuffix(DEFAULT_FLIGHT_SUFFIX)
          .setFlightDate(DEFAULT_FLIGHT_DATE_STAMP)
          .setMinNumberOfDays(DEFAULT_MIN_NUMBER_OF_DAYS)
          .setMaxNumberOfDays(DEFAULT_MAX_NUMBER_OF_DAYS)
          .setAllocation(allocation)
          .build();

  // When we request to get average bag count by sub allocation
  var response = getStub().getAverageBagCount(request);

  // Then the average bag count is 2, based on the average of 1 day
  assertThat(response.getAverageBagCount()).isEqualTo(2);
  assertThat(response.getNumberOfDaysUsedForAverageCalculation()).isEqualTo(1);
}
Bonus question: how can we avoid consumer.waitUntil()? Using it means the test is no longer a black-box test.

IBKR TWS API - How to tell when reqOptionsMktData is complete for all strikes?

I am just getting started with the IBKR API on Java. I am following the API sample code, specifically the options chain example, to figure out how to get options chains for specific stocks.
The example works well for this, but I have one question - how do I know when ALL data has been loaded? There does not seem to be a way to tell. The sample code is able to tell when each individual row has been loaded, but there doesn't seem to be a way to tell when ALL strikes have been successfully loaded.
I thought that using tickSnapshotEnd() would be beneficial, but it doesn't seem to work as I would expect. I would expect it to be called once for every request that completes. For example, if I do a query for a stock like SOFI on the 2022/03/18 expiry, I see that there are 35 strikes, but tickSnapshotEnd() is called 40+ times, with some strikes repeated more than once.
Note that I am requesting snapshot data, not live/streaming data.
reqOptionsMktData is obviously a method in the sample code you are using. I'm not sure what particular code you're using, so this is a general response.
Firstly, you are correct: there is no way to tell via the API, this must be done by the client. Of course the API will provide the requestID that was used when the request was made. The client needs to remember what each requestID was for and decide how to process that information when it is received in the callbacks.
This can be done via a dictionary or hashtable: upon receiving data in the callback, check whether the chain is complete.
Message delivery from the API often has unexpected results; receiving extra messages is common and is something that needs to be taken into account by the client. Consider the API stateless, and track everything in the client.
It seems you are referring to Regulatory Snapshots; I would encourage you to look at the cost, which could quite quickly add up to the price of streaming live data. Add to that the 1/sec limit, and a chain will take a long time to load. I wouldn't even recommend using snapshots with live data; cancelling the request yourself is trivial and much faster.
Something like (this is obviously incomplete C#, just a starting point)
class OptionData
{
    public int ReqId { get; }
    public double Strike { get; }
    public string Expiry { get; }
    public double? Bid { get; set; } = null;
    public double? Ask { get; set; } = null;

    public bool IsComplete()
    {
        return Bid != null && Ask != null;
    }

    public OptionData(int reqId, double strike, ....
    { ...
    }
    ...

class MyData
{
    // Create somewhere to store our data, indexed by reqId.
    Dictionary<int, OptionData> optChain = new();

    public MyData()
    {
        // We would want to call reqSecDefOptParams to get a list of strikes etc.
        // Choose which part of the chain you want, likely you'll want to
        // get the current price of the underlying to decide.
        int reqId = 1;
        ...
        optChain.Add(++reqId, new OptionData(reqId, strike, expiry));
        ...
        // Request data for each contract
        // Note the 50 msg/sec limit https://interactivebrokers.github.io/tws-api/introduction.html#fifty_messages
        // Only 1/sec for Reg snapshot
        foreach (OptionData opt in optChain.Values)
        {
            Contract con = new()
            {
                Symbol = "SPY",
                Currency = "USD",
                Exchange = "SMART",
                Right = "C",
                SecType = "OPT",
                Strike = opt.Strike,
                Expiry = opt.Expiry
            };
            ibClient.ClientSocket.reqMktData(opt.ReqId, con, "", false, true, new List<TagValue>());
        }
    }
    ...

    private void Recv_TickPrice(TickPriceMessage msg)
    {
        if (optChain.ContainsKey(msg.RequestId))
        {
            if (msg.Field == 2) optChain[msg.RequestId].Ask = msg.Price;
            if (msg.Field == 1) optChain[msg.RequestId].Bid = msg.Price;
            // You may want other tick types as well
            // see https://interactivebrokers.github.io/tws-api/tick_types.html
            if (optChain[msg.RequestId].IsComplete())
            {
                // This won't apply for reg snapshot.
                ibClient.ClientSocket.cancelMktData(msg.RequestId);
                // You have the data, and have cancelled the request.
                // Maybe request more data or update display etc...

                // Check if the whole chain is complete
                bool complete = true;
                foreach (OptionData opt in optChain.Values)
                    if (!opt.IsComplete()) complete = false;
                if (complete)
                    // do whatever
            }
        }
    }
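Since the question targets the Java API, the same bookkeeping could look roughly like this in Java (a sketch only: the exact tickPrice callback signature depends on your TWS API version, OptionQuote is a hypothetical holder class, and field 1 is bid and field 2 is ask, as in the C# above):
// one entry per market data request, keyed by the request id
class OptionQuote {
    final double strike;
    final String expiry;
    Double bid;
    Double ask;

    OptionQuote(double strike, String expiry) {
        this.strike = strike;
        this.expiry = expiry;
    }

    boolean isComplete() {
        return bid != null && ask != null;
    }
}

Map<Integer, OptionQuote> optChain = new ConcurrentHashMap<>();

// inside the EWrapper implementation
@Override
public void tickPrice(int tickerId, int field, double price, TickAttrib attrib) {
    OptionQuote quote = optChain.get(tickerId);
    if (quote == null) {
        return; // not one of our option requests
    }
    if (field == 1) quote.bid = price; // BID
    if (field == 2) quote.ask = price; // ASK
    if (quote.isComplete()
            && optChain.values().stream().allMatch(OptionQuote::isComplete)) {
        // every strike now has both bid and ask: the whole chain is loaded
    }
}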

Mutiny - How to group items to send request by blocks

I'm using the Mutiny extension (for Quarkus) and I don't know how to handle this problem.
I want to send many requests asynchronously, so I've read about the Mutiny extension. But the server closes the connection because it receives thousands of them.
So I need to:
Send the requests in blocks
After all requests are sent, do things.
I've been using a Uni to combine all the responses, like this:
Uni<Map<Integer, String>> uniAll = Uni.combine()
    .all()
    .unis(list)
    .combinedWith(...);
And then:
uniAll.subscribe()
    .with(...);
This code sends all the requests in parallel, so the server closes the connection.
I'm trying to use groups of Multi objects, but I don't know how to use them (I can't find any example in the Mutiny docs).
This is the way I'm doing it now:
// Launch 1000 requests
for (int i = 0; i < 1000; i++) {
    multi = client.getAbs("https://api.*********.io/jokes/random")
        .as(BodyCodec.jsonObject())
        .send()
        .onItem().transformToMulti(
            array -> Multi.createFrom()
                .item(array.body().getString("value")))
        .group()
        .intoLists()
        .of(100)
        .subscribe()
        .with(a -> {
            System.out.println("Value: " + a);
        });
}
I thought the subscription wouldn't execute until there are groups of 100 items, but I guess this is not the way, because it doesn't work.
Does anybody know how to launch 1000 async requests in blocks of 100?
Thanks in advance.
UPDATED 2021-04-19
I've tried with this approach:
List<Uni<String>> listOfUnis = new ArrayList<>();
for (int i = 0; i < 1000; i++) {
    listOfUnis.add(client
        .getAbs("https://api.*******.io/jokes/random")
        .as(BodyCodec.jsonObject())
        .send()
        .onItem()
        .transform(item -> item
            .body()
            .getString("value")));
}

Multi<Uni<String>> multiFormUnis = Multi.createFrom()
    .iterable(listOfUnis);

List<String> listOfResponses = new ArrayList<>();
List<String> listOfValues = multiFormUnis.group()
    .intoLists()
    .of(100)
    .onItem()
    .transformToMultiAndConcatenate(listOfOneHundred -> {
        System.out.println("Size: " + listOfOneHundred.size());
        for (int index = 0; index < listOfOneHundred.size(); index++) {
            listOfResponses.add(listOfOneHundred.get(index)
                .await()
                .indefinitely());
        }
        return Multi.createFrom()
            .iterable(listOfResponses);
    })
    .collectItems()
    .asList()
    .await()
    .indefinitely();

for (String value : listOfValues) {
    System.out.println(value);
}
When I put this line:
listOfResponses.add(listOfOneHundred.get(index)
    .await()
    .indefinitely());
the responses are printed one after another, and when the first group of 100 items finishes, it prints the next group. The problem? The requests are sequential, and it takes far too much time.
I think I am close to the solution, but I need to know how to send the parallel requests only in groups of 100, because if I just use:
subscribe().with()
all the requests are sent in parallel (and not in groups of 100).
I think you create the Multi wrong; it would be much easier to use this:
Multi<String> multiOfJokes = Multi.createFrom().emitter(multiEmitter -> {
    for (int i = 0; i < 1000; i++) {
        multiEmitter.emit(i);
    }
    multiEmitter.complete();
}).onItem().transformToUniAndMerge(index -> {
    return Uni.createFrom().item("String" + index);
});
With this approach it should make the calls parallel.
Now the question is how to turn it into a list.
The grouping works fine.
I run it with this code:
Random random = new Random();

Multi<Integer> multiOfInteger = Multi.createFrom().emitter(multiEmitter -> {
    for (Integer i = 0; i < 1000; i++) {
        multiEmitter.emit(i);
    }
    multiEmitter.complete();
});

Multi<String> multiOfJokes = multiOfInteger.onItem().transformToUniAndMerge(index -> {
    if (index % 10 == 0) {
        Duration delay = Duration.ofMillis(random.nextInt(100) + 1);
        return Uni.createFrom().item("String " + index + " delayed").onItem()
            .delayIt().by(delay);
    }
    return Uni.createFrom().item("String" + index);
}).onCompletion().invoke(() -> System.out.println("Completed"));

Multi<List<String>> multiListJokes = multiOfJokes
    .group().intoLists().of(100)
    .onCompletion().invoke(() -> System.out.println("Completed"))
    .onItem().invoke(strings -> System.out.println(strings));

multiListJokes.collect().asList().await().indefinitely();
You will get a list of your strings.
I don't know how you intend to send the list to the backend, but you can do it either with:
call (executed asynchronously)
writing your own subscriber (implements Subscriber; the methods are straightforward)
whichever you need for your bulk request.
I hope you understand it better afterwards.
PS: link to guide where I learned all of it:
https://smallrye.io/smallrye-mutiny/guides
So in short you want to batch parallel calls to the server, without hitting it with everything at once.
Could this work for you? It uses merge. In my example, it has a parallelism of 2.
Multi.createFrom().range(1, 10)
    .onItem()
    .transformToUni(integer -> {
        return <<my long operation Uni>>
    })
    .merge(2) // this is the concurrency
    .collect()
    .asList();
I'm not sure if merge was added later this year, but this seems to do what you want. In my example, the "long operation producing Uni" is actually a call to the Microprofile Rest Client which produces a Uni, and returns a string. After the merge you can put another onItem to perform something with the response (it's a plain Multi after the merge), instead of collecting everything as list.
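Applied to the question's 1000 HTTP calls, that could look roughly like this (a sketch assuming the same Vert.x Mutiny web client as in the question; merge(100) keeps at most 100 requests in flight at once):
List<String> values = Multi.createFrom().range(0, 1000)
    .onItem()
    .transformToUni(i -> client
        .getAbs("https://api.*********.io/jokes/random")
        .as(BodyCodec.jsonObject())
        .send()
        .onItem()
        .transform(response -> response.body().getString("value")))
    .merge(100) // at most 100 requests running concurrently
    .collect()
    .asList()
    .await()
    .indefinitely(); // or subscribe() if you want to stay fully non-blocking
Compared with grouping into lists of 100 and awaiting each Uni, this keeps the pipeline non-blocking and simply caps the concurrency.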

How to use R model in Java to predict with multiple models?

I have this constructor:
public Revaluator(File model, PrintStream ps) {
    modelFile = model;
    rsession = Rsession.newInstanceTry(ps, null);
    rsession.eval("library(e1071)");
    rsession.load(modelFile);
}
I want to load a model and predict with it.
The problem is that Rsession.newInstanceTry(ps, null) always returns the same session, so if I load another model, like:
Revaluator re1 = new Revaluator(new File("model1.RData"), System.out);
Revaluator re2 = new Revaluator(new File("model2.RData"), System.out);
then both re1 and re2 use the same model: since the variable name is model, only the last one loaded survives.
The evaluate function:
public REXP evaluate(Object[] arr) {
    String expression = String.format("predict(model, c(%s))", J2Rarray(arr));
    REXP ans = rsession.eval(expression);
    return ans;
}
// J2Rarray just creates a string from the array, like "1,2,true,'hello',false"
I need to load about 250 predictors. Is there a way to get every instance of Rsession as a new, separate R session?
You haven't pasted all of your code in your question, so before trying the (complicated) way below, please rule out the simple causes and make sure that your fields modelFile and rsession are not declared static :-)
If they are not:
It seems that the way R sessions are created is OS-dependent.
On Unix it relies on the multi-session ability of R itself; on Windows it starts with port 6311 and checks whether it is still free. If it's not, the port is incremented and checked again, and so on.
Maybe something goes wrong with checking for free ports (which OS are you working on?).
You could try to configure the ports manually and explicitly start different local R servers like this:
Logger simpleLogger = new Logger() {
    // p is the PrintStream used for log output (e.g. the one passed into Revaluator)
    public void println(String string, Level level) {
        if (level == Level.WARNING) {
            p.print("! ");
        } else if (level == Level.ERROR) {
            p.print("!! ");
        }
        p.println(string);
    }

    public void close() {
        p.close();
    }
};

// give every instance its own port so that it gets its own local R server
RserverConf serverConf = new RserverConf(null, staticPortCounter++, null, null, null);
Rdaemon server = new Rdaemon(serverConf, this);
server.start(null);
rsession = Rsession.newInstanceTry(serverConf);
If that does not work, please show more code of your Revaluator class and give details about which OS you are running on. Also, there should be several log outputs (at least if the log level is configured accordingly). Please paste the logged messages as well.
Maybe it could also help to get the source code of rsession from Google Code and use a debugger to set a breakpoint in Rsession.begin(). Maybe this can help figuring out what goes wrong.
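Put together with the constructor from the question, the manual-port approach could look roughly like this (a sketch only, assuming the RserverConf/Rdaemon calls shown above; the port parameter is something you would pass in per instance, and simpleLogger is the logger sketched earlier):
public Revaluator(File model, PrintStream ps, int port) {
    // start a dedicated local R server for this instance on its own port
    RserverConf serverConf = new RserverConf(null, port, null, null, null);
    Rdaemon server = new Rdaemon(serverConf, simpleLogger);
    server.start(null);

    // connect this instance to its own session, so each loaded model lives in its own R process
    rsession = Rsession.newInstanceTry(serverConf);
    rsession.eval("library(e1071)");
    rsession.load(model);
}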

Using camel to aggregate messages of same header

I have multiple clients that send files to a server. For one set of data there are two files that contain information about that data, each with the same name. When a file is received, the server sends a message out to my queue containing the file path, file name, ID of the client, and the "type" of file it is (all have same file extension but there are two "types," call them A and B).
The two files for one set of data have the same file name. As soon as the server has received both of the files I need to start a program that combines the two. Currently I have something that looks like this:
from("jms:queue.name").aggregate(header("CamelFileName")).completionSize(2).to("exec://FILEPATH?args=");
Where I am stuck is the header("CamelFileName"), and more specifically how the aggregator works.
With completionSize set to 2, does it just suck up all the messages and store them in some data structure until a second message that matches the first comes through? Also, does header() expect a specific value? I have multiple clients, so I was thinking of having the client ID and the file name in the header, but then again I don't know if I have to give a specific value. I also don't know whether I can use a regex.
Any ideas or tips would be super helpful.
Thanks
EDIT:
Here is some code I have now. Based on my description of the problem here and in the comments on the selected answer, does it seem accurate (besides closing brackets that I didn't copy over)?
public static void main(String args[]) throws Exception {
    CamelContext c = new DefaultCamelContext();
    c.addComponent("activemq", activeMQComponent("vm://localhost?broker.persistent=false"));
    //ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
    //c.addComponent("jms", JmsComponent.jmsComponentAutoAcknowledge(connectionFactory));

    c.addRoutes(new RouteBuilder() {
        public void configure() {
            from("activemq:queue:analytics.camelqueue")
                .aggregate(new MyAggregationStrategy())
                .header("subject")
                .completionSize(2)
                .to("activemq:queue:analytics.success");
        }
    });

    c.start();
    while (true) {
        System.out.println("Waiting on messages to come through for camel");
        Thread.sleep(2 * 1000);
    }
    //c.stop();
}

private static class MyAggregationStrategy implements AggregationStrategy {
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        if (oldExchange == null)
            return newExchange;

        // and here is where combo stuff goes
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        boolean oldSet = oldBody.contains("set");
        boolean newSet = newBody.contains("set");
        boolean oldFlow = oldBody.contains("flow");
        boolean newFlow = newBody.contains("flow");
        if ((oldSet && newFlow) || (oldFlow && newSet)) {
            // they match so return new exchange with info so extractor can be started with exec
            String combined = oldBody + "\n" + newBody + "\n";
            newExchange.getIn().setBody(combined);
            return newExchange;
        } else {
            // no match so do something....
            return null;
        }
    }
}
you must supply an AggregationStrategy to define how you want to combine Exchanges...
if you are only interested in the fileName and receiving exactly 2 Exchanges, then you can just use the UseLatestAggregationStrategy to just pass the newest Exchange through once 2 have been 'aggregated'...
that said, it sounds like you need to retain both Exchanges (one for each clientId) so you can pass that info on to the 'exec' step...if so, you can just combine the Exchanges into a GroupedExchange holder using the built-in aggregation strategy enabled via the groupExchanges option...or specify a custom AggregationStrategy to combine them however you'd like. just need to keep in mind that your 'exec' step needs to handle whatever aggregated structure you decide to use...
see these unit tests for examples:
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/processor/aggregator/AggregatorTest.java
https://svn.apache.org/repos/asf/camel/trunk/camel-core/src/test/java/org/apache/camel/processor/aggregator/AggregateGroupedExchangeTest.java
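For reference, a minimal sketch of the route described above (using the custom strategy from the question's edit; you could correlate on a combined client-id + file-name header instead if file names can collide between clients):
from("jms:queue.name")
    .aggregate(header("CamelFileName"), new MyAggregationStrategy())
    .completionSize(2)
    .to("exec://FILEPATH?args=");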
