Is there a way that a component in a Spring Integration graph can start processing a message ("end of day") only once all other messages (of other types) have finished processing? We have to consider that Spring Integration can start multiple threads. Another restriction is that this component will be used in graphs over which I have no control, so I cannot tell:
how long "other type" message processing takes
whether some messages run into errors
whether some are simply dropped by a filter
whether a message is multiplied by a publish-subscribe channel
whether any TaskExecutors are used (each introduces a new thread and transaction boundary)
there is no end artifact whose presence I could check for
when "end of day" arrives to my component it is possible that "other type" messages are still in processing. even if my component is at the end of the graph it is possible that messages run it error not arrive there. other posibility that a message is mutiplied and i do not know how many times. because of this i do not know how long i should wait with the "end o day" processing.
It is also possible that another tool/framework would make this problem easier or eliminate it completely.
I was thinking about checking the task executor to see if all threads are free, but there might be several task executors, some of which are not involved.
This does not sound related to what you say in the question, where you claim that the process you use is a black box.
If you really can get access to the process somehow, e.g. via a ChannelInterceptor (https://docs.spring.io/spring-integration/docs/current/reference/html/core.html#channel-interceptors), then you can have some global bean like an AtomicBoolean active that is set at the beginning of the flow and reset at the end. Your "end of day" message source would then poll this flag periodically to know when to send. You can simply use an @InboundChannelAdapter method to produce your message, or null while the flag is true.
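A rough sketch of that idea, with assumed channel names and poll rate (this is a sketch, not a built-in mechanism; if the flow is multi-threaded, an in-flight counter would be safer than a boolean):

import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.messaging.Message;
import org.springframework.messaging.MessageChannel;
import org.springframework.messaging.support.ChannelInterceptor;

public class EndOfDayGate {

    private final AtomicBoolean active = new AtomicBoolean(false);

    // Register on the flow's entry channel: marks the flow busy.
    public ChannelInterceptor entryInterceptor() {
        return new ChannelInterceptor() {
            @Override
            public Message<?> preSend(Message<?> message, MessageChannel channel) {
                active.set(true);
                return message;
            }
        };
    }

    // Register on the flow's terminal channel: marks the flow idle again.
    public ChannelInterceptor exitInterceptor() {
        return new ChannelInterceptor() {
            @Override
            public Message<?> preSend(Message<?> message, MessageChannel channel) {
                active.set(false);
                return message;
            }
        };
    }

    // Polled source: emits the trigger only while the flow looks idle.
    @InboundChannelAdapter(value = "endOfDayChannel",
            poller = @Poller(fixedDelay = "5000"))
    public String endOfDay() {
        return active.get() ? null : "END_OF_DAY";
    }
}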
I have a very large Camel pipeline which starts by consuming a message from SQS.
The time required for the whole process varies, from 5 seconds to 30 minutes; it is hard to guess here.
What I want to achieve:
Do not guess the visibility timeout size; just delete the message from SQS as soon as it is consumed.
What I already tried:
Tried the Camel option deleteAfterRead=true -> doesn't help, because as stated in the doc: "Delete message from SQS after it has been read (and processed by the route)." And I have a huge pipeline, so the "processed" requirement fails here.
Tried to increase the visibility timeout, but as I stated, it is just a guessing game, and I want to develop a more reliable solution.
Thank you for your help!
You should certainly NOT delete the message as soon as you consume it if you are not finished processing it, because you will lose the message if your application crashes.
What you should do instead is extend the visibility timeout manually if you realize that your processing time is getting close to the original visibility timeout. I'm not sure how you would implement it in Camel, but we previously integrated it into the default springframework.cloud.aws.messaging: https://github.com/Mercateo/sqs-utils
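For reference, a rough sketch of the extension call itself using the AWS SDK for Java v2 directly (the queue URL, receipt handle, and new timeout are placeholders; you would invoke this from a heartbeat thread while processing runs):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityRequest;

// Give the in-flight message more time before SQS re-delivers it.
void extendVisibility(SqsClient sqs, String queueUrl, String receiptHandle) {
    sqs.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
            .queueUrl(queueUrl)
            .receiptHandle(receiptHandle)
            .visibilityTimeout(120) // seconds from now, replacing the old timeout
            .build());
}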
If, as the OP has stated, you don't mind losing the messages in case of e.g. a server crash, there is a workaround consisting in sending the message to a wireTap and doing your processing there. This allows the main route to end early and a deleteMessage to be sent to the queue.
Another interesting, related situation is that of an exception: if you have an exception handler in place and mark the exception as handled by calling handled(true) on the OnExceptionDefinition, your message will be deleted too.
I'm currently facing the problem that I want to implement a simple master-slave pattern, where the master initializes a job queue by publishing all jobs from the beginning to a topic. The slaves would pull those jobs whenever they have free working capacity, one job at a time. The example code on GitHub pulls multiple messages for a specific amount of time:
subscriber.startAsync().awaitRunning();
Thread.sleep(params.y());
I don't want that; I just want to pull one job message from the queue, let the slave do the work and, after the work is done, call the pulling method to pull another job message, but only one at a time. Since I'm executing the jobs in an ExecutorService, I want to ensure that I don't pull any messages while my thread pool is full. How would I pull one message, feed that job into my ExecutorService, and only pull the next job message once a job has finished and a thread is free?
Pulling a single message at a time would be considered an anti-pattern for Google Cloud Pub/Sub. You can control the number of messages delivered to your worker by specifying FlowControlSettings via the Subscriber Builder. In particular, you could call setMaxOutstandingElementCount on the FlowControlSettings Builder to limit the maximum number of messages that have been delivered to the MessageReceiver you provided. If each of your workers is individually a subscriber and wants to perform a single action at a time, you could even set this number to 1.
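As a hedged sketch, wiring that up with the Pub/Sub client library might look like this (the project/subscription IDs and doWork() are placeholders):

import com.google.api.gax.batching.FlowControlSettings;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "my-subscription");

// Each delivered message occupies one "outstanding element" until acked.
MessageReceiver receiver = (message, consumer) -> {
    doWork(message);  // hypothetical: run the job on this worker
    consumer.ack();   // acking frees the slot, so the next message is delivered
};

Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
        .setFlowControlSettings(FlowControlSettings.newBuilder()
                .setMaxOutstandingElementCount(1L) // one job at a time
                .build())
        .build();
subscriber.startAsync().awaitRunning();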
If you need more exact control over the pull semantics for your subscriber, then you can use the gRPC library's pull method directly. The Service APIs Overview has more information on this approach.
Background
At a high level, I have a Java application in which certain events should trigger a certain action to be taken for the current user. However, the events may be very frequent, and the action is always the same. So when the first event happens, I would like to schedule the action for some point in the near future (e.g. 5 minutes). During that window of time, subsequent events should take no action, because the application sees that there's already an action scheduled. Once the scheduled action executes, we're back to Step 1 and the next event starts the cycle over again.
My thought is to implement this filtering and throttling mechanism by embedding an in-memory ActiveMQ instance within the application itself (I don't care about queue persistence).
I believe that JMS 2.0 supports this concept of delayed delivery, with delayed messages sitting in a "staging queue" until it's time for delivery to the real destination. However, I also believe that ActiveMQ does not yet support the JMS 2.0 spec... so I'm thinking about mimicking the same behavior with time-to-live (TTL) values and Dead Letter Queue (DLQ) handling.
Basically, my message producer code would put messages on a dummy staging queue from which no consumers ever pull anything. Messages would be placed with a 5-minute TTL value, and upon expiration ActiveMQ would dump them into a DLQ. That's the queue from which my message consumers would actually consume the messages.
Question
I don't think I want to actually consume from the "default" DLQ, because I have no idea what other internal things ActiveMQ might dump there that are completely unrelated to my application code. So I think it would be best for my dummy staging queue to have its own custom DLQ. I've only seen one page of ActiveMQ documentation which discusses DLQ config, and it only addresses XML config files for a standalone ActiveMQ installation (not an in-memory broker embedded within an app).
Is it possible to programmatically configure a custom DLQ at runtime for a queue in an embedded ActiveMQ instance?
I'd also be interested to hear alternative suggestions if you think I'm on the wrong track. I'm much more familiar with JMS than AMQP, so I don't know if this is much easier with Qpid or some other Java-embeddable AMQP broker. Whatever Apache Camel actually is (!), I believe it's supposed to excel at this sort of thing, but that learning curve might be gross overkill for this use case.
Although you're worried that Camel might be gross overkill for this use case, I think that ActiveMQ is already gross overkill for the use case you've described.
You're looking to schedule something to happen 5 minutes after an event happens, and for it to consume only the first event and ignore all the ones between the first one and when the 5 minutes are up, right? Why not just schedule your processing method for 5 minutes from now via ScheduledExecutorService or your favorite scheduling mechanism, and save the event in a HashMap<User, Event> member variable. If any more events come in for this user before the processing method fires, you'll just see that you already have an event stored and not store the new one, so you'll ignore all but the first. At the end of your processing method, delete the event for this user from your HashMap, and the next event to come in will be stored and scheduled.
Running ActiveMQ just to get this behavior seems like way more than you need. Or if not, can you explain why?
EDIT:
If you do go down this path, don't use the message TTL to expire your messages; just have the (one and only) consumer read them into memory and use the in-memory solution described above to only process (at most) one batch every 5 minutes. Either have a single queue with message selectors, or use dynamic queues, one per user. You don't need the DLQ to implement the delay, and even if you could get it to do that, it won't give you the functionality of batching everything so you only run once per 5 minutes. This isn't a path you want to go down, even if you figure out how.
A simple solution is to keep track of the pending actions in a concurrent structure and use a ScheduledExecutorService to execute them:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

private static final Object RUNNING = new Object();

private final ConcurrentMap<UserId, Object> pendingActions =
        new ConcurrentHashMap<>();

private ScheduledExecutorService ses = Executors.newScheduledThreadPool(10);

public void takeAction(final UserId id) {
    Object running = pendingActions.putIfAbsent(id, RUNNING); // atomic
    if (running == null) { // no pending action for this user
        ses.schedule(new Runnable() {
            @Override
            public void run() {
                doWork();
                pendingActions.remove(id);
            }
        }, 5, TimeUnit.MINUTES);
    }
}
With Camel this could easily be achieved with an Aggregator component and its completionInterval parameter: every five minutes you check whether the list of aggregated messages is empty, and if it's not, you fire a message to the route responsible for your user action and empty the list. You don't need to maintain the whole list of exchanges, just the state (user action planned or not).
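For illustration, a hedged sketch of such a route in Camel's Java DSL (the endpoint names and aggregation key are assumptions; with completionInterval the aggregator only fires when something was actually aggregated):

import org.apache.camel.builder.AggregationStrategies;
import org.apache.camel.builder.RouteBuilder;

public class UserActionRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("seda:userEvents")
            // collect all events per user arriving within the window
            .aggregate(header("userId"), AggregationStrategies.groupedExchange())
            .completionInterval(5 * 60 * 1000) // check/fire every five minutes
            .to("direct:performUserAction");
    }
}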
Let me try explaining the situation:
There is a messaging system that we are going to incorporate which could either be a Queue or Topic (JMS terms).
1) Producer/Publisher: There is a service A. A produces messages and writes them to a Queue/Topic.
2) Consumer/Subscriber: There is a service B. B asynchronously reads messages from the Queue/Topic. B then calls a web service and passes the message to it. The web service takes a significant amount of time to process the message. (This action need not be processed in real time.)
The message broker is TIBCO.
My intention is: not to miss processing any message from A, and to re-process a message at a later point in time in case the processing failed the first time (perhaps as a batch).
Question:
I was thinking of writing the message to a DB before making the web service call. If the call succeeds, I would mark the message as processed; otherwise, as failed. Later, in a cron job, I would process all the requests that had initially failed.
Is writing to a DB a typical way of doing this?
Since you have a fail callback, you can just requeue your Message and have your Consumer/Subscriber pick it up and try again. If it failed because of some problem in the web service and you want to wait X time before trying again, then you can either schedule the web service to be called at a later date for that specific Message (look into ScheduledExecutorService) or do as you described and use a cron job with some database entries.
If you only want it to retry once per message, then keep an internal counter, either with the Message or within a Map<Message, Integer>, as a counter for each Message.
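A small sketch of that counter idea (requeue() is hypothetical, and in practice you might key the map by JMSMessageID rather than by the Message object itself):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.jms.Message;

private final Map<Message, Integer> attempts = new ConcurrentHashMap<>();

void onWebServiceFailure(Message m) {
    int n = attempts.merge(m, 1, Integer::sum); // count this failure
    if (n <= 1) {
        requeue(m);         // hypothetical: put the message back for one retry
    } else {
        attempts.remove(m); // give up; leave it for the cron-job batch
    }
}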
Crudely put, that is the technique, although there could be out-of-the-box solutions available which you can use. Typical ESB solutions support reliable messaging. Have a look at MuleESB or Apache ActiveMQ as well.
It might be interesting to take advantage of the EMS platform you already have (example 1) instead of building a custom solution (example 2).
But it all depends on the implementation language:
Example 1 - EMS is the "keeper": If I were to solve such a problem with TIBCO BusinessWorks, I would use the "JMS transaction" feature of BW. By encompassing the EMS read and the WS call within the same "group", you ask for them to either both be applied, or not at all. If the call fails for some reason, the message is returned to EMS.
Two problems with this solution: you might not have BW, and the first failed operation would block all the rest of the batch process (though that may be the desired behavior).
FYI, I understand it is possible to use such feature in "pure java", but I never tried it : http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html
Example 2 - A DB is the "keeper": If you go with your "DB" method, your queue/topic consumer continuously inserts data into a DB, and every record represents a task to be executed. This feels an awful lot like the simple "mapping engine" problem every integration middleware aims to make easier. You could solve this with anything from custom Java code and multiple threads (DB inserter, WS job handlers, etc.) to an EAI middleware (like BW) or even a BPM engine (TIBCO has many solutions for that).
Of course, there are also other vendors... EMS is a JMS standard implementation, as you know.
I would recommend using the built-in EMS (& JMS) features, as "guaranteed delivery" is what it's built for ;) - no DB needed at all...
You need to be aware that the first decisions will be:
do you need to deliver in order? (then only 1 JMS Session and Client Acknowledge mode should be used)
how often and at what recurring intervals do you want to retry? (so as not to create an infinite loop for a message that couldn't be processed by that web service)
This is independent of whatever kind of client you use (TIBCO BW or e.g. Java onMessage() in an MDB).
For "in order" delivery: make shure only 1 JMS Session processes the messages and it uses Client acknolwedge mode. After you process the message sucessfully, you need to acknowledge the message with either calling the JMS API "acknowledge()" method or in TIBCO BW by executing the "commit" activity.
In case of an error you don't execute the acknowledge for the method, so the message will be put back in the Queue for redelivery (you can see how many times it was redelivered in the JMS header).
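In plain JMS that pattern is roughly the following (connection setup and process(), the web service call, are assumed):

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;

void consumeOne(Connection connection, Queue queue) throws Exception {
    Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
    MessageConsumer consumer = session.createConsumer(queue);
    Message msg = consumer.receive();
    try {
        process(msg);       // the web service call
        msg.acknowledge();  // only acknowledge on success
    } catch (Exception e) {
        session.recover();  // unacknowledged messages go back for redelivery
    }
}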
EMS's Explicit Client Acknowledge mode also enables you to do the same if order is not important and you need a few client threads to process the messages.
For controlling how often the message gets processed, use:
the max redelivery property of the EMS queue (e.g. you could put the message in the dead letter queue after x redeliveries so as not to hold up other messages)
the redelivery delay, to put a "pause" in between redeliveries. This is useful in case the web service needs to recover after a crash and should not be stormed by the same message again and again at short intervals through redelivery.
Hope that helps
Cheers
Seb
I am currently developing a system that uses a lot of async processing. The transfer of information is done using queues: one process will put info in a queue (and terminate) and another will pick it up and process it. My implementation leaves me facing a number of challenges, and I am interested in what everyone's approach to these problems is (in terms of architecture as well as libraries).
Let me paint the picture. Let's say you have three processes:
Process A -----> Process B
                     |
Process C <----------+
So Process A puts a message in a queue and ends, Process B picks up the message, processes it and puts it in a "return" queue. Process C picks up the message and processes it.
How does one handle Process B not listening to, or not processing, messages off the queue? Is there some JMS-type method that prevents a producer from submitting a message when the consumer is not active, so that the submit from Process A would throw an exception?
Let's say Process C has to get a reply within X minutes, but Process B has stopped (for any reason). Is there some mechanism that enforces a timeout on a queue, i.e. a guaranteed reply within X minutes, which would kick off Process C?
Can all of these matters be handled using a dead letter queue of some sort? Or should I be doing this all manually with timers and checks? I have mentioned JMS, but I am open to anything; in fact I am using Hazelcast for the queues.
Please note this is more of an architectural question, in terms of available Java technologies and methods, and I do feel this is a proper question.
Any suggestions will be greatly appreciated.
Thanks
IMHO, the simplest solution is to use an ExecutorService, or a solution based on an executor service. This supports a queue of work and scheduled tasks (for the timeouts).
It can also work in a single process. (I believe Hazelcast supports a distributed ExecutorService.)
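For example, the timeout from question 2 could be enforced roughly like this with a plain ExecutorService (processB() and the five-minute limit are placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

ExecutorService pool = Executors.newFixedThreadPool(4);

void dispatch(String message) throws Exception {
    Future<String> reply = pool.submit(() -> processB(message)); // B's work
    try {
        String result = reply.get(5, TimeUnit.MINUTES); // C's reply deadline
        // hand the result off to Process C here
    } catch (TimeoutException e) {
        reply.cancel(true); // B took too long; run C's timeout path instead
    }
}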
It seems to me that the type of questions you're asking are "smells" that queues and async processing may not be the best tools for your situation.
1) That defeats the purpose of a queue. It sounds like you need a synchronous request-response process.
2) Process C is not getting a reply, generally speaking; it's getting a message from a queue. If there is a message in the queue and Process C is ready, then it will get it. Process C could decide that the message is stale once it gets it, for example.
I think your first question has already been answered adequately by the other posters.
On your second question, what you are trying to do may be possible depending on the messaging engine used by your application. I know this works with IBM MQ: I have seen it done using the WebSphere MQ Classes for Java, though not JMS. The way it works is that when Process A puts a message on a queue, it specifies the time it will wait for a response message. If Process A fails to receive a response within that time, the system throws an appropriate exception.
I do not think there is a standard way in JMS to handle request/response timeouts the way you want, so you may have to use platform-specific classes like the WebSphere MQ Classes for Java.
Well, kind of the point of queues is to keep things pretty isolated.
If you're not stuck on any particular tech, you could use a database for your queues.
But first, a simple mechanism to ensure two processes are coordinated is to use a socket. If practical, simply have Process B create a socket listener on some well-known port; Process A connects to that socket and monitors it. If Process B ever goes away, Process A can tell because the socket gets shut down, and it can use that as an alert of problems with Process B.
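A toy sketch of that probe (host and port are placeholders):

import java.io.IOException;
import java.net.Socket;

boolean processBAlive() {
    try (Socket probe = new Socket("process-b-host", 9999)) {
        return true;  // connection accepted: B is listening
    } catch (IOException e) {
        return false; // connect failed or reset: treat B as down
    }
}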
For the B -> C problem, have a db table:
create table queue (
    id integer,
    payload varchar(100), -- or whatever you can use to indicate a payload
    status varchar(1),
    updated timestamp
)
Then, Process A puts its entry on the queue with the current time and a status of 'B'. B listens on the queue:
select * from queue where status = 'B' order by updated
When B is done, it updates the queue to set the status to "C".
Meanwhile, "C" is polling the DB with:
select * from queue where status = 'C'
   or (status = 'B' and updated < (now - threshold)) order by updated
(with the threshold being however long you want things to rot on the queue).
Finally, C updates the queue row to 'D' for done, or deletes it, or whatever you like.
The dark side is that there is a bit of a race condition here, where C might try to grab an entry while B is just starting up. You can probably get through that with a strict isolation level and some locking. Something as simple as:
select * from queue where status = 'C'
   or (status = 'B' and updated < (now - threshold)) order by updated
FOR UPDATE
Also use FOR UPDATE for B's select. This way, whoever wins the select race gets an exclusive lock on the row.
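To make the claim cycle concrete, a minimal JDBC sketch of C's side under this scheme (connection handling and the threshold are assumed; 'D' marks a finished row as above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

void claimNext(Connection conn, Duration threshold) throws Exception {
    conn.setAutoCommit(false);
    try (PreparedStatement select = conn.prepareStatement(
            "select id, payload from queue where status = 'C' "
            + "or (status = 'B' and updated < ?) order by updated for update")) {
        select.setTimestamp(1, Timestamp.from(Instant.now().minus(threshold)));
        try (ResultSet rs = select.executeQuery()) {
            if (rs.next()) {
                // ... process rs.getString("payload") here ...
                try (PreparedStatement done = conn.prepareStatement(
                        "update queue set status = 'D' where id = ?")) {
                    done.setLong(1, rs.getLong("id"));
                    done.executeUpdate();
                }
            }
        }
        conn.commit();
    } catch (Exception e) {
        conn.rollback(); // release the row locks so another worker can retry
        throw e;
    }
}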
This will get you pretty far down the road in terms of actual functionality.
You are expecting the semantics of synchronous processing from an async (messaging) setup, which is not possible. I have worked on WebSphere MQ, and normally when the consumer dies, the messages are kept in the queue forever (unless you set an expiry). Once the queue reaches its maximum depth, subsequent messages are moved to the dead letter queue.
I've used a similar approach to create a queuing and processing system for video transcoding jobs. Basically the way it worked was:
Process A posts a "schedule" message to Arbiter Q, which adds the job into its "waiting" queue.
Process B requests the next job from Arbiter Q, which removes the next item in its "waiting" queue (subject to some custom scheduling logic to ensure that a single user couldn't flood transcode requests and prevent other users from being able to transcode videos) and inserts it into its "processing" set before returning the job back to Process B. The job is timestamped when it goes into the "processing" set.
Process B completes the job and posts a "complete" message to Arbiter Q, which removes the job from the "processing" set and then modifies some state so that Process C knows the job completed.
Arbiter Q periodically inspects the jobs in its "processing" set, and times out any that have been running for an unusually long amount of time. Process A is then free to attempt to queue up the same job again, if it wants.
This was implemented using JMX (JMS would have been much more appropriate, but I digress). Process A was simply the servlet thread which responded to a user-initiated transcode request. Arbiter Q was an MBean singleton (persisted/replicated across all the nodes in a cluster of servers) that received "schedule" and "complete" messages. Its internally managed "queues" were simply List instances, and when a job completed it modified a value in the application's database to refer to the URL of the transcoded video file. Process B was the transcoding thread. Its job was simply to request a job, transcode it, and then report back when it finished. Over and over again until the end of time. Process C was another user/servlet thread. It would see that the URL was available, and present the download link to the user.
In such a case, if Process B were to die then the jobs would sit in the "waiting" queue forever. In practice, however, that never happened. If your Process B is not running/doing what it is supposed to do then I think that suggests a problem in your deployment/configuration/implementation of Process B more than it does a problem in your overall approach.