Solving for Scheduled Processing with Apache Flink - java

We have about 500 million drivers in 12 timezones. We send different communications such as their earnings report, new promotions, policy change updates etc., periodically. We want these communications to be delivered to them at a time that works best for them. For example - 9AM local time. We'd like to generate these communications early and publish them to Flink and schedule them for delivery based at appropriate times.
Messages will be in the following format -
{message_id, message, scheduled_time_in_utc}
For simplicity, we can assume that scheduled_time_in_utc is always 1 hour granularity. How do we go about achieving this functionality with Flink?
Any pointers are appreciated! Thanks!

I'm assuming you can push these communications in Kafka, which Flink would use as its source, and that you've got some async I/O method of sending out the communications (e.g. an SMS service). If so, then you could have a workflow that:
Reads the messages from Kakfa
Key by message_id
.process(new ReleaseTimedMessages())
Use Flink's AsyncIO support to send the message
Where the ReleaseTimedMessages KeyedProcessFunction would save the msg in state, and set up a process time timer to fire at the target delivery time. When it fires, emit the message and clear the timer.

Related

JMS with Oracle AQ. How to change delay of JMS Message when it's already in the queue?

If is it possible change delay of JMS Message when it's already in the queue?
When I send message via JMS, I set jmsMessage.setIntProperty("JMS_OracleDelay", 120). Then this jmsMessage appears in sql table operation_queue_table.
Can I update delay during these 120 seconds while the message is still in the queue?
I do this through an sql query
update operation_queue_table t set t.delay = ? where t.corrid = ?
But the message is still in the queue until 120 seconds have elapsed and only then disappears
As far as I know message mutability is not supported by any message provided, and certainly it is not supported by JMS api.
What you can do is to remove that specific message by consuming it with specific JMS selector and sending a new message with a same content and different parameters.
However, this will be horribly slow and it brings to question why are you using messaging at all. The general idea of asynchronous messaging is to fire and forget. If you do need updates don't use messaging. Given that this is Oracle AQ, maybe you could do it over JDBC? I never used that part of Oracle.

How to dinamically apply scheduled kafka consumer based on topic?

I'm currently struggling with a consumer on kafka that can somehow schedule to a future time for execution.
To summarize: I have a big data storage (.csv file) and the records contains 2 columns: timestamp and value. I'm trying to process this values based on their timestamp. First record it has to be consumed instantly by kafka, next one should be processed in future with a delay of 'current record timestamp - previous record timestamp' (it is not a very big difference, just a few seconds = result will be in millis) and so on.
So basically I can't find a solution to implement a consumer on kafka that takes each records based on timestamp and use that exact delay. I have to just simulate these values and they have to be insert in DB accordly to that delay to work properly.
I've tried to work around threads, with executors, but for big data it's not a properly way.
I tried to create dynamic topics on producers based on timestamp and then subscribe to them and then somehow process with a queue. It didn't work.
I expect the kafka to consume each record with the delay based on timestamp.
I expect the kafka to consume each record with the delay based on
timestamp
If you have specific delay between messages then Kafka is not a proper solution.
When you send messages to the Kafka, in most scenarios you use network. Which could add its own, unpredictable, delay. Kafka is running as a different process and nobody could guarantee at which moment this process will be ready to receive next message. OS could suspend process, GC could start etc. This adds another delay which nobody could predict.
Also, Kafka is not designed to operate with time when message was received. It more focused on order of messages, low latency and high throughput but not on timing.

Queueing a message in JMS, for delayed processing

I have a piece of middleware that sits between two JMS queues. From one it reads, processes some data into the database, and writes to the other.
Here is a small diagram to depict the design:
With that in mind, I have some interesting logic that I would like to integrate into the service.
Scenario 1: Say the middleware service receives a message from Queue 1, and hits the database to store portions of that message. If all goes well, it constructs a new message with some data, and writes it to Queue 2.
Scenario 2: Say that the database complains about something, when the service attempts to perform some logic after getting a message from Queue 1.In this case, instead of writing a message to Queue 2, I would re-try to perform the database functionality in incremental timeouts. i.e Try again in 5 sec., then 30 sec, then 1 minute if still down. The catch of course, is to be able to read other messages independently of this re-try. i.e Re-try to process this one request, while listening for other requests.
With that in mind, what is both the correct and most modern way to construct a future proof solution?
After reading some posts on the net, it seems that I have several options.
One, I could spin off a new thread once a new message is received, so that I can both perform the "re-try" functionality and listen to new requests.
Two, I could possibly send the message back to the Queue, with a delay. i.e If the process failed to execute in the db, write the message to the JMS queue by adding some amount of delay to it.
I am more fond of the first solution, however, I wanted to get the opinion of the community if there is a newer/better way to solve for this functionality in java 7. Is there something built into JMS to support this sort of "send message back for reprocessing at a specific time"?
JMS 2.0 specification describes the concept of delayed delivery of messages. See "What's new" section of https://java.net/projects/jms-spec/pages/JMS20FinalReleaseMany JMS providers have implemented the delayed delivery feature.
But I wonder how the delayed delivery will help your scenario. Since the database writes have issues, subsequent messages processing and attempt to write to database might end up in same situation. I guess it might be better to sort out issues with database updates and then pickup messages from queue.

Handling Failed calls on the Consumer end (in a Producer/Consumer Model)

Let me try explaining the situation:
There is a messaging system that we are going to incorporate which could either be a Queue or Topic (JMS terms).
1 ) Producer/Publisher : There is a service A. A produces messages and writes to a Queue/Topic
2 ) Consumer/Subscriber : There is a service B. B asynchronously reads messages from Queue/Topic. B then calls a web service and passes the message to it. The webservice takes significant amount of time to process the message. (This action need not be processed real-time.)
The Message Broker is Tibco
My intention is : Not to miss out processing any message from A. Re-process it at a later point in time in case the processing failed for the first time (perhaps as a batch).
Question:
I was thinking of writing the message to a DB before making a webservice call. If the call succeeds, I would mark the message processed. Otherwise failed. Later, in a cron job, I would process all the requests that had initially failed.
Is writing to a DB a typical way of doing this?
Since you have a fail callback, you can just requeue your Message and have your Consumer/Subscriber pick it up and try again. If it failed because of some problem in the web service and you want to wait X time before trying again then you can do either schedule for the web service to be called at a later date for that specific Message (look into ScheduledExecutorService) or do as you described and use a cron job with some database entries.
If you only want it to try again once per message, then keep an internal counter either with the Message or within a Map<Message, Integer> as a counter for each Message.
Crudely put that is the technique, although there could be out-of-the-box solutions available which you can use. Typical ESB solutions support reliable messaging. Have a look at MuleESB or Apache ActiveMQ as well.
It might be interesting to take advantage of the EMS platform your already have (example 1) instead of building a custom solution (example 2).
But it all depends on the implementation language:
Example 1 - EMS is the "keeper" : If I were to solve such problem with TIBCO BusinessWorks, I would use the "JMS transaction" feature of BW. By encompassing the EMS read and the WS call within the same "group", you ask for them to be both applied, or not at all. If the call failed for some reason, the message would be returned to EMS.
Two problems with this solution : You might not have BW, and the first failed operation would block all the rest of the batch process (that may be the desired behavior).
FYI, I understand it is possible to use such feature in "pure java", but I never tried it : http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html
Example 2 - A DB is the "keeper" : If you go with your "DB" method, your queue/topic customer continuously drops insert data in a DB, and all records represent a task to be executed. This feels an awful lot like the simple "mapping engine" problem every integration middleware aims to make easier. You could solve this with anything from a custom java code and multiples threads (DB inserter, WS job handlers, etc.) to an EAI middleware (like BW) or even a BPM engine (TIBCO has many solutions for that)
Of course, there are also other vendors... EMS is a JMS standard implementation, as you know.
I would recommend using the built in EMS (& JMS) features,as "guaranteed delivery" is what it's built for ;) - no db needed at all...
You need to be aware that the first decision will be:
do you need to deliver in order? (then only 1 JMS Session and Client Ack mode should be used)
how often and in what reoccuring times do you want to retry? (To not make an infinite loop of a message that couldn't be processed by that web service).
This is independent whatever kind of client you use (TIBCO BW or e.g. Java onMessage() in a MDB).
For "in order" delivery: make shure only 1 JMS Session processes the messages and it uses Client acknolwedge mode. After you process the message sucessfully, you need to acknowledge the message with either calling the JMS API "acknowledge()" method or in TIBCO BW by executing the "commit" activity.
In case of an error you don't execute the acknowledge for the method, so the message will be put back in the Queue for redelivery (you can see how many times it was redelivered in the JMS header).
EMS's Explicit Client Acknolwedge mode also enables you to do the same if order is not important and you need a few client threads to process the message.
For controlling how often the message get's processed use:
max redelivery properties of the EMS queue (e.g. you could put the message in the dead
letter queue afer x redelivery to not hold up other messages)
redelivery delay to put a "pause" in between redelivery. This is useful in case the
Web Service needs to recover after a crash and not gets stormed by the same message again and again in high intervall through redelivery.
Hope that helps
Cheers
Seb

Generic QoS Message batching and compression in Java

We have a custom messaging system written in Java, and I want to implement a basic batching/compression feature that basically under heavy load it will aggregate a bunch of push responses into a single push response.
Essentially:
if we detect 3 messages were sent in the past second then start batching responses and schedule a timer to fire in 5 seconds
The timer will aggregate all the message responses received in the next 5 seconds into a single message
I'm sure this has been implemented before I'm just looking for the best example of it in Java. I'm not looking for a full blown messaging layer, just the basic detect messages per second and schedule some task (obviously I can easily write this myself I just want to compare it with any existing algorithms to make sure I'm not missing any edge cases or that I've simplified the problem as much as possible).
Are there any good open source examples of building a basic QoS batching/throttling/compression implementations?
we are using a very similar mechanism for high load.
it will work as you described it
* Aggregate messages over a given interval
* Send a List instead of a single message after that.
* Start aggregating again.
You should watch out for the following pitfalls:
* If you are using a transacted messaging system like JMS you can get into trouble because your implementation will not be able to send inside the JMS transaction so it will keep aggregating. Depending on the size of your data structure to hold the messages this can run out of space. If you are have very long transactions sending many messages this can pose a problem.
* Sending a message in such a way will happen asynchronous because a different thread will be sending the message and the thread calling the send() method will only put it in the data structure.
* Sticking to the JMS example you should keep in mind that they way messages are consumed is also changed by this approach. Because you will receive the list of messages from JMS as a single message. So once you commit this single JMS message you commited the entire list of messages. You should check if this i a problem to your requirements.

Categories