Server to handle time delayed events - java

I'm looking for materials/ideas/designs to solve an architecture problem:
I'll have several agents that handle some processing; as a result, they can generate state for clients which will expire after some time. Let's say a client sent a presence state which expires after 1 h. I'm wondering how to write a service that keeps track of the expiration times of these scheduled events.
1) Create a sorted collection of timestamps and process it with an executor
2) Put everything into a DB and perform a cyclic check using a sorted query
Any suggestions are appreciated.

If you are using the Spring Framework, you can use Spring's cron scheduling support: http://docs.spring.io/spring/docs/3.0.x/spring-framework-reference/html/scheduling.html
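For option 1 combined with Spring scheduling, here is a minimal sketch (the in-memory presence map and all names are illustrative, not from the question; @EnableScheduling must be present on a @Configuration class):

import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class PresenceExpiryService {

    // client ID -> expiration time (illustrative in-memory store)
    private final Map<String, Instant> expirations = new ConcurrentHashMap<>();

    public void recordPresence(String clientId, Instant expiresAt) {
        expirations.put(clientId, expiresAt);
    }

    // Sweep for expired entries every 10 seconds.
    @Scheduled(fixedDelay = 10_000)
    public void expireStale() {
        Instant now = Instant.now();
        expirations.entrySet().removeIf(e -> e.getValue().isBefore(now));
    }
}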

Related

How to delay the execution of a timed task for two or more years

In a microservice architecture, suppose there is a business scenario where a user purchases something that will expire after two years, and the system needs to notify the user a little bit in advance.
In this case, how should we handle the situation so that the users can be notified on time even if there are many users who need to be notified?
For example, using the delayed-queue feature of a message queue will cause messages to pile up when there are many users; with a timed task, too many users will overload the server's CPU.
Is there a good way to do this?
While "microservices" do not inherently mean "REST", they usually are. And in REST you shouldn't store in memory anything that needs to survive more than one request. Two years is an extreme case, but even if it is for just 10 minutes, it should probably go to the DB.
Building up a queue for two years will just be very impractical, and likely to fail if the queue contents are not persisted somewhere. Since you mention purchases, I am assuming you have some sort of data store recording them, either in SQL or NoSQL.
You can simply add purchase date/time column(s) to the table to make life easier. If your daily purchase volumes are low enough, I would start with a date-based lookup only. You will need a scheduled execution of some service method, say at 6 am every day, that looks up purchases close to expiry, i.e. 7 days before the 2-year mark (purchase_date = now - 723 days), and then sends a REST request somewhere, or publishes an event or JMS message with the order number and purchase_date as content, for each purchase order. This will then be picked up by an event/message listener somewhere and processed accordingly, i.e. a notification is sent to the customer. To avoid sending duplicate notifications, you should also persist the expiry notifications in a database and check that a notification has not already been sent for a purchase ID before sending another.
If you ever reach a situation where you are processing thousands of orders a day and don't want to publish a large number of events in one go, extend the functionality to filter by purchase timestamp and process chunks of purchases multiple times a day by changing the lookup condition.
This is just the general idea of such a requirement; you will have to fine-tune a lot of implementation details, such as what happens if your email server is down.
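A minimal sketch of that daily lookup, assuming Spring with a JdbcTemplate; the table, columns, and event type are illustrative, not from the answer:

import java.time.LocalDate;
import java.util.List;

import org.springframework.context.ApplicationEventPublisher;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class ExpiryNotificationJob {

    private final JdbcTemplate jdbc;
    private final ApplicationEventPublisher events;

    public ExpiryNotificationJob(JdbcTemplate jdbc, ApplicationEventPublisher events) {
        this.jdbc = jdbc;
        this.events = events;
    }

    // Runs at 06:00 every day; finds purchases 7 days short of the 2-year mark,
    // skipping any for which a notification was already recorded.
    @Scheduled(cron = "0 0 6 * * *")
    public void notifyExpiringPurchases() {
        LocalDate cutoff = LocalDate.now().minusDays(723);
        List<String> orderNumbers = jdbc.queryForList(
                "SELECT order_number FROM purchases "
                + "WHERE purchase_date = ? "
                + "AND order_number NOT IN (SELECT order_number FROM expiry_notifications)",
                String.class, java.sql.Date.valueOf(cutoff));
        orderNumbers.forEach(order -> events.publishEvent(new PurchaseExpiringEvent(order)));
    }

    // Hypothetical event type carrying the order number.
    public record PurchaseExpiringEvent(String orderNumber) {}
}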
You can use a Quartz job and configure it to use persistent mode in the database (JDBC JobStore) so as not to lose information; it is also suitable for clustered mode.
Quartz periodically checks the database for the nearest task (a configurable parameter); when the time comes, it processes the notification.
You can configure the thread pool size in order to avoid overload.
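A minimal sketch of scheduling such a notification with the Quartz API, assuming a JDBC JobStore is configured in quartz.properties (the job class and identifiers are illustrative):

import java.util.Date;

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class ExpiryNotificationScheduling {

    // Illustrative job: sends the notification when fired.
    public static class NotifyJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            String orderNumber = context.getMergedJobDataMap().getString("orderNumber");
            // ... send the notification for orderNumber ...
        }
    }

    public static void scheduleNotification(String orderNumber, Date fireAt) throws Exception {
        // With org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX set in
        // quartz.properties, this trigger survives restarts and works in a cluster.
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(NotifyJob.class)
                .withIdentity("notify-" + orderNumber)
                .usingJobData("orderNumber", orderNumber)
                .build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("trigger-" + orderNumber)
                .startAt(fireAt) // fire once at the given time, e.g. two years out
                .build();
        scheduler.scheduleJob(job, trigger);
    }
}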

Replacing a scheduled task with Spring Events

In my Spring Boot app, customers can submit files. Each customer's files are merged together by a scheduled task that runs every minute. The fact that the merging is performed by a scheduler has a number of drawbacks, e.g. it's difficult to write end-to-end tests, because in the test you have to wait for the scheduler to run before retrieving the result of the merge.
Because of this, I would like to use an event-based approach instead, i.e.
Customer submits a file
An event is published that contains this customer's ID
The merging service listens for these events and performs a merge operation for the customer in the event object
This would have the advantage of triggering the merge operation immediately after there is a file available to merge.
However, there are a number of problems with this approach which I would like some help with.
Concurrency
The merging is a reasonably expensive operation. It can take up to 20 seconds, depending on how many files are involved. Therefore the merging will have to happen asynchronously, i.e. not as part of the same thread which publishes the merge event. Also, I don't want to perform multiple merge operations for the same customer concurrently in order to avoid the following scenario
Customer1 saves file2 triggering a merge operation2 for file1 and file2
A very short time later, customer1 saves file3 triggering merge operation3 for file1, file2, and file3
Merge operation3 completes saving merge-file3
Merge operation2 completes overwriting merge-file3 with merge-file2
To avoid this, I plan to process merge operations for the same customer in sequence using locks in the event listener, e.g.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

@Component
public class MergeEventListener implements ApplicationListener<MergeEvent> {

    // One lock per customer, so merges for the same customer run in sequence.
    private final ConcurrentMap<String, Lock> customerLocks = new ConcurrentHashMap<>();

    @Override
    public void onApplicationEvent(MergeEvent event) {
        var customerId = event.getCustomerId();
        var customerLock = customerLocks.computeIfAbsent(customerId, key -> new ReentrantLock());
        customerLock.lock();
        try {
            mergeFileForCustomer(customerId);
        } finally {
            customerLock.unlock(); // release even if the merge throws
        }
    }

    private void mergeFileForCustomer(String customerId) {
        // implementation omitted
    }
}
Fault-Tolerance
How do I recover if for example the application shuts down in the middle of a merge operation or an error occurs during a merge operation?
One of the advantages of the scheduled approach is that it contains an implicit retry mechanism, because every time it runs it looks for customers with unmerged files.
Summary
I suspect my proposed solution may be re-implementing (badly) an existing technology for this type of problem, e.g. JMS. Is my proposed solution advisable, or should I use something like JMS instead? The application is hosted on Azure, so I can use any services it offers.
If my solution is advisable, how should I deal with fault-tolerance?
Regarding the concurrency part, I think the approach with locks would work fine, provided the number of files submitted per customer (in a given timeframe) is small enough.
You could monitor the number of threads waiting for the lock over time to see if there is a lot of contention. If there is, then maybe you can accumulate a number of merge events (over a specific timeframe) and then run a parallel merge operation, which in fact leads to a solution similar to the one with the scheduler.
In terms of fault tolerance, an approach based on a message queue would work (I haven't worked with JMS, but I gather it's a Java API for message queuing).
I would go with a cloud-based message queue (SQS, for example), simply for reliability reasons. The approach would be:
Push merge events into the queue
The merging service pulls one event at a time and starts the merge job
When the merge job is finished, the message is removed from the queue
That way, if something goes wrong during the merge process, the message stays in the queue and it will be read again when the app is restarted.
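A minimal sketch of that consume loop with the AWS SDK for Java v2 (queue URL and merge call are illustrative; SQS hides an in-flight message via its visibility timeout until it is deleted):

import java.util.List;

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class MergeQueueConsumer {

    private static final String QUEUE_URL = "https://sqs.example/merge-events"; // illustrative

    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        while (true) {
            List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                    .queueUrl(QUEUE_URL)
                    .maxNumberOfMessages(1)
                    .waitTimeSeconds(20) // long polling
                    .build()).messages();
            for (Message m : messages) {
                String customerId = m.body();
                // If this throws, the message is never deleted and becomes
                // visible again after the visibility timeout (the retry).
                mergeFilesForCustomer(customerId);
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(QUEUE_URL)
                        .receiptHandle(m.receiptHandle())
                        .build());
            }
        }
    }

    private static void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}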
My thoughts around this matter after some considerations.
I restricted possible solutions to what's available from Azure managed services, according to specifications from OP.
Azure Blob Storage Function Trigger
Because this issue is about storing files, let's start by exploring Blob Storage with a trigger function that fires on file creation. According to the docs, Azure Functions can run for up to 230 seconds and have a default retry count of 5.
But this solution would require that files from a single customer arrive in a manner that does not cause concurrency issues, so let's leave this solution for now.
Azure Queue Storage
Does not guarantee first-in-first-out (FIFO) ordered delivery, hence it does not meet the requirements.
Storage queues and Service Bus queues - compared and contrasted: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted
Azure Service Bus
Azure Service Bus is a FIFO queue, and seems to meet the requirements.
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted#compare-storage-queues-and-service-bus-queues
From the doc above, we see that large files are not suited as message payloads. To solve this, files can be stored in Azure Blob Storage, and the message can carry the information about where to find the file.
With Azure Service Bus and Azure Blob Storage selected, let's discuss implementation caveats.
Queue Producer
On AWS, the solution for the producer side would have been like this:
Dedicated end-point provides pre-signed URL to customer app
Customer app uploads file to S3
Lambda triggered by S3 object creation inserts message to queue
Unfortunately, Azure doesn't have a pre-signed URL equivalent yet (they have Shared Access Signatures, which are not equal), hence file uploads must be done through an end-point which in turn stores the file to Azure Blob Storage. Since a file-upload end-point is required anyway, it seems appropriate to let that end-point also be responsible for inserting messages into the queue.
Queue Consumer
Because file merging takes a significant amount of time (~20 seconds), it should be possible to scale out the consumer side. With multiple consumers, we'll have to make sure that a single customer is processed by no more than one consumer instance.
This can be solved by using message sessions: https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-sessions
In order to achieve fault tolerance, the consumer should use peek-lock (as opposed to receive-and-delete) during the file merge and mark the message as completed when the merge is done. When the message is marked as completed, the consumer may also be responsible for removing superfluous files in Blob Storage.
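A minimal sketch of a session-aware, peek-lock consumer with the azure-messaging-servicebus SDK (the connection string, queue name, and merge call are illustrative; peek-lock is the default receive mode, and sessions keep all messages for one customer on a single consumer at a time):

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusProcessorClient;
import com.azure.messaging.servicebus.ServiceBusReceivedMessageContext;

public class MergeSessionConsumer {

    public static void main(String[] args) {
        ServiceBusProcessorClient processor = new ServiceBusClientBuilder()
                .connectionString("<connection-string>") // illustrative
                .sessionProcessor()
                .queueName("merge-events")               // illustrative
                .maxConcurrentSessions(4)
                .processMessage(MergeSessionConsumer::handle)
                .processError(ctx -> System.err.println(ctx.getException()))
                .buildProcessorClient();
        processor.start();
    }

    private static void handle(ServiceBusReceivedMessageContext ctx) {
        String customerId = ctx.getMessage().getSessionId();
        try {
            mergeFilesForCustomer(customerId);
            ctx.complete(); // removes the message only after a successful merge
        } catch (Exception e) {
            ctx.abandon();  // message becomes available again for redelivery
        }
    }

    private static void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}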
Possible problems with both existing solution and future solution
If customer A starts uploading a huge file #1 and immediately afterwards starts uploading a small file #2, the upload of file #2 may complete before file #1 and cause an out-of-order situation.
I assume this is an issue that is solved in the existing solution by some kind of locking mechanism or file-naming convention.
Spring Boot with Kafka can solve your fault-tolerance problem.
Kafka supports the producer-consumer model: have the customer events posted via a Kafka producer.
Configure Kafka with replication so as not to lose any events.
Use consumers that invoke the merging service for each event.
Once the consumer has read the event for a customerId and the merge is done, commit the offset.
If there is a failure in the middle of merging, the offset is not committed, so the event will be read again when the application starts up again.
If the merging service can detect a duplicate event from the given data, then reprocessing the same message should not cause any issue (Kafka's default guarantee is at-least-once delivery, so duplicates are possible). Duplicate-event detection is a safety check for an event that was processed fully but whose offset failed to be committed to Kafka.
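A minimal sketch of such a consumer with manual offset commits (topic name, group id, and the merge call are illustrative):

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MergeEventConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // illustrative
        props.put("group.id", "merge-service");                    // illustrative
        props.put("enable.auto.commit", "false");                  // commit manually
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("merge-events"));           // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    mergeFilesForCustomer(record.value());
                }
                // Offsets are committed only after the merges succeed; on a crash
                // before this line, the events are redelivered (at-least-once).
                consumer.commitSync();
            }
        }
    }

    private static void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}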
First, an event-based approach is correct for this scenario. You should use an external broker for the pub-sub event messages.
Note that, by default, Spring publishes events synchronously.
Suppose that you have the following services:
App Service
Merge Service
CDC Service (change data capture)
Broker Service (Kafka, RabbitMQ, ...)
The main flow is based on the "Outbox Pattern":
The App Service saves the event message to an Outbox message table
The CDC Service watches the outbox table and publishes event messages from it to the Broker Service
The Merge Service subscribes to the Broker Service and receives the event messages (messages are ordered)
The Merge Service performs the merge action
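A minimal sketch of the outbox write, assuming Spring with JPA (jakarta.persistence; the entity and names are illustrative). The key point is that the business data and the outbox row are committed in the same transaction:

import java.time.Instant;
import java.util.UUID;

import jakarta.persistence.Entity;
import jakarta.persistence.EntityManager;
import jakarta.persistence.Id;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class FileSubmissionService {

    private final EntityManager em;

    public FileSubmissionService(EntityManager em) {
        this.em = em;
    }

    @Transactional
    public void submitFile(String customerId, String blobPath) {
        // ... persist the submitted-file business data here ...

        // The outbox row is written in the same transaction as the business data,
        // so an event cannot be published without the data being saved, or lost
        // after it is. The CDC service tails this table and forwards rows to the broker.
        em.persist(new OutboxMessage(UUID.randomUUID().toString(),
                "MergeRequested", customerId, Instant.now()));
    }

    @Entity
    static class OutboxMessage {
        @Id String id;
        String eventType;
        String payload;   // here: the customer ID
        Instant createdAt;

        OutboxMessage() {} // required by JPA

        OutboxMessage(String id, String eventType, String payload, Instant createdAt) {
            this.id = id;
            this.eventType = eventType;
            this.payload = payload;
            this.createdAt = createdAt;
        }
    }
}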
You can use the Eventuate libraries for this flow.
Furthermore, you can apply DDD to your architecture: use the Axon Framework for the CQRS pattern, publish domain events, and process them.
Refer to:
Outbox pattern: https://microservices.io/patterns/data/transactional-outbox.html
It really sounds like you may be able to do this with a stream or an ETL tool. When you are developing an app and you have some prioritisation/queuing/batching requirement, it is easy to see how you can build a solution with a cron + SQL database, with maybe a queue to decouple doing work from producing work.
This may very well be the easiest thing to build, as you have a lot of granularity and control with this approach. If you believe that you can in fact meet your requirements this way fairly quickly and with low risk, you can do so.
There are software components which are more tailored to these tasks, but they do have learning curves, and they depend on what PaaS or cloud you may be using. You'll get monitoring, scalability, availability and resiliency out of the box, and an open-source or cloud service will take the burden of management off your hands.
What to use will also depend on what your priorities and requirements are. If you want to go the ETL approach, which is great at batching up jobs, you might want to use something like AWS Glue. If you want prioritisation functionality, you may want to use multiple queues; it really depends. You'll also want a monitoring dashboard to see what wait time you should expect for your merge, regardless of the approach.

long processing jobs in a java web app

What is the best way to perform long tasks (triggered by a user and for that user only) in a Java web app? I tried using EJB @Asynchronous and JAX-WS asynchronous (polling) calls, but the Future<?> they return is not serializable and could not be stored in the HttpSession (to retrieve the result later, when it's done). Is there a simple way to use a concurrent Future<?> in a Java web environment, or do I have to go with a full-blown job-management framework?
Best solution so far was to use an application-scoped Map<SessionId, List<Future<?>>>. This works in a cluster with sticky sessions and avoids both JMS queues and storing results in a database.
The best is to use JMS: implement an asynchronous messaging solution that sends a message to a queue/topic, where an MDB listening on that queue/topic is triggered upon message arrival to perform the long task offline.
http://www.javablogging.com/simple-guide-to-java-message-service-jms-using-activemq/
http://docs.oracle.com/javaee/1.3/jms/tutorial/
If your process is supposed to generate a result and you expect the process to take a long time, probably the best way is to have 2 separate calls:
The first to trigger the process, returning a unique process identifier
The second to retrieve the result using that process identifier
So your overall process flow will be:
Client calls the back-end service.
Back-end service starts the async process with a unique ID and returns the unique ID to the client right away.
The async process persists the result in the session or some more durable mechanism (DB, file, etc.)
The client polls the server with the unique ID
The retrieval method returns the result when it exists, otherwise it returns a not-done message
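A minimal sketch of that flow using an application-scoped registry of futures, in the spirit of the first answer (all names and the task body are illustrative):

import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LongTaskRegistry {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Application-scoped: the Future itself never leaves this JVM;
    // only the string ID travels to the client.
    private final ConcurrentMap<String, Future<String>> tasks = new ConcurrentHashMap<>();

    // First call: start the task, return an ID immediately.
    public String start() {
        String id = UUID.randomUUID().toString();
        tasks.put(id, pool.submit(() -> {
            // ... long-running work here ...
            return "result";
        }));
        return id;
    }

    // Second call: poll with the ID.
    public String poll(String id) throws Exception {
        Future<String> f = tasks.get(id);
        if (f == null) return "unknown id";
        if (!f.isDone()) return "not done yet";
        tasks.remove(id);
        return f.get();
    }
}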

Boolean flag available over multiple instances in App Engine

I have an app that runs over several instances and all requests come through one servlet.
I need to run a cron job which executes once a week for about 3 minutes. During that cron run, some kind of flag/boolean will be modified somewhere so that the servlet can pick it up and send a "server temporarily unavailable" type message back instead of processing the request. Once the cron job is complete, it will set the flag back to true.
I cannot use a singleton or a static boolean, as the app runs in multiple instances. Nor do I want the servlet to fetch a value from the datastore on every request, as that would mean hundreds of thousands of extra datastore reads.
What can I do? Any ideas?
I think you may be able to store the boolean in Memcache; GAE has a Cache API for it. However, note that cache values are not persistent and may not survive even 3 minutes. Alternatively, you could hard-code a firm start time for the cron task in one of your Java classes or a .properties file; when the task finishes, it looks at that hard-coded time and schedules itself for the next round accordingly.
That way your servlet can also look at that time and refuse requests during the interval you specify. That will be very fast, but your jobs will be scheduled at a fixed time periodically, and you won't be able to change this without re-deploying the application.
I think the better solution is to keep the boolean in the datastore and make use of the cache. See the following algorithm:
is my boolean in the cache?
yes:
[alright, then choose to serve or not to serve the request based on it.]
no:
[fetch the variable from the datastore and put it in the cache.] (cache miss)
Again, the cache will be fast, but not as fast as hard-coding the schedule in the program.
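A minimal sketch of that read path with the App Engine Java APIs (the entity kind, key, property name, and cache expiry are illustrative):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class MaintenanceFlag {

    private static final String CACHE_KEY = "maintenance-flag"; // illustrative

    public static boolean isUnderMaintenance() {
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
        Boolean cached = (Boolean) cache.get(CACHE_KEY);
        if (cached != null) {
            return cached; // cache hit: no datastore read
        }
        // Cache miss: fall back to the datastore, then repopulate the cache.
        boolean flag = false;
        try {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
            Entity e = ds.get(KeyFactory.createKey("Flag", "maintenance")); // illustrative
            flag = Boolean.TRUE.equals(e.getProperty("underMaintenance"));
        } catch (EntityNotFoundException ignored) {
            // no flag entity yet: treat as "not under maintenance"
        }
        cache.put(CACHE_KEY, flag, Expiration.byDeltaSeconds(60));
        return flag;
    }
}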
EDIT: Another solution (however, not possible to implement).
If you want to keep serving pages during the task execution, you could use the Task Queue API.
First of all, you should be familiar with using a countdown for your task (in this case, next week): http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/taskqueue/TaskOptions.html#countdownMillis(long)
Then you could use a size() method of Queue (which I was expecting to be there, but apparently Google didn't implement it) to see if the task queue size is 0; if it is, it means the task is being processed right now, because when the task finishes it submits itself again for 1 week later.
One approach would be to have the cron job publish a message to a JMS topic to which all the servlet instances listen. The messages would inform the servlet instances to set the static boolean you mentioned to true or false.

How to trigger alerts based upon timestamps in database?

Scenario:
There's a task-manager application that allows its users to create tasks and associate a timestamp with it.
Goal:
The application is supposed to send email alerts to the users at the time when any of their tasks are due.
Question:
If there's a function in the application, sendEmailAlerts, which queries the database, fetches all tasks that are due now, and sends their creators alerts, is it possible to trigger this function exactly at the moment a task becomes due?
The approach I have in mind is to use a Quartz job that runs every x minutes and invokes sendEmailAlerts. But this approach doesn't seem very efficient. Is there a better way of doing it?
Thank you for your help.
You could use SQL Server Agent to create a job that executes at a specified time, although in this scenario I don't think it's optimal to create x jobs for x alerts.
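If fixed-interval polling feels wasteful, one alternative sketch is a single scheduler thread that sleeps until the earliest due task rather than waking every x minutes (the table, columns, and JDBC URL are illustrative, not from the question):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AlertScheduler {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Sleep until the earliest due task instead of waking every x minutes.
    public void scheduleNext() {
        Instant next = earliestDueTime();
        long delayMs = next == null
                ? 60_000 // no pending tasks: check again in a minute
                : Math.max(0, Duration.between(Instant.now(), next).toMillis());
        scheduler.schedule(() -> {
            sendEmailAlerts(); // the existing function: alerts everything due now
            scheduleNext();    // re-arm for the following task
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    private Instant earliestDueTime() {
        String sql = "SELECT MIN(due_time) FROM tasks WHERE alert_sent = FALSE"; // illustrative
        try (Connection c = DriverManager.getConnection("jdbc:example");          // illustrative
             PreparedStatement ps = c.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                Timestamp t = rs.getTimestamp(1);
                return t == null ? null : t.toInstant();
            }
            return null;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private void sendEmailAlerts() {
        // existing implementation: query due tasks and email their creators
    }
}

Note that a task inserted with a due time earlier than the current sleep would require re-arming the schedule on insert; a persistent scheduler such as Quartz handles that bookkeeping (and restarts) for you.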
