Scenario:
There's a task-manager application that allows its users to create tasks and associate a timestamp with each of them.
Goal:
The application is supposed to send email alerts to the users at the time when any of their tasks are due.
Question:
If the application has a function sendEmailAlerts, which queries the database, fetches all the tasks that are due now, and sends alerts to their creators, is it possible to trigger this function exactly at the moment when a task is due?
The approach that I have in mind is to use a Quartz job, that would run every x minutes and invoke sendEmailAlerts. But this approach doesn't seem very efficient. Is there any better way of doing it?
Thank you for your help.
You could use SQL Server Agent to create a job that executes at a specified time, although in this scenario I don't think it's optimal to create x jobs for x alerts.
Related
In a microservice architecture, suppose there is a business scenario where a user purchases something that will expire after two years, and the system needs to notify the user a little bit in advance.
In this case, how should we handle the situation so that the users can be notified on time even if there are many users who need to be notified?
For example, with a delayed queue in a message queue, the messages will pile up when there are many users; with a timed task, too many users will overload the server CPU.
Is there a good way to do this?
While "microservices" does not inherently mean "REST", microservices usually are RESTful. And in REST you shouldn't keep in memory anything that needs to survive more than one request. Two years is an extreme case, but even if it is for just 10 minutes, it should probably go to the DB.
Building up a queue for two years will be very impractical and is likely to fail if the queue contents are not persisted somewhere. Since you mention purchases, I am assuming you have some sort of data store to record them, either SQL or NoSQL.
You can simply add purchase date/time column(s) to the table to make life easier. If your daily purchase volumes are low enough, I would start with a date-based lookup only. You will need a scheduled execution of some service method, say at 6 am every day, that looks up purchases close to expiry, i.e. 7 days before the two-year mark (purchase_date = now - 723 days), and then sends a REST request somewhere or publishes an event or JMS message with the order number and purchase_date as content for each purchase order. This will then be picked up by an event/message listener somewhere and processed accordingly, i.e. a notification is sent to the customer. To avoid sending duplicate notifications, you should also persist the expiry notifications in a database and check whether a notification has already been sent for a purchase id before sending it again.
If you ever reach a situation where you are processing thousands of orders a day and don't want to publish a large number of events in one go, extend the functionality to filter by purchase timestamp and process chunks of purchases multiple times a day by changing the lookup condition.
This is just the general idea for such a requirement, and you will have to work out a lot of implementation details, such as what happens if your email server is down.
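The daily lookup and the duplicate check described above could be sketched roughly as follows. This is only an illustration: the `Purchase` and `ExpiryNotifier` classes, the in-memory `notified` set (standing in for the persisted notifications table) and the 730-day validity are assumptions, not part of any real system.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical purchase row; field names are illustrative only. */
final class Purchase {
    final long id;
    final LocalDate purchaseDate;

    Purchase(long id, LocalDate purchaseDate) {
        this.id = id;
        this.purchaseDate = purchaseDate;
    }
}

final class ExpiryNotifier {
    static final int VALIDITY_DAYS = 730; // "two years", matching the 723-day arithmetic above
    static final int NOTICE_DAYS = 7;     // notify one week before expiry

    // Stand-in for the persisted "notification already sent" table.
    private final Set<Long> notified = new HashSet<>();

    /** Purchase date whose expiry notice is due on the given day (now - 723 days). */
    static LocalDate purchaseDateDueOn(LocalDate today) {
        return today.minusDays(VALIDITY_DAYS - NOTICE_DAYS);
    }

    /** One scheduled run: returns the purchase ids that should be notified now. */
    List<Long> run(LocalDate today, List<Purchase> purchases) {
        LocalDate target = purchaseDateDueOn(today);
        List<Long> toNotify = new ArrayList<>();
        for (Purchase p : purchases) {
            // Set.add returns false if the id was already recorded, which skips duplicates.
            if (p.purchaseDate.equals(target) && notified.add(p.id)) {
                toNotify.add(p.id);
            }
        }
        return toNotify;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        List<Purchase> purchases = new ArrayList<>();
        purchases.add(new Purchase(1L, today.minusDays(723)));
        purchases.add(new Purchase(2L, today.minusDays(100)));
        ExpiryNotifier notifier = new ExpiryNotifier();
        System.out.println("first run:  " + notifier.run(today, purchases));
        System.out.println("second run: " + notifier.run(today, purchases));
    }
}
```

In a real service the loop over `purchases` would be a database query filtered on the purchase date, and each returned id would be published as an event or message rather than collected in a list.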
You can use a Quartz job and configure it to persist its state in the database (JDBC JobStore) so that no information is lost; this mode is also suitable for clustering.
Quartz periodically checks the database for the nearest task (the interval is a configurable parameter), and when the time comes it processes the notification.
You can configure the thread pool size in order to avoid overload.
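As a rough sketch of that setup, the properties below select the JDBC job store, enable clustering and cap the worker thread pool. The property keys are standard Quartz configuration keys; the data source name `appDS`, the scheduler name and the connection details are placeholders made up for this example. The code only builds a `java.util.Properties` object; in an application it would typically be handed to Quartz's `StdSchedulerFactory`.

```java
import java.util.Properties;

final class QuartzJdbcStoreConfig {
    /** Builds Quartz settings for a clustered, database-backed scheduler. */
    static Properties build(String driver, String jdbcUrl, String user, String password, int threadCount) {
        Properties p = new Properties();
        p.setProperty("org.quartz.scheduler.instanceName", "AlertScheduler"); // placeholder name
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");             // unique id per cluster node

        // Bounded worker pool, so a burst of due triggers cannot overload the server.
        p.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        p.setProperty("org.quartz.threadPool.threadCount", Integer.toString(threadCount));

        // Persist jobs and triggers in the database instead of in memory.
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.tablePrefix", "QRTZ_");
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        p.setProperty("org.quartz.jobStore.dataSource", "appDS");

        // Connection settings for the hypothetical data source "appDS".
        p.setProperty("org.quartz.dataSource.appDS.driver", driver);
        p.setProperty("org.quartz.dataSource.appDS.URL", jdbcUrl);
        p.setProperty("org.quartz.dataSource.appDS.user", user);
        p.setProperty("org.quartz.dataSource.appDS.password", password);
        return p;
    }

    public static void main(String[] args) {
        Properties p = build("org.postgresql.Driver", "jdbc:postgresql://localhost/app", "app", "secret", 5);
        p.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The Quartz tables (prefixed `QRTZ_` here) have to exist in the database before the scheduler starts; the creation scripts ship with the Quartz distribution.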
I'm building a system where users can set a future date (down to hours and minutes) in a calendar. At that date a trigger calls a certain task, unique for every user.
Every user can set a different date. The system will have 10k+ users from the start, and a user can create more than one trigger.
So assuming I have 10k users and each user creates on average 3 triggers => 30k triggers with 30k different dates.
All dates are saved in a database.
I'm new to Quartz; can this be done in a more optimized way?
I was thinking about making a task run every minute that gets the tasks that are supposed to run in the next hour and removes them from the database.
Do you have any better ideas? Has anyone used Quartz for a large number of triggers?
You have the schedule stored in the database. If I understand the idea, you want Quartz to load all the upcoming tasks in order to execute them in the future.
This is a problematic approach:
Synchronization Issues: I assume that users can edit, remove and add new tasks to the database. You would have to periodically ask the database to refresh the state of the quartz jobs, remove some jobs, edit other jobs etc. This may not be trivial. The state of the program would be a long living cache which needs to be synchronised often.
Performance and scalability issues: Even if the proposed solution is OK for 30k tasks, it may not be OK for 70k or 700k tasks. Your approach is not easy to scale: adding a new machine would require an additional layer of synchronisation to decide which machine should actually execute which job (as all of them have all the tasks).
What I would propose:
Add the "stage" to the Tasks table (new, queued, running, finished, failed)
Divide your solution into several components. (Initially they can run on a single machine, but it will be easy to scale.)
Components:
Task Finder: Executed periodically (once every few seconds). Scans the database for tasks that are "new" and due soon. Sends the tasks found to the Message Queue and marks each task as "queued" in the db. Marking as "queued" has to be done carefully, as there can be multiple Task Finders. (As an addition, it may find the tasks that were marked as "queued" or "running" more than N minutes ago and are neither "finished" nor "canceled"; these probably need to be re-run.)
Message Queue: Connector between Task Finder and Task Executor.
Task Executor: Listens to the Message Queue and processes the tasks that it receives. Marks the tasks as "running" initially and "finished" or "failed" later on.
With this approach you can have:
multiple Task Executors on multiple machines
multiple Task Finders on multiple machines
no Single Point of Failure: even if one of the Task Finders or Executors fails, some of the tasks will be delayed, but they will be picked up and run afterwards.
This may not address all the scenarios but would be a good starting point.
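A minimal in-memory sketch of the three components is shown below. It is not a production design: the `Task`, `Stage` and `TaskPipeline` names are made up for illustration, a `ConcurrentHashMap` stands in for the Tasks table, and a `LinkedBlockingQueue` stands in for the real message queue.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

enum Stage { NEW, QUEUED, RUNNING, FINISHED, FAILED }

final class Task {
    final long id;
    final Instant dueAt;
    volatile Stage stage = Stage.NEW;

    Task(long id, Instant dueAt) {
        this.id = id;
        this.dueAt = dueAt;
    }
}

final class TaskPipeline {
    final Map<Long, Task> tasksTable = new ConcurrentHashMap<>();  // stand-in for the Tasks table
    final BlockingQueue<Long> queue = new LinkedBlockingQueue<>(); // stand-in for the message queue

    /** Task Finder: claims NEW tasks due within the lookahead window and enqueues them. */
    int findAndQueue(Instant now, Duration lookahead) {
        int claimed = 0;
        Instant horizon = now.plus(lookahead);
        for (Task t : tasksTable.values()) {
            // The check-and-mark must be atomic so two finders never queue the same task.
            // With SQL this would be: UPDATE ... SET stage = 'queued' WHERE id = ? AND stage = 'new'
            synchronized (t) {
                if (t.stage == Stage.NEW && !t.dueAt.isAfter(horizon)) {
                    t.stage = Stage.QUEUED;
                    queue.add(t.id);
                    claimed++;
                }
            }
        }
        return claimed;
    }

    /** Task Executor: processes one queued task; returns false if the queue is empty. */
    boolean executeNext(Consumer<Task> action) {
        Long id = queue.poll();
        if (id == null) {
            return false;
        }
        Task t = tasksTable.get(id);
        t.stage = Stage.RUNNING;
        try {
            action.accept(t);
            t.stage = Stage.FINISHED;
        } catch (RuntimeException e) {
            t.stage = Stage.FAILED;
        }
        return true;
    }

    public static void main(String[] args) {
        TaskPipeline pipeline = new TaskPipeline();
        Instant now = Instant.now();
        pipeline.tasksTable.put(1L, new Task(1L, now.plusSeconds(5)));
        pipeline.tasksTable.put(2L, new Task(2L, now.plusSeconds(3600)));
        System.out.println("queued: " + pipeline.findAndQueue(now, Duration.ofSeconds(10)));
        pipeline.executeNext(t -> System.out.println("running task " + t.id));
    }
}
```

The important design point is the conditional "claim" in the finder: whichever finder wins the update owns the task, so adding more finders or executors does not cause double execution.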
I don't see why you need Quartz here at all. As far as I remember, Quartz is best used to schedule internal backend processes, not user-defined tasks obtained from the db.
Just process the trigger as it is created: save a row to your tasks table with start_date based on the trigger, and every second select all incomplete tasks with start_date < sysdate. If the job is repeating, calculate the next execution time and insert a new task row or update the previous one accordingly.
As Sam pointed out there are some nice topics addressing the same problem:
Quartz Performance
Quartz FAQ
In a system like the one described, handling this number of triggers should mostly not be a problem. But in my experience it is better to create something like a "JobChecker". If you let your users create their own triggers, it could really break Quartz in some cases. For example, if 5000 users create an event for the exact same time, Quartz will have a hard time handling them correctly. (This situation is not likely to occur often, but it is possible, as your specification does not exclude it.) Quartz has difficulties only when a lot of triggers have to be fired at the same time.
My recommendation is to create one job that runs every hour/minute etc. and handles all user-set events. This is similar to a cron job in bash. With this kind of processing your system will be pretty stable even if the number of "triggers" increases dramatically. Basically, your line of thought is correct if you strive for scalability.
I'm searching for materials/ideas/designs to solve an architecture problem:
I'll have several agents which handle some processing; as a result they can generate state for clients which will expire after some time. Let's say a client sent a presence state which expires after 1h. I am wondering how to write a service to keep track of the expiration times of scheduled events.
1) create sorted collection with timestamps and process it by some executor
2) put all into DB and perform cyclic check using sorted query
Any suggestions are appreciated.
If you are using the Spring Framework, you can use Spring's cron-style scheduling: http://docs.spring.io/spring/docs/3.0.x/spring-framework-reference/html/scheduling.html
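Independently of Spring, option 1 from the question (a sorted collection of timestamps processed by an executor) maps fairly directly onto `java.util.concurrent.DelayQueue` from the JDK. The sketch below assumes everything fits in memory and may be lost on restart; the `ExpiringState` and `ExpiryTracker` names are invented for this example.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** A client state that becomes available from the queue once its TTL has elapsed. */
final class ExpiringState implements Delayed {
    final String clientId;
    final long expiresAtMillis;

    ExpiringState(String clientId, long expiresAtMillis) {
        this.clientId = clientId;
        this.expiresAtMillis = expiresAtMillis;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining time until expiry; zero or negative means "expired".
        return unit.convert(expiresAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        // Orders the queue so that the entry expiring soonest is at the head.
        return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
}

final class ExpiryTracker {
    private final DelayQueue<ExpiringState> queue = new DelayQueue<>();

    void track(String clientId, long ttlMillis) {
        queue.put(new ExpiringState(clientId, System.currentTimeMillis() + ttlMillis));
    }

    /** Blocks until the next state expires; intended for a dedicated worker thread. */
    String awaitNextExpired() throws InterruptedException {
        return queue.take().clientId;
    }

    /** Non-blocking variant: returns null when nothing has expired yet. */
    String pollExpired() {
        ExpiringState s = queue.poll();
        return s == null ? null : s.clientId;
    }

    public static void main(String[] args) throws InterruptedException {
        ExpiryTracker tracker = new ExpiryTracker();
        tracker.track("client-1", 200);
        tracker.track("client-2", 100);
        System.out.println("expired: " + tracker.awaitNextExpired());
        System.out.println("expired: " + tracker.awaitNextExpired());
    }
}
```

If the expirations must survive a restart, option 2 (store them in the DB and poll with a sorted query) is the safer choice; the two can also be combined by reloading the queue from the DB at startup.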
I am trying to decide whether or not to use a Java EE timer in my application. The server I am using is WebLogic 10.3.2.
The need is: if, one hour after a call to an async web service from an EJB, the async callback method has not been called, some actions need to be executed. The information about whether the callback method has been called and the date of the call is stored in the database.
The two possibilities I see are:
Use a batch process that every half hour looks for all the calls that have gone more than one hour without a response and executes the needed actions.
Create a one-hour timer after every single call to the web service, and in the @Timeout method check whether the answer has arrived; if it has not, execute the required actions.
From a pure programming point of view, the second one looks easier and cleaner, but I am worried about the performance issues I could have if, let's say, 100,000 timers are created at a single moment.
Any thoughts?
You would be better off having a more specialized process. The real problem is the 100,000 issue. It would depend on how long your actions take.
Assuming 100K transactions/hour, that is 100,000 / 3,600 ≈ 28 calls per second, so it's easy to see that each second the EJB timer would fire up about 30 threads to process all of the currently pending jobs, since that's how it works.
Also, timers are persistent, so your EJB-managed timer table will be saving 30 rows and deleting 30 rows per second (60 operations in total).
So that's a lot of work happening very quickly. I can easily see the system simply "falling behind" and never catching up.
A specialized process would be much lighter weight, could perhaps batch the action calls (call 5 actions per thread instead of one per thread), etc. It would be nice if you didn't have to persist the timer events, but it is what it is. You could quite easily append the timer events to a file for safety and keep them in memory. On system restart, you can reload that file, and then roll the file (every hour create a new file, delete the older file after it has all been consumed, etc.). That would save a lot of DB traffic, but you would lose the transactional nature of the DB.
Anyway, I don't think you want to use the EJB Timer for this; I don't think it's really designed for this amount of traffic. But you can always test it and see. Make sure you test restarting your container to see how well it works with 100K pending timer jobs in its table.
It all depends on what the container uses. For example, JBoss uses the Quartz Scheduler to implement EJB timer functionality. Quartz copes pretty well with around 100,000 timer instances.
@Pau: why do you need to create a timer for every call made? Instead, you can have a single timer thread, created at application start-up, which runs every half hour (configurable) and looks in your database for all web service calls whose response has not been received and whose request time is more than 1 hour in the past. For the selected records, it can execute the required action in a for loop.
The above design may not be suitable, though, if the activity to be performed is time critical.
If you have the Spring Framework in your application, you may also look at its timer services: http://static.springsource.org/spring/docs/1.2.9/reference/scheduling.html
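The single periodic sweep suggested above could look roughly like this. It is a sketch under assumptions: `PendingCallSweeper`, `pendingCalls` and `handleTimeout` are hypothetical names, a `ConcurrentHashMap` stands in for the database table, and a plain `ScheduledExecutorService` is used for brevity (inside a Java EE container a single container-managed timer would normally play that role, since unmanaged threads are discouraged there).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class PendingCallSweeper {
    static final Duration TIMEOUT = Duration.ofHours(1);

    // Stand-in for the DB table: call id -> time the request was sent.
    // The callback handler is assumed to remove the entry when the response arrives.
    final Map<String, Instant> pendingCalls = new ConcurrentHashMap<>();

    /** Returns (and removes) all calls that have waited longer than TIMEOUT. */
    List<String> sweep(Instant now) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Instant> e : pendingCalls.entrySet()) {
            boolean overdue = Duration.between(e.getValue(), now).compareTo(TIMEOUT) > 0;
            // The two-argument remove only succeeds if the entry is unchanged,
            // so a callback arriving concurrently is not reported as a timeout.
            if (overdue && pendingCalls.remove(e.getKey(), e.getValue())) {
                timedOut.add(e.getKey());
            }
        }
        return timedOut;
    }

    /** Hypothetical action to execute for a call that never got its callback. */
    void handleTimeout(String callId) {
        System.out.println("No callback received for call " + callId);
    }

    /** Starts one background thread that sweeps at a fixed period. */
    ScheduledExecutorService start(long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> sweep(Instant.now()).forEach(this::handleTimeout),
                periodMinutes, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        PendingCallSweeper sweeper = new PendingCallSweeper();
        Instant now = Instant.now();
        sweeper.pendingCalls.put("call-1", now.minus(Duration.ofHours(2)));
        sweeper.pendingCalls.put("call-2", now.minus(Duration.ofMinutes(10)));
        sweeper.sweep(now).forEach(sweeper::handleTimeout);
    }
}
```

With a half-hour period, a call can wait up to 90 minutes before it is handled, which is the time-criticality trade-off mentioned above; a shorter period narrows that gap at the cost of more frequent queries.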
Maybe you could use some of these ideas:
Where I'm at, we've built a cron-like scheduler which is powered by a single timer. When the timer fires the system checks which crons need to run using a Quartz CronTrigger. Generally these crons have a lot of work to do, and the way we handle that is each cron spins its individual tasks off as JMS messages, then MDBs handle the messages. Currently this runs on a single Glassfish instance and as our task load increases, we should be able to scale this up with a cluster so multiple nodes are processing the jms messages. We balance the jms message processing load for each type of task by setting the max-pool-size in glassfish-ejb-jar.xml (also known as sun-ejb-jar.xml).
Building a system like this and getting all the details right isn't trivial, but it's proving really effective.
I am writing a scheduler which grabs XML data and inserts it into a MySQL DB. Simple, isn't it? But here is the problem, or rather the logic, that I am trying to work out. NOTE: I want to run this in a Windows environment; in future it might be configured for other platforms.
Scheduler should run on every 5 mins.
This script should fetch condition/configuration on what to parse and collect the data-fields from XML and these conditions are available from MySQL table.
This table also defines a delay in which this script should check for the difference in the XML fields & delay.
This script does both: it runs every 5 mins to collect the XML, and it checks the difference against the table (MySQL) after every said delay.
This script then reads and parses the XML data-fields, and collects only those data-fields that are defined in the above MySQL table.
The collected data will be inserted into MySQL DB only when there is change in the state and this state is defined from MySQL table.
Feedback/Suggestions:
Due to the delay, I am not sure how I should store the configuration in the script so that it is shared between schedules.
Is there any way to use a static variable in the code to store this data, so that it is shared between different jobs or different schedules?
Basically, how should I implement this? A better approach in terms of performance.
Thanks for your time.
UPDATE:
One of the suggestions is to run the Java code as a Windows service (?), so that we could have some common data shared between different jobs. Does that make sense?
Reference:
Java Service Wrapper
Concurrency is the answer. Try creating a thread pool or an ExecutorService and pausing certain threads for 5 minutes; you could even use synchronization if several threads will be working with the same resource.
Remember that using more threads does not always finish the job faster, e.g. 3 threads: 2 min
5 threads: 6 min
*Read a tutorial about threads
*Create a few simple threads that wait for 5 minutes
*Read some tutorials about thread pools, synchronization and sharing resources (the script part)
*Test to find the optimal setup