How do I use the Task Queue to do something at a later time? - java

According to this Google article,
"You can also use the Task Queue to do the write at a later time, which has the added benefit that the Task Queue automatically retries failures."
Suppose I'm trying to keep my daily spend on Google App Engine under a certain budget. Let's say I start to detect I'm getting low on quota for the day so I want to reschedule the work for tomorrow. It would be great to use Task Queues for this instead of Cron jobs because the initiation of the work and the rescheduling of the work can be handled pretty similarly.
How do I put a task on the Task Queue and specify that it should not begin until a particular time? I can see how I might use RetryOptions to get part of what I want, namely to delay the work. But RetryOptions doesn't seem to provide a way to specify not to retry until 24 hours have passed since "now" or don't retry until midnight.
Thanks for your help.

Looks like I can use TaskOptions.countdownMillis(long) to specify how long to wait before executing the task.
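For the "don't run until midnight" case, the countdown is just the number of milliseconds between now and the target time. Here is a minimal sketch of that calculation using java.time. The class name TaskDelay, the method name, the UTC zone and the "/worker" URL are my own placeholders, and the App Engine call appears only in a comment because it needs the App Engine SDK on the classpath.

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper: computes the delay to pass as a countdown.
class TaskDelay {
    /** Milliseconds from 'now' until the start of the next day in now's time zone. */
    static long millisUntilNextMidnight(ZonedDateTime now) {
        ZonedDateTime nextMidnight =
                now.toLocalDate().plusDays(1).atStartOfDay(now.getZone());
        return Duration.between(now, nextMidnight).toMillis();
    }

    public static void main(String[] args) {
        // Assumption: UTC. Use whichever zone your daily quota resets in.
        long delay = millisUntilNextMidnight(ZonedDateTime.now(ZoneId.of("UTC")));
        System.out.println("Delay until next midnight: " + delay + " ms");
        // With the App Engine SDK available, the delay would then be used roughly like:
        //   queue.add(TaskOptions.Builder.withUrl("/worker").countdownMillis(delay));
    }
}
```

For a plain "24 hours from now" delay no calculation is needed; Duration.ofHours(24).toMillis() gives the value directly.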

The documentation says "later time", in the sense that your application doesn't stop to wait for your write to go through, so you work in parallel.
If you want to control WHEN to start a cleanup or something similar, look into cron jobs.

Related

Is it possible to restart a job after stopping?

I am working on a project in which I can make at most 15k calls a day to a Google API. So I want to stop the job after 15k calls and resume it the next day.
Please let me know how I can achieve this. Right now I am thinking of using the Quartz scheduler to schedule the job every day.
If anyone needs full explanation, I can explain it more.
Thanks in advance.
You can stop a step execution (and its surrounding job) using StepExecution#setTerminateOnly. So in your case, you can use for example an ItemReadListener#afterRead or ItemWriteListener#afterWrite that has access to the step execution and sets the terminateOnly flag after processing 15k items. When you stop the job gracefully like this, its status will be STOPPED and you will be able to restart it the next day, as you mentioned.
You can find an example in the Stopping a Job Manually for Business Reasons section of the reference documentation.
Hope this helps.
I had something similar where I needed to stop a 24/7 job 5 minutes before server maintenance was scheduled to start.
The easiest I found was to use the Reader and return null to indicate the job should stop. In your case, return null when 15k API requests were processed.
This will likely mean you'll need a bean (could be just an AtomicInteger) available to the Reader and updated by the Processor. But also a Job Listener (sorry, I don't have the code) which also knows about the bean. If the maximum is reached the Listener sets up a custom job exit value to be returned to the scheduler when the job stops. The scheduler has to be configurable enough to know the particular exit value means to start the job again the next day. (Any other non-zero value was treated as an error.)
This means there is a small possibility the job hits 15k but also that it is the last item, so the job is scheduled again for the next day even though there is nothing more to be processed. It shouldn't matter though - the job will start the next day and stop immediately with a normal complete status so the scheduler will not schedule again.
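Since I said I don't have the original code, here is only a rough sketch of the reader part of the idea. To keep it self-contained it does not implement Spring Batch's ItemReader interface; it just mirrors the "return null to signal the end of input" contract. The class name QuotaLimitedReader and the shared AtomicInteger counter are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reader: stops handing out items once the shared counter hits the limit.
class QuotaLimitedReader<T> {
    private final Iterator<T> source;
    private final AtomicInteger processed; // shared bean, incremented by the processor
    private final int dailyLimit;

    QuotaLimitedReader(Iterator<T> source, AtomicInteger processed, int dailyLimit) {
        this.source = source;
        this.processed = processed;
        this.dailyLimit = dailyLimit;
    }

    /** Returns the next item, or null when the input is exhausted or the limit is reached. */
    T read() {
        if (processed.get() >= dailyLimit || !source.hasNext()) {
            return null;
        }
        return source.next();
    }

    public static void main(String[] args) {
        AtomicInteger processed = new AtomicInteger();
        QuotaLimitedReader<String> reader = new QuotaLimitedReader<>(
                Arrays.asList("a", "b", "c", "d", "e").iterator(), processed, 3);
        String item;
        while ((item = reader.read()) != null) {
            processed.incrementAndGet(); // stands in for the processor calling the API
            System.out.println("processed " + item);
        }
    }
}
```

A job listener that knows the same counter can then compare it against the limit when the job ends and choose the custom exit value.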

Java, Quartz and multiple tasks triggered at certain times saved in a database

I'm building a system where users can set a future date (down to hours and minutes) in a calendar. At that date a trigger calls a certain task, unique for every user.
Every user can set a different date. The system will have 10k+ users from the start, and a user can create more than one trigger.
So assuming I have 10k users and each user creates on average 3 triggers => 30k triggers with 30k different dates.
All dates are saved in a database.
I'm new to Quartz. Can this be done in a more optimized way?
I was thinking about making a task that runs every minute, gets the tasks that are supposed to run in the next hour, and removes them from the database.
Do you have any better ideas? Has anyone used Quartz for a large number of triggers?
You have the schedule backed in the database. If I understand the idea - you want the quartz to load all the upcoming tasks to execute them in the future.
This is a problematic approach:
Synchronization issues: I assume that users can edit, remove and add new tasks in the database. You would have to periodically query the database to refresh the state of the Quartz jobs, remove some jobs, edit other jobs, etc. This may not be trivial. The state of the program would be a long-lived cache that needs to be synchronised often.
Performance and scalability issues: Even if the proposed solution is OK for 30k tasks, it may not be OK for 70k or 700k tasks. Your approach is not easy to scale: adding a new machine would require an additional layer of synchronisation to decide which machine should actually execute which job (as all of them have all the tasks).
What I would propose:
Add a "stage" column to the Tasks table (new, queued, running, finished, failed).
Divide your solution into several components. (Initially they can run on a single machine, but it will be easy to scale.)
Components:
Task Finder: Executed periodically (once every few seconds). Scans the database for tasks that are "new", and due soon. Sends the tasks found to Message Queue and marks the task as "queued" in the db. Marking as "queued" has to be done carefully as there can be multiple "task finders". (As an addition it may find the tasks that have been marked as "queued" or "running" more than N minutes ago and are not "finished" nor "canceled" - probably need to re-run these)
Message Queue: Connector between Task Finder and Task Executor.
Task Executor: Listens to the Message Queue and processes the tasks that it receives. Marks the tasks as "running" initially and "finished" or "failed" later on.
With this approach you can have:
multiple Task Executors on multiple machines
multiple Task Finders on multiple machines
no Single Point of Failure: even if one of the Task Finders or Executors fails, some of the tasks will be delayed but they will be picked up and run afterwards.
This may not address all the scenarios but would be a good starting point.
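To make the moving parts a bit more concrete, here is a small in-memory sketch of the Task Finder and Task Executor. Everything in it is an assumption made for illustration: a ConcurrentHashMap stands in for the Tasks table, a LinkedBlockingQueue stands in for the message queue, and the compareAndSet call plays the role of a conditional update such as "set stage = 'queued' where id = ? and stage = 'new'", which is what keeps two finders from queueing the same task.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model of the finder / queue / executor split.
class TaskPipeline {
    enum Stage { NEW, QUEUED, RUNNING, FINISHED, FAILED }

    static final class Task {
        final long id;
        final long dueAtMillis;
        final AtomicReference<Stage> stage = new AtomicReference<>(Stage.NEW);

        Task(long id, long dueAtMillis) {
            this.id = id;
            this.dueAtMillis = dueAtMillis;
        }
    }

    final Map<Long, Task> table = new ConcurrentHashMap<>();       // stands in for the Tasks table
    final BlockingQueue<Task> queue = new LinkedBlockingQueue<>(); // stands in for the message queue

    /** Task Finder: claims tasks that are NEW and due within the look-ahead window. */
    int findAndQueue(long nowMillis, long lookAheadMillis) {
        int queued = 0;
        for (Task t : table.values()) {
            boolean dueSoon = t.dueAtMillis <= nowMillis + lookAheadMillis;
            // Only one finder can win this transition, so a task is queued at most once.
            if (dueSoon && t.stage.compareAndSet(Stage.NEW, Stage.QUEUED)) {
                queue.add(t);
                queued++;
            }
        }
        return queued;
    }

    /** Task Executor: runs one queued task and records the outcome; false if the queue is empty. */
    boolean executeOne(Runnable work) {
        Task t = queue.poll();
        if (t == null) {
            return false;
        }
        t.stage.set(Stage.RUNNING);
        try {
            work.run();
            t.stage.set(Stage.FINISHED);
        } catch (RuntimeException e) {
            t.stage.set(Stage.FAILED);
        }
        return true;
    }
}
```

In a real deployment the finder would run on a schedule every few seconds and the executors would block on the queue; the recovery of tasks stuck in "queued" or "running" is left out of the sketch.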
I don't see why you need quartz here at all. As far as I remember, quartz is best used to schedule backend internal processes, not user-defined tasks obtained from db.
Just process the trigger as it is created, save a row to your tasks table with start_date based on the trigger, and every second select all incomplete tasks with start_date < sysdate. If the job is repeating, calculate the next execution time and insert a new task row or update the previous one accordingly.
As Sam pointed out there are some nice topics addressing the same problem:
Quartz Performance
Quartz FAQ
In a system like the one mentioned, handling this number of triggers should mostly not be a problem. But in my experience it is better to create something like a "JobChecker". If you let your users create their own triggers, it could really break Quartz in some cases. For example, if 5000 users create an event for the exact same time, Quartz will have a hard time handling them correctly. (It is not a situation that will occur often, but it is possible, as your specification does not exclude it.) Quartz has difficulties only when a lot of triggers should be fired at the same time.
My recommendation is to create one job that runs every hour/minute etc. and handles all user-set events. This is similar to a cron job in bash. With this kind of processing your system will be pretty stable even if the number of "triggers" increases dramatically. Basically your line of thought is correct if you strive for scalability.

Loop a java application in ticks

I'm making a Java server application. The application would consume a lot of resources if it just ran whenever possible.
As far as I know if I added a sleep method, it would run like this:
Do task (Might take 10ms to do. Can also take longer or less)
Sleep 50ms
Do task (Might take 10ms to do. Can also take longer or less)
Sleep 50ms
So how can I make it run every 50ms (20 ticks per second)?
Thanks
You can use a ScheduledExecutorService
    ScheduledExecutorService service = Executors.newScheduledThreadPool(10);
    service.scheduleAtFixedRate(() -> {
        System.out.println("whatever");
    }, 0, 50, TimeUnit.MILLISECONDS); // initial delay 0, then every 50 ms (the rate)
The scheduleAtFixedRate() method will schedule the given task for execution at a fixed rate, regardless of the time the task took. If one execution takes longer than 50ms, subsequent executions may start late, but executions of the same task will not run concurrently, even if the pool has spare threads.
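If you want to try the fixed-rate behaviour in isolation, a self-contained sketch could look like this. The class name TickLoop and the runTicks helper are made up for the example; it runs a task every periodMillis until a given number of ticks has been seen and then shuts the scheduler down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo of a fixed-rate tick loop.
class TickLoop {
    /** Runs 'tick' at a fixed rate until at least 'ticks' runs completed; returns the run count. */
    static int runTicks(int ticks, long periodMillis, Runnable tick) {
        ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger count = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(ticks);
        service.scheduleAtFixedRate(() -> {
            tick.run();
            count.incrementAndGet();
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await(); // block the caller until enough ticks have happened
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            service.shutdownNow(); // stop ticking
        }
        return count.get();
    }

    public static void main(String[] args) {
        int runs = runTicks(5, 50, () -> System.out.println("tick at " + System.currentTimeMillis()));
        System.out.println("ticks run: " + runs);
    }
}
```

If you would rather have a fixed pause after each run (the "do task, sleep 50ms" pattern from the question), scheduleWithFixedDelay takes the same arguments and measures the delay from the end of one run to the start of the next.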
Without knowing what your application does (you could've included it in your question), you could use a scheduler (Quartz, java.util.Timer). Which task are you trying to perform every 50ms?
Edit:
While the "game loop" is all well and good in games, servers rarely have them. Receiving data is a continuous action, and the state should change accordingly. This is a larger design issue in the server. With proper design you don't need to create artificial pauses.
For example a simple design would be having threads waiting to receive input from the clients, and when a message is received, it's processed, and a message is sent to all clients to inform of the changes. No busy waiting, nothing will happen unless a message arrives from a client.

Scheduling tasks, making sure task is ever being executed

I have an application that checks a resource on the internet for new mails. If there are new mails it does some processing on them. This means that, depending on the number of mails, processing might take anywhere from a few seconds to hours.
Now the object/program that does the processing is already a singleton. So right now I already took care of there really only being 1 instance that's handling the checking and processing.
However I only have it running once now and I'd like to have it continuously running, checking for new mails more or less every 10 minutes or so to handle them in a timely manner.
I understand I can take care of this with Timer/TimerTask, or even better, I found a resource here: http://www.ibm.com/developerworks/java/library/j-schedule/index.html that uses Scheduler/SchedulerTask. But what I am afraid of is that if I set it to run every 10 minutes and a previous session is still processing data, it will put the new task in a stack, waiting to be executed once the previous one is done. So for instance the first run could take 5 hours and then, because it was busy all the time, it would launch 5*6-1=29 runs immediately after each other, checking for mails and/or doing some processing without giving the server a break.
Does anyone know how I can solve this?
P.S. the way I have my application set up right now is I'm using a Java Servlet on my tomcat server that's launched upon server start where it creates a Singleton instance of my main program, then calls some method to do the fetching/processing. And what I want is to repeat that fetching/processing every "x" amount of time (10 minutes or so), making sure that really only 1 instance is doing this and that really after each run 10 minutes or so are given to rest.
Actually, Timer + TimerTask can deal with this pretty cleanly. If you schedule something with Timer.scheduleAtFixedRate(), you will notice that the docs say it will attempt to "make up" late events to maintain the long-term period of execution. However, this can be overcome by using TimerTask.scheduledExecutionTime(). The example therein lets you figure out if the task is too tardy to run, and you can just return instead of doing anything. This will, in effect, "clear the queue" of TimerTask runs.
Of note: a Timer uses a single thread to execute its tasks, so it won't spawn two copies of your task side-by-side.
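A minimal sketch of the "too tardy, just return" check, modelled on the pattern in the scheduledExecutionTime() docs. The class name SkipIfLateTask and the 5 second tardiness threshold are assumptions; pick a threshold that suits your period.

```java
import java.util.TimerTask;

// Hypothetical task wrapper: skips "catch-up" runs that start too long after their slot.
class SkipIfLateTask extends TimerTask {
    static final long MAX_TARDINESS_MILLIS = 5_000; // assumed threshold

    private final Runnable work;

    SkipIfLateTask(Runnable work) {
        this.work = work;
    }

    /** True if a run scheduled for 'scheduledMillis' is starting too late at 'nowMillis'. */
    static boolean tooLate(long scheduledMillis, long nowMillis) {
        return nowMillis - scheduledMillis >= MAX_TARDINESS_MILLIS;
    }

    @Override
    public void run() {
        if (tooLate(scheduledExecutionTime(), System.currentTimeMillis())) {
            return; // this is a make-up run for a missed slot; skip it
        }
        work.run();
    }
}
```

It would be scheduled with something like new Timer().scheduleAtFixedRate(new SkipIfLateTask(checkMails), 0, 10 * 60 * 1000L), where checkMails is whatever Runnable does your fetching and processing.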
On the side note part, you don't have to process all 10k emails in the queue in a single run. I would suggest processing for a fixed amount of time using TimerTask.scheduledExecutionTime() to figure out how long you have, then returning. That keeps your process more limber, cleans up the stack between runs, and if you are doing aggregates, ensures that you don't have to rebuild too much data if, for example, the server is restarted in the middle of the task. But this recommendation is based on generalities, since I don't know what you're doing in the task :)
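The "process for a fixed amount of time, then return" idea can be sketched like this. The names BudgetedRun and processUntil are illustrative, and the clock is passed in as a LongSupplier only so the loop can be exercised without waiting on real time.

```java
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical helper: handles queued items until a deadline, then gives up for this run.
class BudgetedRun {
    /**
     * Polls items from 'pending' and hands them to 'handler' while the clock is
     * before 'deadlineMillis'. Returns how many items were handled.
     */
    static <T> int processUntil(Queue<T> pending, Consumer<? super T> handler,
                                long deadlineMillis, LongSupplier clock) {
        int handled = 0;
        while (clock.getAsLong() < deadlineMillis) {
            T item = pending.poll();
            if (item == null) {
                break; // nothing left to do
            }
            handler.accept(item);
            handled++;
        }
        return handled;
    }
}
```

Inside a TimerTask the deadline could be scheduledExecutionTime() plus your time budget, with System::currentTimeMillis as the clock; anything still in the queue waits for the next run.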

EJB timer performance

I am trying to decide whether to use a Java EE timer in my application or not. The server I am using is WebLogic 10.3.2.
The need is: After one hour of a call to an async webservice from an EJB, if the async callback method has not been called it is needed to execute some actions. The information regarding if the callback method has been called and the date of the execution of the call is stored in database.
The two possibilities I see are:
Using a batch process that every half hour looks for all the calls that have been more than one hour without response and execute the needed actions.
Create a one-hour timer after every single call to the WS, and in the @Timeout method check whether the answer has come; if it has not, execute the required actions.
From a pure programming point of view, the second one looks easier and cleaner, but I am worried about the performance issues I could have if, let's say, there are 100,000 timers created at a single moment.
Any thoughts?
You would be better off having a more specialized process. The real problem is the 100,000 issue. It would depend on how long your actions take.
It's easy to see that each second the EJB timer would fire up about 30 threads to process all of the currently pending jobs, since that's how it works (100,000 timers per hour / 3,600 seconds ≈ 28 per second).
Also, timers are persistent, so your EJB-managed timer table will be saving and deleting about 30 rows per second (60 operations in total), assuming 100K transactions/hour.
So that's a lot of work happening very quickly. I can easily see the system simply "falling behind" and never catching up.
A specialized process would be much lighter weight, could perhaps batch the action calls (call 5 actions per thread instead of one per thread), etc. It would be nice if you didn't have to persist the timer events, but that is what it is. You could almost easily simply append the timer events to a file for safety, and keep them in memory. On system restart, you can reload that file, and then roll the file (every hour create a new file, delete the older file after it's all been consumed, etc.). That would save a lot of DB traffic, but you could lose the transactional nature of the DB.
Anyway, I don't think you want to use the EJB Timer for this, I don't think it's really designed for this amount of traffic. But you can always test it and see. Make sure you test restarting your container see how well it works with 100K pending timer jobs in its table.
It all depends on what is used by the container. E.g. JBoss uses Quartz Scheduler to implement EJB timer functionality. Quartz is pretty good when you have around 100,000 timer instances.
@Pau: why do you need to create a timer for every call made? Instead you can have a single timer thread, created at application startup, which runs every half hour (configurable) and looks in your database for all web service calls whose response has not been received and whose request time is more than 1 hour in the past. For the selected records, it can execute the required action in a for loop.
Well above design may not be useful if you have time critical activity to be performed.
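The selection step of that periodic check could be sketched as below. An in-memory list stands in for the database table, and the names OverdueCallFinder, Call and findOverdue are hypothetical; in practice this would be a query on the request timestamp and the "answered" flag.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the "find calls without a response after one hour" check.
class OverdueCallFinder {
    static final class Call {
        final String id;
        final Instant requestedAt;
        final boolean answered;

        Call(String id, Instant requestedAt, boolean answered) {
            this.id = id;
            this.requestedAt = requestedAt;
            this.answered = answered;
        }
    }

    /** Returns the unanswered calls that were requested more than 'timeout' before 'now'. */
    static List<Call> findOverdue(List<Call> calls, Instant now, Duration timeout) {
        Instant cutoff = now.minus(timeout);
        return calls.stream()
                .filter(c -> !c.answered && c.requestedAt.isBefore(cutoff))
                .collect(Collectors.toList());
    }
}
```

The single timer would call findOverdue(calls, Instant.now(), Duration.ofHours(1)) on each run and loop over the result to execute the required action.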
If you have the Spring framework in your application, you may also look at its timer services: http://static.springsource.org/spring/docs/1.2.9/reference/scheduling.html
Maybe you could use some of these ideas:
Where I'm at, we've built a cron-like scheduler which is powered by a single timer. When the timer fires the system checks which crons need to run using a Quartz CronTrigger. Generally these crons have a lot of work to do, and the way we handle that is each cron spins its individual tasks off as JMS messages, then MDBs handle the messages. Currently this runs on a single Glassfish instance and as our task load increases, we should be able to scale this up with a cluster so multiple nodes are processing the jms messages. We balance the jms message processing load for each type of task by setting the max-pool-size in glassfish-ejb-jar.xml (also known as sun-ejb-jar.xml).
Building a system like this and getting all the details right isn't trivial, but it's proving really effective.
