Threads/backend in appengine java - java

I want to run some kind of thread continuously in App Engine. The thread checks a HashMap and continuously updates its entries according to some business logic.
My HashMap is a public member variable of class X, and X is a singleton class.
I know that App Engine does not support threads and that it has something called backends.
My question is: if I run a backend continuously, 24/7, will I be charged?
There is no heavy processing in the backend. It just updates a HashMap based on some condition.
Is there some trick I can apply so that I am not charged? My web app is not for commercial use; it is just for fun.

Yes, backends are billed per hour, regardless of how much they are used: https://developers.google.com/appengine/docs/billing#Billable_Resource_Unit_Costs
Do you need this calculation to happen immediately? You could run a cron job, say every 5 minutes, and perform the task there.

Alternatively, you can enqueue a task that runs for up to 10 minutes and have it re-enqueue itself when it is close to its 10-minute limit. To pass the state of the process on to the next task, you can use the task parameters or the datastore.
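A minimal sketch of the time-boxed part of that idea, in plain Java. The class and method names are hypothetical, and the actual re-enqueue call is deliberately left out: the returned index is the state you would hand to the next task (as a task parameter or via the datastore).

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical helper: does as much work as fits before a deadline and
// reports where the next task should resume.
class TimeBoxedWorker {
    /**
     * Processes items starting at startIndex until the list is exhausted or
     * the clock reaches deadlineMillis. Returns the index to resume from;
     * if it equals items.size(), no follow-up task is needed.
     */
    static int processUntil(List<String> items, int startIndex, long deadlineMillis,
                            LongSupplier clockMillis, Consumer<String> handler) {
        int i = startIndex;
        while (i < items.size() && clockMillis.getAsLong() < deadlineMillis) {
            handler.accept(items.get(i));
            i++;
        }
        return i;
    }
}
```

In a real task handler you would pass System::currentTimeMillis as the clock and pick a deadline safely below the 10-minute limit.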

Related

Which java concurrent to use for cleaning DB (on demand/scheduled)

I have a cleaner thread in my code that is created when the DB capacity is exceeded; the capacity is checked on every insertion into the DB. I would like to add more functionality to this cleaner so that it also cleans when the number of files exceeds, let's say, 10000. The new functionality should run on a schedule.
I want to be able to clean the DB in 2 ways:
1. On demand.
2. Scheduled, every day at hour X.
Which Java concurrency class should I use?
How can I make sure that the same thread is used in both cases?
The code that performs the DB cleanup should be completely separated from the scheduling (single responsibility principle), so that you can execute it at any time from other code.
As for scheduling, I would suggest looking at the Quartz scheduler and getting familiar with cron expressions, so that you can extract the schedule into a properties file and change the trigger without modifying your code.
You should synchronize your code so that no more than one cleanup is performed at the same time; this should be easy with the standard synchronized keyword.
If you want to keep it very simple and don't want to add new dependencies, you can go with the standard Java solution: Timer. Timer#scheduleAtFixedRate provides fixed-rate execution. However, that means you'll have to add extra code whenever new requirements show up (e.g., don't run on weekends).
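As a hedged alternative to Timer that also addresses the "same thread" question, here is a sketch using a single-threaded ScheduledExecutorService from java.util.concurrent. The class name CleanupScheduler and its methods are made up for illustration; the cleanup itself is passed in as a Runnable so it stays separate from the scheduling.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper: one worker thread serves both on-demand and
// scheduled cleanups, so two cleanups never run at the same time.
class CleanupScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable cleanup;

    CleanupScheduler(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    /** On demand: queues a cleanup on the shared worker thread. */
    Future<?> cleanNow() {
        return executor.submit(cleanup);
    }

    /** Scheduled: first run at the next occurrence of 'time', then every 24 hours. */
    void scheduleDailyAt(LocalTime time) {
        long initialDelay = millisUntilNext(LocalDateTime.now(), time);
        executor.scheduleAtFixedRate(cleanup, initialDelay,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }

    /** Milliseconds from 'now' until the next occurrence of 'time'. */
    static long millisUntilNext(LocalDateTime now, LocalTime time) {
        LocalDateTime next = now.toLocalDate().atTime(time);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next).toMillis();
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Because the period is a fixed 24 hours, the wall-clock hour can drift across daylight-saving changes; a cron-style scheduler such as Quartz avoids that.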

Scheduling tasks, making sure task is ever being executed

I have an application that checks a resource on the internet for new mails. If there are new mails, it does some processing on them. Depending on the number of mails, this can take anywhere from a few seconds to hours of processing.
Now the object/program that does the processing is already a singleton. So right now I already took care of there really only being 1 instance that's handling the checking and processing.
However I only have it running once now and I'd like to have it continuously running, checking for new mails more or less every 10 minutes or so to handle them in a timely manner.
I understand I can take care of this with Timer/TimerTask, or even better, I found a resource here: http://www.ibm.com/developerworks/java/library/j-schedule/index.html that uses Scheduler/SchedulerTask. What I am afraid of is this: if I set it to run every 10 minutes and a previous run is still processing data, the new task will be queued up to execute once the previous one is done. So, for instance, if the first run takes 5 hours, then because it was busy all that time, it will launch 5*6-1=29 runs immediately one after another, checking for mails and doing some processing without giving the server a break.
Does anyone know how I can solve this?
P.S. the way I have my application set up right now is I'm using a Java Servlet on my tomcat server that's launched upon server start where it creates a Singleton instance of my main program, then calls some method to do the fetching/processing. And what I want is to repeat that fetching/processing every "x" amount of time (10 minutes or so), making sure that really only 1 instance is doing this and that really after each run 10 minutes or so are given to rest.
Actually, Timer + TimerTask can deal with this pretty cleanly. If you schedule something with Timer.scheduleAtFixedRate(), you will notice that the docs say it will attempt to "make up" late events to maintain the long-term period of execution. However, this can be overcome by using TimerTask.scheduledExecutionTime(). The example in its documentation lets you figure out whether the task is too tardy to run, in which case you can just return instead of doing anything. This will, in effect, "clear the queue" of pending TimerTask executions.
Of note: a Timer uses a single thread to execute its tasks, so it won't run two copies of your task side by side.
As a side note, you don't have to process all 10k emails in the queue in a single run. I would suggest processing for a fixed amount of time, using TimerTask.scheduledExecutionTime() to figure out how long you have, then returning. That keeps your process more limber, cleans up the stack between runs, and if you are doing aggregates, ensures that you don't have to rebuild too much data if, for example, the server is restarted in the middle of the task. But this recommendation is based on generalities, since I don't know what you're doing in the task :)
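A small sketch of the tardiness check described above. MailCheckTask, the 10-minute period and the one-minute tolerance are illustrative assumptions, not values from the question; Timer, TimerTask and scheduledExecutionTime() are the standard java.util API.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical task: skips "catch-up" executions that fire long after
// their scheduled time because an earlier run took too long.
class MailCheckTask extends TimerTask {
    static final long PERIOD_MS = 10 * 60 * 1000L;    // run every 10 minutes
    static final long MAX_TARDINESS_MS = 60 * 1000L;  // assumed tolerance

    private final Runnable work;

    MailCheckTask(Runnable work) {
        this.work = work;
    }

    /** True if an execution scheduled for 'scheduledTime' is too late at 'now'. */
    static boolean tooLate(long scheduledTime, long now) {
        return now - scheduledTime >= MAX_TARDINESS_MS;
    }

    @Override
    public void run() {
        if (tooLate(scheduledExecutionTime(), System.currentTimeMillis())) {
            return; // too tardy: drop this execution instead of making it up
        }
        work.run();
    }

    /** Starts the periodic check on a Timer's single background thread. */
    static Timer start(Runnable work) {
        Timer timer = new Timer("mail-check");
        timer.scheduleAtFixedRate(new MailCheckTask(work), 0, PERIOD_MS);
        return timer;
    }
}
```

After a long run, the backlog of missed executions still fires, but each one returns immediately, so only an execution that is close to its scheduled time does real work.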

EJB timer performance

I am trying to decide whether to use a Java EE timer in my application. The server I am using is WebLogic 10.3.2.
The requirement is: one hour after an EJB calls an async web service, if the async callback method has not been called, some actions must be executed. Whether the callback method has been called, and the date of the call, are stored in the database.
The two possibilities I see are:
1. Use a batch process that, every half hour, looks for all the calls that have gone more than one hour without a response and executes the needed actions.
2. Create a one-hour timer after every single call to the web service, and in the @Timeout method check whether the answer has come; if it has not, execute the required actions.
From a pure programming point of view, the second one looks easier and cleaner, but I am worried about the performance issues I could have if, let's say, 100,000 timers exist at a single moment.
Any thoughts?
You would be better off having a more specialized process. The real problem is the 100,000 issue. It would depend on how long your actions take.
It's easy to see that, each second, the EJB timer would fire up about 30 threads (100,000 / 3,600 ≈ 28) to process all of the currently pending jobs, since that's how it works.
Also, timers are persistent, so your EJB-managed timer table will be saving and deleting 30 rows per second (60 operations in total), assuming 100K transactions/hour.
So, that's a lot of work happening very quickly. I can easily see the system simply "falling behind" and never catching up.
A specialized process would be much lighter weight, could perhaps batch the action calls (call 5 actions per thread instead of one per thread), etc. It would be nice if you didn't have to persist the timer events, but it is what it is. You could fairly easily append the timer events to a file for safety and keep them in memory. On system restart, you can reload that file, and then roll the file (every hour create a new file, delete the older file after it's all been consumed, etc.). That would save a lot of DB traffic, but you could lose the transactional nature of the DB.
Anyway, I don't think you want to use the EJB Timer for this; I don't think it's really designed for this amount of traffic. But you can always test it and see. Make sure you test restarting your container to see how well it works with 100K pending timer jobs in its table.
It all depends on what the container uses. For example, JBoss uses the Quartz scheduler to implement EJB timer functionality. Quartz is pretty good when you have around 100,000 timer instances.
@Pau: why do you need to create a timer for every call? Instead, you can create a single timer thread at application startup that runs at a configurable interval (say, every half hour) and looks in your database for all web service calls whose responses have not been received and whose request time is more than 1 hour in the past. For the selected records, it can execute the required action in a for loop.
That said, the above design may not be suitable if you have time-critical activity to perform.
If you have the Spring framework in your application, you may also look at its timer services: http://static.springsource.org/spring/docs/1.2.9/reference/scheduling.html
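The single periodic sweep suggested above could look roughly like this. It is only a sketch: PendingCall and TimeoutSweeper are hypothetical names, and an in-memory list stands in for the database query you would actually run.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one async web service call, as stored in the DB.
class PendingCall {
    final String id;
    final Instant requestedAt;
    final boolean answered;

    PendingCall(String id, Instant requestedAt, boolean answered) {
        this.id = id;
        this.requestedAt = requestedAt;
        this.answered = answered;
    }
}

class TimeoutSweeper {
    static final Duration TIMEOUT = Duration.ofHours(1);

    /** Returns the unanswered calls that were requested more than one hour before 'now'. */
    static List<PendingCall> findOverdue(List<PendingCall> calls, Instant now) {
        List<PendingCall> overdue = new ArrayList<>();
        for (PendingCall call : calls) {
            boolean expired = Duration.between(call.requestedAt, now).compareTo(TIMEOUT) > 0;
            if (!call.answered && expired) {
                overdue.add(call);
            }
        }
        return overdue;
    }
}
```

A single timer would call findOverdue (or the equivalent SQL query) at each interval and run the required action for every returned call, which avoids creating one persistent timer per request.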
Maybe you could use some of these ideas:
Where I'm at, we've built a cron-like scheduler which is powered by a single timer. When the timer fires the system checks which crons need to run using a Quartz CronTrigger. Generally these crons have a lot of work to do, and the way we handle that is each cron spins its individual tasks off as JMS messages, then MDBs handle the messages. Currently this runs on a single Glassfish instance and as our task load increases, we should be able to scale this up with a cluster so multiple nodes are processing the jms messages. We balance the jms message processing load for each type of task by setting the max-pool-size in glassfish-ejb-jar.xml (also known as sun-ejb-jar.xml).
Building a system like this and getting all the details right isn't trivial, but it's proving really effective.

How to share data between scheduled jobs

I am writing a scheduler which grabs XML data and inserts it into a MySQL DB. Simple, isn't it? But here is the logic I am trying to work out. NOTE: I want to run this in a Windows environment; in the future it might be configured for other platforms.
The scheduler should run every 5 minutes.
The script should fetch the conditions/configuration that define what to parse and which data fields to collect from the XML; these conditions are available in a MySQL table.
This table also defines a delay after which the script should check for differences in the XML fields.
The script does both: it runs every 5 minutes to collect the XML, and it checks for differences against the table (MySQL) at every configured delay.
The script then reads and parses the XML data fields, and collects only those data fields that are defined in the above MySQL table.
The collected data will be inserted into the MySQL DB only when there is a change in the state, and this state is defined in the MySQL table.
Feedback/Suggestions:
Because of the delay, I am not sure how I should store the configuration in the script so that it is shared between scheduled runs.
Is there any way to use a static variable in the code to store this data, so that it is shared between different jobs or different schedules?
Basically, how should I implement this? What is a better approach in terms of performance?
Thanks for your time.
UPDATE:
One of the suggestions is to run the Java code as a Windows service (?), so that we could have some common data shared between different jobs. Does that make sense?
Reference:
Java Service Wrapper
Concurrency is the answer. Try creating a thread pool or an ExecutorService and pausing certain threads for 5 minutes; you could even use synchronization if several threads will be working with the same resource.
Remember that using more threads does not always mean you will finish your job faster, e.g. 3 threads: 2 min, 5 threads: 6 min.
*Read a tutorial about threads
*Create a few simple threads that wait for 5 minutes
*Read some tutorials about thread pools, synchronization and sharing resources (the script part)
*Test to find the optimal setup
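One way the executor suggestion could look in practice is sketched below. It assumes both jobs run inside one long-lived JVM (for example as a service); SharedJobConfig and XmlJobs are made-up names, and the jobs themselves are passed in as Runnables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the configuration loaded from the MySQL table.
// Jobs read an immutable snapshot, so no locking is needed on the read side.
class SharedJobConfig {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    /** Atomically swaps in a fresh copy of the configuration. */
    void replace(Map<String, String> fresh) {
        current.set(Collections.unmodifiableMap(new HashMap<>(fresh)));
    }

    /** Returns the latest configuration snapshot. */
    Map<String, String> snapshot() {
        return current.get();
    }
}

class XmlJobs {
    /**
     * Starts the two periodic jobs on a small pool: the XML fetch every
     * 5 minutes and the difference check every diffDelayMinutes (must be > 0).
     */
    static ScheduledExecutorService start(Runnable fetchXml, Runnable checkDiff,
                                          long diffDelayMinutes) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        pool.scheduleWithFixedDelay(fetchXml, 0, 5, TimeUnit.MINUTES);
        pool.scheduleWithFixedDelay(checkDiff, diffDelayMinutes, diffDelayMinutes,
                TimeUnit.MINUTES);
        return pool;
    }
}
```

Both Runnables would hold a reference to the same SharedJobConfig instance, which plays the role of the "static variable" asked about in the question without exposing mutable shared state.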

Boolean flag available over multiple instances in App Engine

I have an app that runs over several instances and all requests come through one servlet.
I need to run a cron job which executes once a week for about 3 minutes. During that cron call, some kind of flag/boolean will be modified somewhere so that the servlet can pick it up and send a "server temporarily unavailable" type of message back instead of processing the request. Once the cron job is complete, it will set the flag back to true.
I cannot use a singleton or a static boolean as the app will be in multiple instances. Nor do I want the servlet to have to fetch a value from the datastore on every request, as it will mean hundreds of thousands of extra datastore reads.
What can I do? Any ideas?
I think you may be able to store the boolean in memcached. GAE has a Cache API for Memcached. However, note that cache values are not persistent and may not survive even 3 minutes. I think you should have a fixed start time for the cron task hard-coded in one of your Java classes or in a .properties file; when your task finishes, it should look at that hard-coded time and schedule itself for the next round accordingly.
This way, your servlet can also look at that time and refuse to serve requests during the interval you specify. That will be very fast, but your jobs will be scheduled at a fixed time periodically, and you won't be able to change this unless you re-deploy the application.
I think the better solution is to keep the boolean in the datastore and make use of the cache. See the following algorithm:
is my boolean in the cache?
yes:
[alright, then choose to serve or not to serve request using it.]
no:
[fetch variable from datastore and put it on the cache.] (cache miss)
Again, cache will be fast, but not as much as hard-coding the schedule in the program.
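The datastore-plus-cache lookup above can be sketched as follows. To avoid guessing at App Engine signatures, a plain Map stands in for memcache and a Supplier stands in for the datastore read; MaintenanceFlag and the key name are hypothetical.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical read-through flag: check the cache first, fall back to the
// datastore on a miss, then populate the cache for later requests.
class MaintenanceFlag {
    static final String KEY = "maintenance"; // assumed cache key

    private final Map<String, Boolean> cache;        // stand-in for memcache
    private final Supplier<Boolean> datastoreRead;   // stand-in for a datastore get

    MaintenanceFlag(Map<String, Boolean> cache, Supplier<Boolean> datastoreRead) {
        this.cache = cache;
        this.datastoreRead = datastoreRead;
    }

    boolean isUnderMaintenance() {
        Boolean cached = cache.get(KEY);
        if (cached != null) {
            return cached;                 // cache hit: no datastore read
        }
        boolean value = datastoreRead.get(); // cache miss: read the source of truth
        cache.put(KEY, value);
        return value;
    }
}
```

The cron job would write the new value to the datastore and then overwrite or delete the cache entry, so that instances pick up the change on their next request even if the cache entry had been evicted in between.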
EDIT: Another solution (however, not possible to implement).
If you want to keep serving pages during the task execution, you should use the Task Queue API.
First of all, you should be familiar with using a countdown for your task (in this case, next week): http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/taskqueue/TaskOptions.html#countdownMillis(long)
Then you could use a size() method of Queue (which I was expecting to be there, but apparently Google didn't implement it) to see whether the task queue size is 0. That would mean the task is being processed right now, because when the task finishes, it submits itself again for 1 week later.
One approach would be to have the cron job publish a message to a JMS topic to which all the servlet instances were listening. The messages could inform the servlet instances to set a value in the static boolean you mentioned to true or false.
