I do not know really what I am looking for and I am not asking for the best solution. I would like to ask for possible solutions to do what I am asking for.
I am building an application using Spring Boot.
I have a database with words in it. I am using these words on a site to search for products and would like to run them over and over again with some time between each search.
So basically the application ask the database for all words that has not been searched for within a specific time (let say 5 minutes). The words that I get in my query response I send out to a KafkaQueue and they are then processed by workers. As soon as they have been used by a worker I make an update in the database with the current time.
So I make a search in the database like every 1 minute (or more often) to find the ones that has not been used for 5 min and then runs them again.
This gives me a lot of connections to the database and I was thinking if there is a better solution. The workers are also saving other data to other tables in the database to.
It is like 80 - 90 words that are turned over like every 5 minutes.
I had a thought to pick some of them and then send them to like a scheduler to set them to run in the remaining of the time until 5 minutes have passed.
If I schedule like 20 tasks at the time will this effect the memory a lot?
Currently I am using postgresql but maybe this is not the best DB for this kind of execution?
I will be able to remove and/or add new words so I do not know if it is possible to even use a in-memory database for the words.
Sounds like a scheduler from SpringBoot is what you need. Look into using the #Scheduled annotation. The following link should provide you with everything Scheduled Annotation .
I ended up using JobRunr which I was able to schedule different delays for each task I set up. So far it has been really smoth using it and it has been working out fine for me.
Scheduled annotation in Spring Boot works really fine to but not for me when I wanted to have more flexibility in setting up different delays for each task.
Related
We're developing a web app and are coming to the end of development, and the client we're working with has suddenly sprung the fact on us that we will need to be able to handle load balancing.
We have batch jobs running which would need to run on both servers, but we don't want them to overlap. They are selecting rows from the database, processing the objects, and merging them back into the database. One of these jobs MUST run at the same time each day, though the others run every n minutes. We have about a week at most to get something working, and it'll become technical debt for us.
My question is: what quick and dirty hacks exist to get this working properly? We have a SQLServer 2008 instance and are running Java EE 6 on JBoss 5, which will be load balanced between two servers. We're using Spring 3 and JPA2 backed by Hibernate, and using the stock spring scheduler to schedule and run the jobs. Help me Obi Wan Kenobi; you're my only hope!
on jboss5 u need to use Scheduler API as the simplest solution - the implmentation is built on top of quartz and generally you would user clustered configuration like described here
http://quartz-scheduler.org/documentation/quartz-2.x/configuration/ConfigJDBCJobStoreClustering
Almost 10 years after this question was asked, I had the same need and the "quickest and dirty-ess" solution for me was a load balancer using shared file system without any master.
Each worker locks->picks the jobs from the shared file system, independent of other workers. To balance load, each worker sleeps X seconds between each job polling iteration, where X is proportional to load on the worker (in my case load is count of processes started by worker in the background). Thus high load sleeper gives higher probability to other workers to pick up the next job. Worker loops are running under supervisor (linux).
My use case was execution of sparklyr client-mode jobs on Spark/Hadoop cluster without overloading the edge nodes. It was implemented as a bash script within few hours and then scaled to 3 hosts, and has been stable for some months now - till there is time to invest in a better solution.
What is the best practice for sending email campaigns?
My company is asking me to come up with an application that is able to send hundres of thousands of emails per day.
We have the capacity to send this amount using Amazon SES.
As a PHP developer I have created a script with PHP to find for example 100,000 records from the database and send emails one by one according to use preferences. This script is executed using cron several times a day.
But this approach fails due to the script being slow, and the browser time outs (Even with high php set_timeout). Or in other words, its not robust and reliable.
I was thinking to perhaps use Java or some other "active" programming language that is alive in the background and is able to handle this without timing out etc.
Has any one of you had this issue before? What is your suggestions for this large scale mailout platform?
Side note 1: We call an API in order to send emails, no sendmail etc.
Side note 2: It has to be able to Call an API around 40 times per second, my script only calls 1 per second
Side note 3: Database is MySQL
If you need to do long running tasks in PHP I recommend running the script from the command line, without a web server. You won't have the timeout issue.
WOW - Sounds like a SPAM factory :) Anyways, I would look into writing some type of service that can spin multiple threads up and process the requests that way. 40 times per second to the cloud seems like a lot. Good luck!
Better purchase something like expressmail from godaddy .
Always having the chance to mark as spam if we do manually using php
I am looking for ideas on how to deal with a search related task which takes more than usual time (in human terms more than 3 seconds)
I have to query multiple sources, sift through information for the first time and then cache it in the DB for later quick return.
The context of the project is J2EE, Spring and Hibernate (on top of SpringROO)
The possible solutions I could think of
-On the webpage let the user know that task is running in background, if possible give them a queue number or waiting time. Refresh the page via a controller which basically checks if the task is done, then when its done (ie the search result is prepared and stored in DB) then just forward to a new controller and fetch the result from the DB
-The background tasks could be done with Spring Task executor. I am not sure if it is easy to give a measure of how long it would take. It would probably be a bad idea to let all the search terms run concurrently, so some sort of pooling will be a good idea.
-Another option to use background tasks is to use JMS. This is perhaps a solution with more control (retries etc)
-Spring batch also comes to mind
Please suggest how you would do it. I would greatly appreciate a semi-detailed+ description. The sources of info can be man and can be sequential in nature so it can take upto 4-5 minutes for the results to form. It is also possible that such tasks run automatically in the background without user intervention (ie to update from the sources)
From a user perspective, I use AJAX. The default web page contains some kind of "Busy" indicator. When the AJAX request completes, the busy indicator is replaced with the result.
In the background, request handlers are already multi-threaded. So you can simply format the default result, close&flush the output, and do the processing in the current thread. You should put something in the session or DB to make sure that no one can start the same heavy process a second time.
Running task pools in a web container is possible but there are some caveats, especially how to synchronize startup/shutdown: Do you want your web server to "hang" during shutdown while some thread is busy collecting your results? Also the additional load should be considered. It might be better to use JMS and offload the strain to a second server dedicated to build the search results.
Such a system will scale much better if your searches start to become a burden. It also makes it trivial to automate the process by writing a small program which posts searches in the JMS queue.
I've solved this problem in the past doing something like this:
When the user initiates a long running task, I open a popup window that displays the task status. The task status includes a name and estimated time to complete
This task is also stored in my "app" (this can be stored in the DB, session, or application context), so the user can continue doing other things on my web app while having an easy way to navigate back to the running task.
I stored my tasks in a DB, so I could manage what happens on startup and shutdown of the web app. This requires storing the progress of the task in the DB.
The tricky part is display results to the user. If you use the method I've described, you'll need to store results in either the DB, session, or application contexts.
This system I've described is pretty heavyweight, and may be overkill for your application.
In response to the comment
so what do you use to do the
background computing. I have asked
this before
I use java.util.concurrent. A lot of this depends on the nature of your application. Is the task (or steps in the task) idempotent? How critical is it that it run to completion? If you have a non-idempotent task that must run to completion, I would say you generally must record every piece of work you do, and you must do that piece of work within a transaction. For example, if one of your tasks is to email a list of people (this is definitely not idempotent) you would do the emailing in a "transaction" (I'm using the term lightly here) and store your progress after each transaction is complete.
I'm developing an application which requires Contents of the database to be written to an ms-excel file at the end of each day. I've written the code for copying the contents into ms-excel file but Now how to proceed further? Whether threads are to be used to check for the completion of 24 hours or there's some other mechanism? Please provide me some guidance.
If you need to facility to run things at set times during the day, you should consider the Quartz Scheduler. It might be overkill, but it's very capable.
For example, you can use its CronTrigger to configure a job to run on a schedule defined by a cron expression, e.g. 0 23 55 * * ? (or something like that) would run your job at 5 to midnight every night (see examples).
Quartz recently got a boost to its future and fortunes by being acquired by the Terracotta folks. Hopefully it'll get some real active development now.
I agree with the others that using something like crontab would be better. However, if you can't do that, I would use the java.util.concurrent package added in Java 1.5. The class you would need is ScheduledThreadPoolExecutor. Specifically, the scheduleAtFixedRate() method.
I think that from the design perspective it is better to use crontab on linux platform or task scheduler on windows platform. It will keep your java program small, and simple. While the solution with thread waiting for the specific time seems simple it will add one serious concern - you will have to monitor its health.
In addition - I would suggest to carefully plan logs your job is writing each time it is run. It is important to have logs for both successful and unsuccessful runs.
It makes sense to make separate file for such logs.
One more case to be considered - what to do if database was not available exactly in the time when job run? Is it acceptable to wait another 24 hours?
In podcast #15, Jeff mentioned he twittered about how to run a regular event in the background as if it was a normal function - unfortunately I can't seem to find that through twitter. Now I need to do a similar thing and are going to throw the question to the masses.
My current plan is when the first user (probably me) enters the site it starts a background thread that waits until the alloted time (hourly on the hour) and then kicks off the event blocking the others (I am a Windows programmer by trade so I think in terms of events and WaitOnMultipleObjects) until it completes.
How did Jeff do it in Asp.Net and is his method applicable to the Java web-app world?
I think developing a custom solution for running background tasks doesn't always worth, so I recommend to use the Quartz Scheduler in Java.
In your situation (need to run background tasks in a web application) you could use the ServletContextListener included in the distribution to initialize the engine at the startup of your web container.
After that you have a number of possibilities to start (trigger) your background tasks (jobs), e.g. you can use Calendars or cron-like expressions. In your situation most probably you should settle with SimpleTrigger that lets you run jobs in fixed, regular intervals.
The jobs themselves can be described easily too in Quartz, however you haven't provided any details about what you need to run, so I can't provide a suggestion in that area.
As mentioned, Quartz is one standard solution. If you don't care about clustering or persistence of background tasks across restarts, you can use the built in ThreadPool support (in Java 5,6). If you use a ScheduledExecutorService you can put Runnables into the background thread pool that wait a specific amount of time before executing.
If you do care about clustering and/or persistence, you can use JMS queues for asynchronous execution, though you will still need some way of delaying background tasks (you can use Quartz or the ScheduledExecutorService to do this).
Jeff's mechanism was to create some sort of cached object which ASP.Net would automatically recreate at some sort of interval - It seemed to be an ASP.Net specific solution, so probably won't help you (or me) much in Java world.
See https://stackoverflow.fogbugz.com/default.asp?W13117
Atwood: Well, I originally asked on Twitter, because I just wanted something light weight. I really didn't want to like write a windows service. I felt like that was out of band code. Plus the code that actually does the work is a web page in fact, because to me that is a logical unit of work on a website is a web page. So, it really is like we are calling back into the web site, it's just like another request in the website, so I viewed it as something that should stay inline, and the little approach that we came up that was recommended to me on Twitter was to essentially to add something to the application cache with a fixed expiration, then you have a call back so when that expires it calls a certain function which does the work then you add it back in to the cache with the same expiration. So, it's a little bit, maybe "ghetto" is the right word.
My approach has always been to have to OS (i.e. Cron or the Windows task scheduler) load a specific URL at some interval, and then setup a page at that URL to check it's queue, and perform whatever tasks were required, but I'd be interested to hear if there's a better way.
From the transcript, it looks like FogBugz uses the windows service loading a URL approach also.
Spolsky: So we have this special page called heartbeat.asp. And that page, whenever you hit it, and anybody can hit it at anytime: doesn't hurt. But when that page runs it checks a queue of waiting tasks to see if there's anything that needs to be done. And if there's anything that needs to be done, it does one thing and then looks in that queue again and if there's anything else to be done it returns a plus, and the entire web page that it returns is just a single character with a plus in it. And if there's nothing else to be done, the queue is now empty, it returns a minus. So, anybody can call this and hit it as many times, you can load up heartbeat.asp in your web browser you hit Ctrl-R Ctrl-R Ctrl-R Ctrl-R until you start getting minuses instead of pluses. And when you've done that FogBugz will have completed all of its maintenance work that it needs to do. So that's the first part, and the second part is a very, very simple Windows service which runs, and its whole job is to call heartbeat.asp and if it gets a plus, call it again soon, and if it gets a minus call it again, but not for a while. So basically there's this Windows service that's always running, that has a very, very, very simple task of just hitting a URL, and looking to see if it gets a plus or a minus and, and then scheduling when it runs again based on whether it got a plus or a minus. And obviously you can do any kind of variation you want on this theme, like for example, uh you could actually, instead of returning just a plus or minus you could say "Okay call me back in 60 seconds" or "Call me back right away I have more work to be done." And that's how it works... so that maintenance service it just runs, you know, it's like, you know, a half page of code that runs that maintenance service, and it never has to change, and it doesn't have any of the logic in there, it just contains the tickling that causes these web pages to get called with a certain guaranteed frequency. And inside that web page at heartbeat.asp there's code that maintains a queue of tasks that need to be done and looks at how much time has elapsed and does, you know, late-night maintenance and every seven days delete all the older messages that have been marked as spam and all kinds of just maintenance background tasks. And uh, that's how that does that.
We use jtcron for our scheduled background tasks.
It works well, and if you understand cron it should make sense to you.
Here is how they do it on StackOverflow.com:
https://blog.stackoverflow.com/2008/07/easy-background-tasks-in-aspnet/