Is it possible to create a multithreaded Java EE Glassfish container?
My intention is to create an application where users can capture data from a social network: each user would launch a new thread, with the parameters they choose, to retrieve information from the social network.
The number of these threads would be limited to avoid exhausting the server's memory.
How can I create multiple threads in Java EE that keep running in the background after the user exits the application, until the user stops them?
Might a GlassFish job be one solution?
Your question is pretty broad, but in general I understand you need to execute a thread for each user, which runs in the background even when the user stops using the application (logs out), does some repetitive task, and is terminated by the user when required.
First, I would point out that this can be accomplished in a cleaner way using the timer service: you can schedule a periodic background job which will do everything you need. It can read the list of users and their tasks and perform them at a given interval. A user may then request to cancel their tasks, which removes them from the list.
This way, the number of users with a running background task would not be limited. The tasks can also run sequentially in a single thread, but you may tweak that; see the rest of my answer.
More info on scheduling a timer is in the Java EE tutorial: https://docs.oracle.com/javaee/7/tutorial/ejb-basicexamples004.htm.
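As a rough sketch of the timer idea (one periodic job that walks a list of user tasks), here is a plain Java SE stand-in. The class name UserTaskRegistry and its methods are made up for illustration. In a Java EE container, runAll() would be the timer callback (for example an EJB method annotated with @Schedule) and the ScheduledExecutorService would not be needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical registry: one periodic job walks all registered user tasks.
final class UserTaskRegistry {
    private final Map<String, Runnable> tasksByUser = new ConcurrentHashMap<>();

    // Called when a user starts a background task.
    void register(String userId, Runnable task) {
        tasksByUser.put(userId, task);
    }

    // Called when a user cancels; the task is simply skipped on the next tick.
    void cancel(String userId) {
        tasksByUser.remove(userId);
    }

    // Runs every registered task once and returns how many completed.
    // In a container this would be the timer callback.
    int runAll() {
        int executed = 0;
        for (Runnable task : tasksByUser.values()) {
            try {
                task.run();
                executed++;
            } catch (RuntimeException e) {
                // One failing task must not stop the others.
            }
        }
        return executed;
    }

    // Java SE stand-in for the container timer: calls runAll() periodically.
    ScheduledExecutorService startTimer(long periodSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::runAll, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }
}
```

Because tasks are only entries in a map, the number of users is not tied to the number of threads, and cancelling is just a removal.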
In case you really need a separate thread per user, there are several ways to execute a thread separately from the request-handling thread. You might use asynchronous EJB method invocation, using the @Asynchronous annotation. You may also inject a ManagedExecutorService and use its submit method to execute a Runnable asynchronously. Either way, you would not lose context, and dependency injection will continue to work.
See more details about asynchronous execution in the Java EE tutorial section on Concurrency Utilities.
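A minimal sketch of the thread-per-user variant with a hard cap on concurrency, assuming plain java.util.concurrent. The class UserThreads is hypothetical. In a container you would inject a ManagedExecutorService instead of creating the pool yourself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: at most maxThreads user tasks run at the same time.
final class UserThreads {
    private final ExecutorService pool;
    private final Map<String, Future<?>> running = new HashMap<>();

    UserThreads(int maxThreads) {
        // In a container, an injected ManagedExecutorService would replace this pool.
        this.pool = Executors.newFixedThreadPool(maxThreads);
    }

    // Starts the user's task unless one is still active; returns true if started.
    synchronized boolean start(String userId, Runnable task) {
        Future<?> existing = running.get(userId);
        if (existing != null && !existing.isDone()) {
            return false;
        }
        running.put(userId, pool.submit(task));
        return true;
    }

    // Cancels the user's task; a running task must react to interruption to stop.
    synchronized boolean stop(String userId) {
        Future<?> future = running.remove(userId);
        return future != null && future.cancel(true);
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Tasks submitted beyond the cap wait in the pool's queue, so memory use stays bounded by the pool size rather than by the number of users.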
You may also execute runnables asynchronously from a timer, but you may not need that if you execute only a single task from within a timer handler: when the timer triggers, the handler is executed in a new thread if the previous handler has not completed yet.
I have a scheduled job that runs at the end of every month. After running, it saves some data to the database.
When I scale the app (for example, to 2 instances), both instances run the scheduled job and both save the data, so at the end of the day my database contains the same data twice.
So I want the scheduled job to run only once, regardless of the number of instances in the cloud.
In my project, I have maintained a database table to hold a lock for each job which needs to be executed only once in the cluster.
When a job gets triggered, it first tries to acquire the lock from the database, and it executes only if it gets that lock. If it fails to acquire the lock, the job is not executed.
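To illustrate the lock-table idea, here is a sketch with an in-memory stand-in for the table. All names (JobLockStore, InMemoryJobLockStore, ClusterJobRunner, the job_lock table in the comment) are hypothetical and not taken from my project. The assumption is that the table has a unique key on the lock name, so only one instance can insert the row for a given run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal contract of the lock table: a row can be inserted only once per key.
interface JobLockStore {
    boolean tryInsert(String lockKey, String owner);
}

// In-memory stand-in for a table with a unique key on lock_key.
// A real store would run something like
//   INSERT INTO job_lock (lock_key, owner) VALUES (?, ?)
// and treat a unique-constraint violation as "lock not acquired".
final class InMemoryJobLockStore implements JobLockStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    @Override
    public boolean tryInsert(String lockKey, String owner) {
        return rows.putIfAbsent(lockKey, owner) == null;
    }
}

final class ClusterJobRunner {
    private final JobLockStore store;
    private final String instanceId;

    ClusterJobRunner(JobLockStore store, String instanceId) {
        this.store = store;
        this.instanceId = instanceId;
    }

    // Runs the job only if this instance wins the lock for the given run
    // (for example "2024-05" for a monthly job); returns true if it ran.
    boolean runOnce(String jobName, String runId, Runnable job) {
        if (!store.tryInsert(jobName + ":" + runId, instanceId)) {
            return false; // another instance already took this run
        }
        job.run();
        return true;
    }
}
```

Keying the lock by run period and never releasing it means an instance that fires a little later still finds the row and skips the run. Cleaning up old rows is left out here.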
You can also look at the clustering feature of Quartz:
http://www.quartz-scheduler.org/documentation/2.4.0-SNAPSHOT/introduction.html
I agree with the comments. If you can utilize a scheduler that's going to be your best, most flexible option. In addition, a scheduler should be executing your job as a "task" on Cloud Foundry. The task will only run on one instance, so you won't need to worry about how many instances your application is using (the two are separate in that regard).
If you're using Pivotal Cloud Foundry/Tanzu Cloud Foundry there is a scheduler you can ask your operations team to install. I don't know about other variants of CF, but I assume there are other schedulers.
https://network.pivotal.io/products/p-scheduler/
If using a scheduler is not an option then this is a concern you'll need to handle in your application. The solution of using a shared lock is a good one, but there is also a little trick you can do on Cloud Foundry that I feel is a little simpler.
When your application runs, certain environment variables are set by the platform. There is one called INSTANCE_INDEX which has a number indicating the instance on which the app is running. It's zero-based, so your first app instance will be running on instance zero, the second instance one, etc.
In your code, simply look at the instance index and see if it's zero. If the index is non-zero, have your task end without doing anything. If it's zero, then let the task proceed and do its work. The task will execute on every application instance, but it will only do work on the first instance. It's an easy way to guarantee something like a database migration or background job only runs once.
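A small sketch of that check in Java. The variable name INSTANCE_INDEX is the one mentioned above. The class InstanceGuard and the decision to skip the job when the variable is missing or malformed are my own assumptions.

```java
import java.util.Map;

final class InstanceGuard {
    // Returns true only when INSTANCE_INDEX is present and parses to 0.
    static boolean isFirstInstance(Map<String, String> env) {
        String raw = env.get("INSTANCE_INDEX");
        if (raw == null) {
            return false; // assumption: without the variable, do not run
        }
        try {
            return Integer.parseInt(raw.trim()) == 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Runs the job on instance zero only; returns true if the job ran.
    static boolean runOnFirstInstance(Map<String, String> env, Runnable job) {
        if (!isFirstInstance(env)) {
            return false;
        }
        job.run();
        return true;
    }
}
```

In the scheduled method you would call InstanceGuard.runOnFirstInstance(System.getenv(), this::doMonthlyJob), where doMonthlyJob is whatever your job method is called. Passing the environment as a map keeps the check easy to test.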
One final option would be to use multiple processes. This is a feature of Cloud Foundry that enables you to have different processes running, like your web process and a background worker process.
https://docs.cloudfoundry.org/devguide/multiple-processes.html
The interesting thing about this feature is that you can scale the different processes independently of each other. Thus you could have as many web processes running as you need, but only one background worker, which would guarantee that your background process only runs once.
That said, the downside of this approach is that you end up with separate containers for each process and the background process would need to continue running. The foundation expects it to be a long-running process, not a finite-duration batch job. You could get around this by wrapping your periodic task in a loop or something similar which keeps the process running forever.
I wouldn't really recommend this option but I wanted to throw it out there just in case.
You can use the @SnapLock annotation on your method, which guarantees that the task only runs once. See the documentation in this repo: https://github.com/luismpcosta/snap-scheduler
Example:
Import the Maven dependency:
<dependency>
    <groupId>io.opensw.scheduler</groupId>
    <artifactId>snap-scheduler-core</artifactId>
    <version>0.3.0</version>
</dependency>
After importing the Maven dependency, you'll need to create the required tables.
Finally, see below how to annotate a method with @SnapLock so that it is guaranteed to run only once:
import io.opensw.scheduler.core.annotations.SnapLock;
...
@SnapLock(key = "UNIQUE_TASK_KEY", time = 60)
@Scheduled(fixedRate = 30000)
public void reportCurrentTime() {
    ...
}
With this approach you also get an audit trail of the task executions.
I have a servlet in my application used for uploading files. I then want to process the file, which could take up to 5 minutes. By having this code in the servlet, am I potentially blocking incoming requests? Either way, I think I would prefer to create a background job to handle processing the file. What is the best method for handling this? My application is running on Tomcat.
I would suggest using multiple threads here:
One thread will take care to read every line of the file and insert it into a BlockingQueue in order to be processed.
Another thread(s) will take the elements from this queue and process them.
To implement this multithreaded work, it is better to use the ExecutorService interface and pass it Runnable instances, one implementing each task. Remember to have only a single task reading the file.
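Roughly, the reader/worker split described above could look like the sketch below. The class LinePipeline, the queue capacity of 100 and the sentinel object are illustrative choices of mine. It uses Callable tasks instead of Runnable only so that the blocking queue calls can throw InterruptedException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class LinePipeline {
    // Unique sentinel object; compared by identity so no real line can match it.
    private static final String END = new String("END");

    // Feeds lines through a bounded queue to `workers` consumer threads
    // and returns the number of lines handled successfully.
    static int process(Iterable<String> lines, int workers, Consumer<String> handler) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        AtomicInteger handled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers + 1);
        try {
            // Single producer: the only task that reads the input.
            Future<?> producer = pool.submit(() -> {
                try {
                    for (String line : lines) {
                        queue.put(line);
                    }
                } finally {
                    // One sentinel per consumer so every consumer terminates.
                    for (int i = 0; i < workers; i++) {
                        queue.put(END);
                    }
                }
                return null;
            });
            List<Future<?>> consumers = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                Future<?> consumer = pool.submit(() -> {
                    for (String line = queue.take(); line != END; line = queue.take()) {
                        try {
                            handler.accept(line);
                            handled.incrementAndGet();
                        } catch (RuntimeException e) {
                            // Skip the bad line so the queue keeps draining.
                        }
                    }
                    return null;
                });
                consumers.add(consumer);
            }
            producer.get();
            for (Future<?> consumer : consumers) {
                consumer.get();
            }
            return handled.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In the servlet you would hand the uploaded file to something like this from a background thread and return the response immediately. The bounded queue keeps the reader from loading the whole file into memory when the workers are slower than the reader.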
I would recommend never doing heavy work in a servlet. Instead, fire an asynchronous task, e.g. via a JMS call.
I am using Swing, and in my application I need to run many threads in parallel, for example checking internet connectivity every 5 seconds, monitoring filesystem changes, and syncing files from the server.
All time-consuming tasks like these run in SwingWorkers so that my GUI does not freeze.
At the same time I need to run some other time-consuming tasks, such as uploading a file to the server. For this I also use SwingWorker. I then submit all these SwingWorkers to an ExecutorService for thread pooling so that they do not affect each other.
My executor service looks like this. I thought 30 threads would be enough for me.
static ExecutorService threadExecutor;
threadExecutor = Executors.newFixedThreadPool(30);
I then submit the workers to that same service:
threadExecutor.submit(monitorinternetconnectivity); // submitting a SwingWorker object
Some of the workers I submit at startup and some I add at runtime. When I add workers at runtime, some do not complete their job or stop running, like the one monitoring internet connectivity.
Is there a way to get the same functionality as SwingWorker, or a better way to use multiple SwingWorkers? We should be able to add new SwingWorkers to the executor service at runtime.
SwingWorker uses its own thread pool.
SwingWorker should be used for a long-running task (i.e. anything that takes more than a couple of milliseconds) after which a GUI update is required.
GUI elements must not be updated outside the EDT; do the update on the EDT, e.g. from the SwingWorker.done() method.
If you generally follow the rules of accessing Swing components only from inside the EDT, then you shouldn't have a problem with locking. I suspect that the problem rather lies in your code, but to be sure of that we would need to see it.
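For reference, a worker that follows these rules could look roughly like this. UploadWorker and the status callback are made-up names, and the sleep is a placeholder for the real upload. This is not a reconstruction of your code.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;
import javax.swing.SwingWorker;

// Hypothetical worker: slow work off the EDT, GUI update only in done().
final class UploadWorker extends SwingWorker<String, Void> {
    private final String fileName;
    private final Consumer<String> showStatus; // invoked on the EDT only

    UploadWorker(String fileName, Consumer<String> showStatus) {
        this.fileName = fileName;
        this.showStatus = showStatus;
    }

    @Override
    protected String doInBackground() throws Exception {
        // Long-running work goes here; never touch Swing components in this method.
        Thread.sleep(100); // placeholder for the real upload
        return "Uploaded " + fileName;
    }

    @Override
    protected void done() {
        // done() runs on the EDT, so updating GUI elements here is safe.
        try {
            showStatus.accept(get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException | CancellationException e) {
            showStatus.accept("Upload failed: " + e.getMessage());
        }
    }
}
```

With a JLabel named statusLabel you would start it with new UploadWorker("report.pdf", statusLabel::setText).execute(). Since SwingWorker implements RunnableFuture, submitting it to your own ExecutorService also works. Note that a SwingWorker instance is designed to run only once, so create a new one for every upload.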
We need a scheduled job in a Java EE server, and we know how to use Quartz or the Timer service.
But our question is: if we want to change the schedule in production or trigger the batch manually, how do we do it?
In the traditional solution, we use a servlet to run the job and then use a cron job with an HTTP client (e.g. lynx) to trigger the servlet. It's easy to implement and can be changed in production.
I have never found Timers to be entirely satisfactory because of this exact problem: you can't really monitor their status or modify them.
What I recommend is a second layer job manager class. When you call this class, it schedules a Java EE timer for time 'X', and it also records the fact that you want to execute a 'job' at time 'X'. When that time arrives, the Java EE timer calls this job manager class, which finds the job, and calls the job.
What this allows you to do is to write an "unschedule" function. Calling unschedule would remove the job. When the Java EE timer calls at time 'X' this class does not find any job, and so ignores it.
You can also implement a "change schedule" function that removes the old entry and creates a new entry at time 'Y', scheduling a Java EE timer for time 'Y'. Java EE timers will fire at both time 'X' and time 'Y', but only the one at time 'Y' will have any effect.
Thus manual triggering is a matter of having a servlet that calls "change schedule" with the time set to right now.
The one other detail to be careful of: because timer events are not completely reliable, we implement this class to find all the jobs that had been scheduled before the current time, and run all of them at that moment. We then schedule extra Java EE timer events every 5 minutes or so. Those timers will pick up any jobs that for one reason or another had been left behind. This is important if your job queue is persistent: the server might be down, for example during a restart, at exactly the moment that the timer was supposed to go off. No problem: Java EE Timer events themselves have no meaning, they just serve to wake up the job handler, so it can run all the overdue jobs.
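A stripped-down sketch of such a job manager, with the Java EE timer left out. The class and method names are invented for this example. The container timer (and the extra 5-minute timers) would simply call onTimer with the current time.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class JobManager {
    private static final class Scheduled {
        final Instant dueAt;
        final Runnable job;

        Scheduled(Instant dueAt, Runnable job) {
            this.dueAt = dueAt;
            this.job = job;
        }
    }

    private final Map<String, Scheduled> jobs = new ConcurrentHashMap<>();

    // Records the job; a container timer for dueAt would also be created here.
    void schedule(String name, Instant dueAt, Runnable job) {
        jobs.put(name, new Scheduled(dueAt, job));
    }

    // After this, a timer firing for the old time finds nothing to do.
    boolean unschedule(String name) {
        return jobs.remove(name) != null;
    }

    // "Change schedule": replaces the entry; only the new time has any effect.
    boolean reschedule(String name, Instant newDueAt) {
        return jobs.computeIfPresent(name, (k, old) -> new Scheduled(newDueAt, old.job)) != null;
    }

    // Timer callback: runs every job due at or before `now`, including
    // jobs that were missed earlier, and returns the names of the jobs run.
    List<String> onTimer(Instant now) {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, Scheduled> entry : jobs.entrySet()) {
            Scheduled scheduled = entry.getValue();
            if (!scheduled.dueAt.isAfter(now) && jobs.remove(entry.getKey(), scheduled)) {
                scheduled.job.run();
                executed.add(entry.getKey());
            }
        }
        return executed;
    }
}
```

Manual triggering then amounts to reschedule(name, Instant.now()) followed by onTimer(Instant.now()). A persistent version would keep the entries in a database table instead of a map.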
I am looking for ideas on how to deal with a search related task which takes more than usual time (in human terms more than 3 seconds)
I have to query multiple sources, sift through information for the first time and then cache it in the DB for later quick return.
The context of the project is J2EE, Spring and Hibernate (on top of SpringROO)
The possible solutions I could think of
-On the webpage, let the user know that the task is running in the background and, if possible, give them a queue number or waiting time. Refresh the page via a controller which checks whether the task is done; when it's done (i.e. the search result is prepared and stored in the DB), forward to a new controller and fetch the result from the DB
-The background tasks could be done with Spring Task executor. I am not sure if it is easy to give a measure of how long it would take. It would probably be a bad idea to let all the search terms run concurrently, so some sort of pooling will be a good idea.
-Another option to use background tasks is to use JMS. This is perhaps a solution with more control (retries etc)
-Spring batch also comes to mind
Please suggest how you would do it. I would greatly appreciate a semi-detailed+ description. The sources of info can be many and can be sequential in nature, so it can take up to 4-5 minutes for the results to form. It is also possible that such tasks run automatically in the background without user intervention (i.e. to update from the sources)
From a user perspective, I use AJAX. The default web page contains some kind of "Busy" indicator. When the AJAX request completes, the busy indicator is replaced with the result.
In the background, request handlers are already multi-threaded. So you can simply format the default result, flush and close the output, and do the processing in the current thread. You should put something in the session or DB to make sure that no one can start the same heavy process a second time.
Running task pools in a web container is possible but there are some caveats, especially how to synchronize startup/shutdown: Do you want your web server to "hang" during shutdown while some thread is busy collecting your results? Also the additional load should be considered. It might be better to use JMS and offload the strain to a second server dedicated to building the search results.
Such a system will scale much better if your searches start to become a burden. It also makes it trivial to automate the process by writing a small program which posts searches in the JMS queue.
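If you stay inside the web container, the "start once, then poll" part could be sketched like this with a bounded pool. SearchTasks and its methods are hypothetical names. The in-memory map stands in for the session or DB marker mentioned above and would not survive a restart.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SearchTasks {
    enum Status { UNKNOWN, RUNNING, DONE, FAILED }

    private final ExecutorService pool;
    private final Map<String, Future<String>> byTerm = new ConcurrentHashMap<>();

    SearchTasks(int maxConcurrent) {
        // Bounded pool: extra searches wait in the queue instead of running at once.
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
    }

    // Starts the search once per term; repeated calls reuse the existing task.
    void startIfAbsent(String term, Callable<String> search) {
        byTerm.computeIfAbsent(term, t -> pool.submit(search));
    }

    // What the AJAX polling request would ask for.
    Status status(String term) {
        Future<String> future = byTerm.get(term);
        if (future == null) {
            return Status.UNKNOWN;
        }
        if (!future.isDone()) {
            return Status.RUNNING;
        }
        return result(term).isPresent() ? Status.DONE : Status.FAILED;
    }

    // Result of a finished search; empty while running or after a failure.
    Optional<String> result(String term) {
        Future<String> future = byTerm.get(term);
        if (future == null || !future.isDone()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(future.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException | CancellationException e) {
            return Optional.empty();
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

The shutdown method is where the startup/shutdown caveat shows up: call it from a context listener, and decide whether interrupted searches should be restarted later.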
I've solved this problem in the past doing something like this:
When the user initiates a long running task, I open a popup window that displays the task status. The task status includes a name and estimated time to complete
This task is also stored in my "app" (this can be stored in the DB, session, or application context), so the user can continue doing other things on my web app while having an easy way to navigate back to the running task.
I stored my tasks in a DB, so I could manage what happens on startup and shutdown of the web app. This requires storing the progress of the task in the DB.
The tricky part is displaying results to the user. If you use the method I've described, you'll need to store results in either the DB, session, or application contexts.
This system I've described is pretty heavyweight, and may be overkill for your application.
In response to the comment:
"so what do you use to do the background computing. I have asked this before"
I use java.util.concurrent. A lot of this depends on the nature of your application. Is the task (or steps in the task) idempotent? How critical is it that it run to completion? If you have a non-idempotent task that must run to completion, I would say you generally must record every piece of work you do, and you must do that piece of work within a transaction. For example, if one of your tasks is to email a list of people (this is definitely not idempotent) you would do the emailing in a "transaction" (I'm using the term lightly here) and store your progress after each transaction is complete.
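To make the "store your progress after each transaction" part concrete, here is a small sketch. ResumableMailer, ProgressStore and the in-memory store are illustrative names only. A real store would write to the DB.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class ResumableMailer {
    // Where progress is recorded; a real implementation would use a DB table.
    interface ProgressStore {
        int load(String taskId);                 // index of the next recipient
        void save(String taskId, int nextIndex);
    }

    // Sends to every recipient not yet handled and returns how many were sent
    // in this run. Progress is saved after each send so a restart can resume.
    static int sendAll(String taskId, List<String> recipients,
                       Consumer<String> send, ProgressStore store) {
        int sent = 0;
        for (int i = store.load(taskId); i < recipients.size(); i++) {
            send.accept(recipients.get(i)); // the non-idempotent step
            store.save(taskId, i + 1);      // record progress right after it
            sent++;
        }
        return sent;
    }
}

// In-memory stand-in, only for trying the idea out.
final class InMemoryProgressStore implements ResumableMailer.ProgressStore {
    private final Map<String, Integer> next = new HashMap<>();

    @Override
    public int load(String taskId) {
        return next.getOrDefault(taskId, 0);
    }

    @Override
    public void save(String taskId, int nextIndex) {
        next.put(taskId, nextIndex);
    }
}
```

If the process dies between the send and the save, that one recipient gets the email twice after a restart. That gap is why I put "transaction" in quotes.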