I have a question about using the EJB Timer Service.
Use case as follows:
The system should be able to schedule a task that polls our Subversion repository for file changes since a particular timestamp.
The idea is that whenever the scheduled task runs, it will execute a command against a particular svn repository.
For this particular purpose, I will not call any external process but will use the 'pure' Java approach via the SVNKit library: http://svnkit.com/
My only concern is this:
Is it a good idea to use the EJB Timer Service to execute tasks that call external processes? My approach is 'pure' Java, but other scenarios might call a batch file, a command line, or an external executable directly from the timer service logic.
I worry about the effects of server memory use/performance etc.
Is this a good idea?
My other thought is to create a 'desktop' application on the server, using a client technology such as SWT/Swing, that does the polling and holds the logic. But this would mean managing two applications: the 'desktop' app that polls and the 'web' user interface that I will create in Glassfish.
I am leaning towards doing everything in the app server of my choice, which is Glassfish.
I have used the EJB Timer before, but only for calls against the database, never for calling an external service. Since this scenario came up, I am asking here to gather thoughts from those who have experience doing this.
Any thoughts?
In theory, EJBs aren't supposed to depend on external I/O since it interferes with the container/server's management of bean instances, threads, etc.
In practice, this should work if you take precautions. For example:
- Isolate the function in its own EJB (i.e., a stateless session bean that only handles these timers) to avoid instance pooling issues.
- Use timeouts while waiting for commands so that hung processes cannot hang all server threads.
- Ensure that you don't schedule timers in a way that runs multiple OS commands simultaneously.
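The timeout and no-overlap precautions can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (`BoundedCommand`, `run`, `tryRun`) are hypothetical, the code assumes Java 9 or later for `ProcessBuilder.Redirect.DISCARD`, and it is not tied to any particular EJB container.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper; names are illustrative and not part of any EJB API. */
final class BoundedCommand {

    // Guards against two commands running at the same time in this JVM.
    private static final AtomicBoolean RUNNING = new AtomicBoolean();

    /**
     * Runs an OS command and waits at most timeoutSeconds for it to finish.
     * Returns the exit code, or -1 if the process had to be killed.
     */
    static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout and discard both, so a full pipe buffer
        // cannot block the child process while nobody reads its output.
        builder.redirectErrorStream(true);
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // the command hung: kill it
            process.waitFor();         // reap the killed process
            return -1;
        }
        return process.exitValue();
    }

    /**
     * Runs the command only if no other command is in flight.
     * Returns false when the run was skipped because a previous one is still active.
     */
    static boolean tryRun(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        if (!RUNNING.compareAndSet(false, true)) {
            return false;
        }
        try {
            run(command, timeoutSeconds);
            return true;
        } finally {
            RUNNING.set(false);
        }
    }
}
```

A timer callback could call `tryRun` and simply skip a tick when the previous command has not finished. Note that the guard only covers a single JVM, so it does not address the cluster concerns.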
Keep in mind that EJB 3.0 timers are persistent (vs EJB 3.1 timers, which have the option of being non-persistent), which means:
- They can run on any server in a cluster. If you have multiple machines in your cluster, you need to ensure that they are all capable of running the command.
- They survive server restarts. If you schedule a timer to run but the server crashes before it can, it will run when the server restarts. This can cause particular problems for interval timers (all missed timers will fire repeatedly) and if you don't carefully manage existing timers (you can easily create redundant timers).
Related
I am trying to build a webapp with a Vaadin frontend which lets a user upload and process data on our server. The process is quite complicated and is a multi-threaded app (let's call this the 'core'). Whilst designing this app, I thought I could stick everything onto the tomcat server but a colleague of mine told me that natively, Vaadin is RESTful and will thus not run the business process continuously because the application is stateless. He claims that the tomcat JVM will simply go to sleep after running the request and not complete the thread process. Therefore, he suggests that I use RMI to send the data to another process on the same server and process it there instead.
I have a few questions about this:
Is everything he claimed true? Are there some intricacies of running Vaadin on Tomcat that I'm not aware of?
More likely I'm misunderstanding him and he's actually explaining why it's better to separate presentation and business components (which I completely agree with). But from a purely theoretical point of view, would it be possible to put the multi-threaded core on the same Tomcat server instance as the one running Vaadin?
As far as I know, Vaadin does not use REST services for client-server communication. It is stateful and uses some kind of backing beans.
Regarding your thread issue, if you call your long running task directly from a Vaadin component, it will block the thread processing your request until the task is done. From the browser point of view, you'll have to wait and see the spinning indicator until the process is done (or an exception due to request timeout is thrown).
What you can do is to run your long running task in a separate thread. If you want the new thread to run on the same JVM, you do not need something like RMI.
You can do it in any of these ways:
- Use an ExecutorService (e.g. Executors.newSingleThreadExecutor()) and submit a task to it
- Create a new thread and start it
- Do something like: https://vaadin.com/forum/#!/thread/2008536/2010911
Note that you'll probably have to implement some kind of notification mechanism to know when the thread has completed the task.
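As a minimal sketch of the ExecutorService option together with a completion notification, assuming plain Java 8 and nothing Vaadin-specific: the class name `BackgroundRunner` and its callbacks are hypothetical, and how the callback then updates the UI is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical wrapper around a single-thread executor; names are illustrative. */
final class BackgroundRunner implements AutoCloseable {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /**
     * Runs the task on the executor thread instead of the request thread.
     * onDone receives the result; onError receives the failure, which is
     * typically wrapped in a CompletionException.
     */
    <T> CompletableFuture<Void> submit(Supplier<T> task,
                                       Consumer<T> onDone,
                                       Consumer<Throwable> onError) {
        return CompletableFuture.supplyAsync(task, executor)
                .thenAccept(onDone)
                .exceptionally(error -> {
                    onError.accept(error);
                    return null;
                });
    }

    /** Stops accepting new tasks; tasks already submitted still run. */
    @Override
    public void close() {
        executor.shutdown();
    }
}
```

The request thread returns right after `submit`, and the callbacks run on the executor thread once the task finishes.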
You can start separate threads from tomcat as needed.
It does not matter what frontend you have for this.
But what's important is to access the Vaadin UI components the correct way when you wish to update them from another thread.
In Vaadin 7 this has been greatly enhanced to allow server push out of the box.
In Vaadin 6 you had to use some workarounds for this.
https://vaadin.com/book/-/page/advanced.push.html#advanced.push.running
We use this concept a lot for export and report generation.
- User clicks on Export/Report
- On the server we start a (low priority) thread which builds the report/export
- During this, we update a progressbar on the client via server push
- Once the thread has generated the export/report we send it to the web browser
If you wish to have a core that is always running and accepting "jobs", then perhaps you are better served by a job scheduler like Quartz or similar.
Context
I'm in the process of drawing a solution to migrate a huge PL/SQL system to Java. The initial step is migrating some ETL jobs that:
Read CSV, XML, XLS (a new requirement), and positional files from several FTP/SFTP sources
Process the files according to rules stored in the database and write the results to a database table.
Currently this is done by several stored procedures and jobs.
My company is open to suggestions (if it can run in GlassFish 4 and share its logging and connection pool mechanisms, as well as the admin console, it is a plus).
I've done a little bit of research and the following options caught my eye:
Java EE 7 Batch Processing, sounds simple and particularly well fitted for GlassFish 4.
Spring Batch, somewhat more mature and very similar to the Java EE 7 standard (which was probably based on it).
Apache Camel, sounds powerful and would spare us a lot of fiddling with libraries such as Apache POI, but it also looks somewhat complex. Also, I'm not sure if it is the best fit for the job (ETL over huge files).
Cook everything myself. I could create an Application Client to run a Quartz / Spring Scheduler or even EJB Timers.
While I'm still open to suggestions (recommendations would be nice), the best fit so far seems to be Java EE 7 Batch Processing.
One more thing, the infrastructure team have a solution to move files from every ftp source to a local directory, so FTP is really not an issue.
Problem
I've read several tutorials about Java EE Batch Processing and, in all of them, some kind of Servlet or EJB Timer is responsible for starting the Jobs:
JobOperator jobOperator = BatchRuntime.getJobOperator();
jobOperator.start("job", properties);
I could easily deploy a web / EJB project and keep polling for changes. But I was thinking about a push model:
Application client console application
Main class watches directories for new files
When there is a new file it would start a new job.
My doubts are:
Is this strategy possible/ advisable?
Will I need a JMS queue or some kind of producer / consumer strategy in the middle, or should I just call jobOperator.start for every file and trust the batch processing layer to manage the application resources? In other words, if a thousand files are delivered to my folder at once and I call jobOperator.start a thousand times, will GlassFish 4 do some kind of smart enqueuing, or should I create some kind of gate so that no more than n jobs run simultaneously?
I've already implemented a project with Batch Processing in WildFly (JBoss AS). I'm not familiar with configuration details on GlassFish (I'm not using it anymore because they've dropped enterprise support), but I can give you some insights and guidelines from my experience. Also, please note that Spring Batch and the Batch spec in EE 7 are quite similar, and your decision to use either technology should depend on "what else" you want to achieve with your application besides the batching. Do you want an easily maintained web interface? Do you want to develop a REST API? Etc.
The ETL jobs you're describing fit perfectly with the steps and chunks model in the EE 7 spec. If you've already tried to develop some tests, you may have noticed that you still need to code the file readers and mappers for each file specification. Your reading sources are quite standard, and you will easily find a library to read/stream them and process their data.
The project I've implemented is quite simple. Customers upload files that need to be processed in order to feed a data warehouse. This service is in the "cloud". Files have a defined spec and must be in CSV format. Most processing results are dimensional "upserts" and fact "erase before insert" operations. The user has a web interface on which files and batch processing metadata must be shown (processing state, dates, rejected items, etc.). Because it is a cloud service, the files must not reside locally on each server (we use S3).
So the first thing to design is the chunk steps. I didn't want to have an implementation for each file spec, so I designed a "fits all cases" implementation that processes files according to the metadata contained in them and the job configuration itself. This is the easy part. The second thing to think about is the processing and metadata administration. Here, I developed a REST API and a web interface that uses it. After all this, will it scale? WildFly has thread configuration parameters for Batch Processing, and you can increase or decrease the threads available to the JobOperator. Jobs are not submitted if there are not enough threads available. So what happens to those requests? Well, they can reside in memory, a backed-up stateful session can be developed, or you can implement an MQ listener for queued processing requests. What I did was much simpler. The company doesn't have the resources to maintain a cluster, so we set up an elastic configuration that expands according to CPU consumption and request volume. So far, the application has processed 10 TB of data from 15 customers, and at the peak of requests/processing, 3 elastic instances have fired up.
A file listener is an interesting idea. You can listen to a directory and drop a processing request onto a queue or hand it immediately to the BatchRuntime. It will depend on how you want to scale it, your required response time, the available resources, etc.
Feel free to ask me anything.
Regards.
EDIT: forgot to mention. I don't really recommend using the Application Client unless you've already got something deployed in your organization. The recent security constraints and Java SE update mechanism have made it a real hassle to maintain those kinds of deployments. Think web.
I would approach it this way.
My hammers for this use case would be the Java Watch Service, a Servlet, a JMS queue, and the Batch service.
First, the Watch Service is the Java 7 go-to place for file system monitoring.
I would write a Watch Service implementation, and I would run it on a thread.
Where does the thread run you ask?
Officially, you should probably be using JCA for this. But, JCA is flat out a pain to work with, underutilized, thus under documented. There are solid examples, but it's simply not a common technology in the Java EE stack.
Another place is an asynchronous Session Bean invocation. There's nothing that suggests these cannot be long-lived invocations. You could stand up a @Singleton Session Bean with @Startup, call the async method from a @PostConstruct method, and let it go. Then, in @PreDestroy, signal the long-running method to stop so it can shut down cleanly. This should all be to spec, portable, and according to Hoyle.
The third place is to use a ServletContextListener, which is the pre-Java EE 6 go-to place for tying code into the life cycle of the application. Here, you would create the thread yourself in the contextInitialized method, and then tear it down in the contextDestroyed method.
Creating threads here is "less defined", but I've done it for years and never had a problem.
Now that you have your service running, the service (IMHO), will do two things.
1) It'll sense when a new file has arrived in the directory, and when it does, it will MOVE (mv, rename) the file to a parallel "processing" directory. The reason is that this tells you that a file has moved from incoming to processing, that the file is a work in progress. It's obvious from a directory listing, regardless of what the backend thinks it's doing. Remember, the system can go down mid way through a file.
2) Once moved, post the file name and any other metadata onto a JMS queue and have an MDB tool up the batch job.
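The two steps above can be sketched with the Java 7 Watch Service. This is a rough illustration, not the author's implementation: the class name `IncomingWatcher` and the `onClaimed` callback (standing in for the JMS post) are hypothetical, both directories are assumed to be on the same file system so that an atomic move works, and files that are already present at startup or are still being written when the event fires are not handled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.function.Consumer;

/** Hypothetical watcher: moves new files to a "processing" directory, then reports them. */
final class IncomingWatcher implements Runnable {

    private final Path incoming;
    private final Path processing;
    private final Consumer<Path> onClaimed; // stand-in for posting to a JMS queue

    IncomingWatcher(Path incoming, Path processing, Consumer<Path> onClaimed) {
        this.incoming = incoming;
        this.processing = processing;
        this.onClaimed = onClaimed;
    }

    /** Blocks until the thread is interrupted or the watched directory goes away. */
    @Override
    public void run() {
        try (WatchService watcher = incoming.getFileSystem().newWatchService()) {
            incoming.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (!Thread.currentThread().isInterrupted()) {
                WatchKey key = watcher.take(); // waits for the next batch of events
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a real service would rescan the directory
                    }
                    claim((Path) event.context());
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // normal shutdown path
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 1: move the file to "processing". Step 2: hand its new path to the callback. */
    void claim(Path fileName) throws IOException {
        Path target = processing.resolve(fileName);
        Files.move(incoming.resolve(fileName), target, StandardCopyOption.ATOMIC_MOVE);
        onClaimed.accept(target);
    }
}
```

The thread running this would be started and interrupted from whichever lifecycle hook is chosen.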
Why add the JMS queue? It brings a couple of features to the party. First, it's a great way to get stuff "from outside" the happy transactional context that EJB likes to inside one. Second, it's transactional. You can, depending on your ETL use case, have the MDB directly process the job. By doing so, you simply do not acknowledge the message from the queue until the processing is done (and the file is deleted or moved from the "processing" directory). In an ideal world, the message queue has messages matching the files in the processing directory. When the processing is done, the method returns, the message fetch "commits", and you're done. If the system crashes, this will restart from the beginning automatically (since the message is still on the queue and was never removed).
The MDB, by configuring its instances, can also gate the number of simultaneous jobs. Configure 10 instances, and only 10 files can be processed at the same time. But this can be a little too simple, too coarse. There's no priority, for example (first come, first served). But it might work for you.
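The gating idea itself (at most n jobs at once, the rest waiting first come, first served) can be illustrated outside of JMS with a fixed-size thread pool. This is a plain Java SE sketch of the concept, not the MDB mechanism, and the class name `JobGate` is hypothetical. Inside a Java EE container the MDB pool would play this role instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical gate: runs at most maxConcurrentJobs at once and queues the rest. */
final class JobGate {

    private final ExecutorService pool;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    JobGate(int maxConcurrentJobs) {
        // A fixed pool never runs more than maxConcurrentJobs tasks in parallel;
        // extra submissions wait in its unbounded FIFO queue.
        this.pool = Executors.newFixedThreadPool(maxConcurrentJobs);
    }

    /** Queues a job; it starts as soon as one of the worker threads is free. */
    void submit(Runnable job) {
        pool.execute(() -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // track the highest concurrency seen
            try {
                job.run();
            } finally {
                running.decrementAndGet();
            }
        });
    }

    /** Highest number of jobs observed running at the same time. */
    int peakConcurrency() {
        return peak.get();
    }

    /** Stops accepting jobs and waits for queued ones; returns false on timeout. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }
}
```

As with the MDB instance limit, there is no priority here: jobs start in submission order.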
But either way, the MDB is a great gateway into the system, since each one starts with its own little bit of transactional context, unlike the long-running servlet thread or the long-running async thread. The servlet thread has a questionable (if any) transactional status; the long-running thread inherits its state from the @Startup method and retains it for its lifetime. The MDB gets a new one each time. Much of this can be shenaniganed away by calling methods with new transactions.
But I like the demarcation of the MDB. Even if its entire task is to create the Batch entry for a file name, the MDB is a good gatekeeper.
And that's pretty much it.
The key parts are being a good citizen and tearing down your thread properly tied to the lifecycle of the application, understanding your transactional state at the various components, and understanding how all the moving parts fit together.
If you use the @Startup technique, make sure you invoke your async method via an injected instance of your session bean. Otherwise the invocation will be a local call, and not asynchronous. You'll stare at it wondering why your server is hanging and not starting up. All of the EJB annotations only work when invoked through an injected or looked up proxy.
Have fun, share and enjoy.
Addenda to the question:
There's really no value in having an external process manage the watch service. One tied to the lifecycle of the server is easier to maintain. Two things come to mind. If the server is down, files will simply stack up in the file system until the server is started again, so you don't lose data. If you have an external service, then you either have it sending messages to a dead server, or you have to stage and manage the JMS server separately from the app server. In that case you now have 3 processes to manage: watch service, JMS server, and app server, rather than just the app server.
I agree with the other poster that, should you decide to go with an external service anyway, a simple Java SE app posting simple messages to a JAX-RS REST service on the server, or even a trivial Servlet, is much, MUCH easier to maintain, stage, and deploy than an app client. If you do it that way, you could write the watch service in something completely different.
But since the server (ostensibly) has direct access to the file system with the file, there's really no motivation to break this service outside of the container. Put the whole kit in to an EAR and have at it. Just flat easier management.
So, I have a web client and an EJB timer, deployed separately.
The workflow is as follows:
1) User accesses client.
2) User requests an action to take place which is known to be long-running, so we write the request to run this process in a database table.
3) TimerOne is checking this table every few seconds to see if there are any waiting tasks, so it finds the user's request and runs the task.
My problem is that in some environments in which our application is run, we are taking advantage of server clustering. When we do this, both the client and the EJB timer are deployed to each server in the cluster.
It is okay for the client to be deployed to multiple servers, as it helps with workload; however, having the timer run on multiple servers is an issue. When the user requests for a long-running task to be run, both timers grab the task at the same time from the database and start running it. As the long-running jobs usually write to the database, this scenario leads to collisions, among other issues.
My goal is to be able to deploy my EJB timer to both servers, but for there to be some state maintained across the cluster which can be used by the timers to decide whether they should pick up the task or if one of the other instances has already picked it up.
I tried using the database for this and tried file storage, but these are either too slow, or I could not come up with a bullet-proof workflow for synchronization.
Does anyone know of a good way to handle this problem? Is it even possible?
The solution should be able to run on a clustered WebLogic domain, a non-clustered WebLogic domain, a clustered Glassfish domain, and a non-clustered Glassfish domain.
I am open to changing the way this is done, if there is another, more elegant solution.
Thanks for any ideas!
Yes, this is possible with clustered timers or a WebLogic Singleton Service (and has been asked a number of times here already). See the following:
Clustered timers:
https://blogs.oracle.com/muraliveligeti/entry/ejb_timer_ejb
http://shaoxiongyang.blogspot.com/2010/10/how-to-use-ejb-3-timer-in-weblogic-10.html
http://java.sys-con.com/node/43944
Singleton Services:
https://blogs.oracle.com/jamesbayer/entry/a_simple_job_scheduler_example
http://developsimpler.blogspot.com/2012/03/weblogic-clusters-and-singleton-service.html
I am open to changing the way this is done, if there is another, more elegant solution.
I know that your question is about an EJB Timer, but keep in mind the following:
In my opinion, you have a requirement that would benefit from asynchronous processing.
In earlier Java EE versions, one of the alternatives for this kind of requirement was to use JMS, which allows you to send a message that is processed later by a business layer component. The other possibility was the one that you have described, which required the use of an EJB Timer. I think both cases were workarounds that filled a gap in the EE specification.
Since Java EE 6, you can define asynchronous services, which allow you to make asynchronous calls without resorting to features that were designed for other purposes.
I am creating a Java service which will run within a web servlet container (probably Tomcat). One portion of the server will run on its own and will not be initiated by HTTP. I know that when an HTTP call causes an exception, the web container can call it again.
I want to be sure that the part of the server which runs continuously will continue to run, even if it fails. I will handle whichever failures I can manually, but if it all fails I want something to restart it all. Are there any tools that can accomplish this easily? I am already using Spring and Tomcat, so if those can provide it, that is ideal. If not, then how about a good design pattern?
Edit: To clarify, I have a web service which will run in Tomcat. I want to run a separate thread within that service and set it up such that when the thread ends or an un-handled exception occurs, Tomcat (or something else) detects the failure and restarts the web service. I know that typically web containers have threads start from some external call and thus handle failures from those threads. What I want is something which handles a background worker thread.
Not quite clear on the design you have in mind, but it seems to me you need some sort of health check.
You can implement such a mechanism in many ways, e.g. open a socket from this process that runs all the time and periodically send it a message.
If there is no reply then the process failed.
You could restart tomcat or implement a mechanism to restart that process.
I cannot give you more details since you have not specified much about what you are trying to do.
UPDATE:
I think that you should use JMX. It is offered by Spring and Tomcat, both of which you already use.
Just make the process you want to monitor a managed resource and another module can check if it is alive.
If you are running inside a Servlet then, as per the J2EE spec, you cannot restart the container, but you can use a ScheduledExecutorService to continuously monitor that your service is running and, if not, restart it.
EDIT. More details below
You can call isTerminated() to check whether the executor is still running (it returns true only after a shutdown, once all tasks have completed), and add more tasks to it if the queue is empty.
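A minimal sketch of the monitoring idea, assuming the background service is a plain thread rather than an executor: a ScheduledExecutorService periodically checks whether the worker thread is alive and starts a fresh one if it has ended. The class name `WorkerWatchdog` and its methods are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical watchdog: restarts a background worker whenever its thread has died. */
final class WorkerWatchdog implements AutoCloseable {

    private final Runnable work;
    private final ScheduledExecutorService monitor =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger starts = new AtomicInteger();
    private volatile Thread worker;

    WorkerWatchdog(Runnable work) {
        this.work = work;
    }

    /** Starts the worker now and re-checks it every checkEveryMillis milliseconds. */
    void start(long checkEveryMillis) {
        monitor.scheduleWithFixedDelay(
                this::ensureRunning, 0, checkEveryMillis, TimeUnit.MILLISECONDS);
    }

    // Only ever called from the single monitor thread, so no extra locking is needed.
    private void ensureRunning() {
        Thread current = worker;
        if (current == null || !current.isAlive()) {
            // The worker returned or died from an unhandled exception: start a new one.
            Thread fresh = new Thread(work, "background-worker-" + starts.incrementAndGet());
            fresh.setDaemon(true);
            worker = fresh;
            fresh.start();
        }
    }

    /** Number of times a worker thread has been started so far. */
    int startCount() {
        return starts.get();
    }

    /** Stops monitoring and interrupts the current worker. */
    @Override
    public void close() {
        monitor.shutdownNow();
        Thread current = worker;
        if (current != null) {
            current.interrupt();
        }
    }
}
```

This restarts only the worker, not Tomcat or the web application, and `close` would need to be called when the application is undeployed.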
I may be misunderstanding your problem here, but you might be over-thinking it.
There's nothing stopping you from running multiple Tomcat instances on a single machine. You could then have Server A connect to Server B to pull down information (via a web service of your choosing). This would prevent an outage on server A from causing an outage on server B (which is what I'm assuming you're trying to avoid).
This is a common way to isolate production environments simply by binding to a separate port. If Tomcat doesn't fit the bill for the service you can always run the application as a service on [insert operating system of choice] and connect to it via a proprietary protocol. Your operating system can handle restarts in that case. Typically I think the multiple Tomcat containers is the easiest approach as it is simple to install and relatively easy to set up.
Good luck, it seems like a fun system administration problem. You also might be interested in checking out Quartz job scheduling as that might fit the bill for an intermittent service.
edit: a little more detail might provide some more detailed answers.
See this post. It's a simple tomcat-watchdog shell script.
I have a local web app that is installed on a desktop PC, and it needs to regularly sync with a remote server through web services.
I have a "transactions" table that stores transactions that have been processed locally and need to be sent to the remote server. This table also contains transactions that have been retrieved from the remote server via a web service call (they were processed remotely) and need to be performed locally. The transactions are performed in time order to ensure they are processed in the right order.
An example of the type of transactions are "loans" and "returns" of items from a store, for example a video rental store. For example something may have been loaned locally and returned remotely or vice versa, or any sequence of loan/return events.
There is also other information that is retrieved from the remote server to update the local records.
When the user performs the tasks locally, I update the local db in real time and add the transaction to the table for background processing with the remote server.
What is the best approach for processing the background tasks? I have tried using a Thread that is created in an HttpSessionListener, and using interrupt() when the session is removed, but I don't think that this is the safest approach. I have also tried using a session attribute as a locking mechanism, but this also isn't the best approach.
I was also wondering how you know when a thread has completed its run, so as to avoid launching another thread at the same time, or whether a thread has died before completing.
I have come across another suggestion, using the Quartz scheduler, but I haven't read up on this approach in detail yet. I am going to purchase a copy of Java Concurrency in Practice, but I wanted some help with ideas for the best approach before I get stuck into it.
BTW I'm not using a web app framework.
Thanks.
Safest would be to create an application-wide thread pool which is managed by the container. How to do that depends on the container used. If your container doesn't support it (e.g. Tomcat) or you want to be container-independent, then the basic approach would be to implement ServletContextListener, create the thread pool with the ExecutorService API provided by Java 1.5 on startup, and kill the thread pool on shutdown. If you aren't on Java 1.5 yet or want more abstraction, then you can also use Spring's TaskExecutor.
There was once a Java EE proposal for concurrency utilities, but it did not make it into Java EE 6.
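The pool lifecycle described above can be sketched without the Servlet API. In this illustration the class name `AppThreadPool` is hypothetical, and the assumption is that `start` would be called from `contextInitialized` and `stop` from `contextDestroyed` of a ServletContextListener. The Future returned by `submit` also gives one way to tell whether a background task has finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical application-wide pool; meant to be driven by a ServletContextListener. */
final class AppThreadPool {

    private ExecutorService pool;

    /** Creates the pool; assumed to be called once on application startup. */
    synchronized void start(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    /** Submits a task; Future.isDone() later tells whether it has completed. */
    synchronized Future<?> submit(Runnable task) {
        return pool.submit(task);
    }

    /**
     * Shuts the pool down; assumed to be called on application shutdown.
     * Waits graceSeconds for running tasks, then interrupts whatever is left.
     * Returns true if all tasks have terminated.
     */
    synchronized boolean stop(long graceSeconds) throws InterruptedException {
        pool.shutdown(); // no new tasks are accepted from here on
        if (pool.awaitTermination(graceSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        pool.shutdownNow(); // interrupt tasks that are still running
        return pool.awaitTermination(graceSeconds, TimeUnit.SECONDS);
    }
}
```

Checking `isDone()` on the previous Future before submitting again is a simple way to avoid launching the same background job twice.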
Related questions:
What is the recommend way of spawning threads from a servlet?
Background timer task in a JSP web application
It's better to go with the Quartz scheduling framework, because it has most of the features related to scheduling. It can store jobs in a database, handle concurrency, etc.
Please try this solution
Create a table which stores a flag ('Y' or 'N') mapped to some identifiable field, with a default value of 'N'
Schedule a job for each return at the time the loan itself is made, which executes if the flag is 'Y'
On returning, change the flag to 'N', which then fires the process that you wanted to run