Here's what I need. I have a UI where a user can upload a file and extract a report based on the uploaded data. Because there is a lot of data to extract, once the user uploads the file I would like control to leave the servlet, so that the user doesn't have to wait on the same page, and pass to a standalone Java program, leaving the user free to work on something else. Once the standalone Java program has control, it would invoke back-end stored procedures (SPs), build an extract from the results, and place it in a file path on the server.
The user, however, can check from the UI whether the extract is ready to download.
So the question is: what is the best practice, or what are the possibilities, for achieving this? Please share your comments.
Thanks!
If you're running in a Java EE environment I would suggest having the servlet dispatch the task to a JMS queue and use a message driven bean to do the (async) processing.
As others suggest, it would be fairly trivial to have the upload servlet redirect the user to some ajax-enabled page that polls the backend for job completion.
If you're not in an EE environment, you could create a standalone (thread-pooled) application to consume from the queue and provide signalling, e.g. through the database (I assume the result goes in a DB anyway). The Spring framework provides very capable and extensive facilities for binding it all together.
But really, there are several free/open source EE containers available, from lightweight up to enterprise, so there's no need to build the necessary stuff yourself.
Cheers,
It's very easy.
Start a thread from your servlet class.
Run the extraction in that thread.
After starting the thread, redirect the user to a page that auto-refreshes (or similar) to show how much of the extraction is done. (You mentioned that you have a way to find that out.)
If you can't use message driven beans, you could have your servlet upload the data to a location on the filesystem and record a row in a DB table to say there's a job to be processed.
Then you have your standalone program polling for jobs, processing the data and updating the DB row on completion (including reasons for failure etc.).
Finally, you can poll the status of the job from the UI using an ajax request.
This allows the user to build up a queue of data jobs to be processed while they're doing something else.
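The job-row approach above can be sketched roughly as follows. This is only an illustration under stated assumptions: `JobTable` is a hypothetical in-memory stand-in for the DB table, and `ExtractWorker`, `Extractor`, and the status names are invented for the example. A real version would read and update rows through JDBC.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the "jobs" DB table.
class JobTable {
    enum Status { PENDING, RUNNING, DONE, FAILED }

    static final class Job {
        final long id;
        final String uploadPath;
        volatile Status status = Status.PENDING;
        volatile String failureReason; // set only when the job fails

        Job(long id, String uploadPath) {
            this.id = id;
            this.uploadPath = uploadPath;
        }
    }

    private final Map<Long, Job> rows = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Called by the upload servlet after it has written the file to disk.
    long enqueue(String uploadPath) {
        long id = nextId.getAndIncrement();
        rows.put(id, new Job(id, uploadPath));
        return id;
    }

    // Called by the UI's ajax poll.
    Status statusOf(long id) {
        return find(id).status;
    }

    String failureReasonOf(long id) {
        return find(id).failureReason;
    }

    // Called by the standalone worker: claim one pending job, if any.
    // No ordering between pending jobs is guaranteed in this sketch.
    synchronized Optional<Job> claimNext() {
        for (Job job : rows.values()) {
            if (job.status == Status.PENDING) {
                job.status = Status.RUNNING;
                return Optional.of(job);
            }
        }
        return Optional.empty();
    }

    private Job find(long id) {
        Job job = rows.get(id);
        if (job == null) throw new IllegalArgumentException("unknown job " + id);
        return job;
    }
}

class ExtractWorker {
    // Hypothetical hook for the actual extraction (stored procedures, file writing).
    interface Extractor {
        void run(String uploadPath) throws Exception;
    }

    // One polling pass: process at most one job and record the outcome.
    // Returns false when there was nothing to do.
    static boolean processOne(JobTable table, Extractor extractor) {
        Optional<JobTable.Job> claimed = table.claimNext();
        if (!claimed.isPresent()) return false;
        JobTable.Job job = claimed.get();
        try {
            extractor.run(job.uploadPath);
            job.status = JobTable.Status.DONE;
        } catch (Exception e) {
            job.failureReason = String.valueOf(e.getMessage());
            job.status = JobTable.Status.FAILED;
        }
        return true;
    }
}
```

The standalone program would call `processOne` in a loop with a short sleep whenever it returns false, and the ajax endpoint would simply return `statusOf(id)`.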
I apologize in advance if this is a bad question.
I'm new to backend development and I'm trying to build an instant messaging service with GAE using java servlets.
And I assume the process for sending a message will be like this:
1. Client sends a JSON file to the servlet.
2. Servlet parses the JSON file and archives the message to the database.
So my question is:
What's going to happen if the next user attempts to send another message while the servlet is in the middle of saving the previous message to the database?
Because the arrival of user requests is not synchronized with the servlet cycle, will the new request just get lost?
Is there some mechanism that queues the request, or is that something I'll have to implement myself?
I think I'm really confused about how asynchronous requests between different functions in a distributed system work.
Also, are there any readings you would recommend on backend design patterns, or just a general introduction?
Thanks a lot!
Please read the official tutorial on the subject, which covers Java web technologies, web containers, and servlets in depth:
http://docs.oracle.com/javaee/6/tutorial/doc/bnafd.html
But to answer your questions:
When another HTTP request comes in, the web container assigns it a separate thread (typically taken from a pool) and runs your servlet concurrently.
The new request will be processed concurrently; it will not be lost.
The answer depends on your specific problem, performance, and SLA requirements. The simplest solution would be to parse and write each request to the database. If you are dealing with a very large number of simultaneous requests, I'd suggest starting a whole new discussion on the subject.
You need to understand what a thread is. When another request is sent to the servlet, a container such as Tomcat assigns another thread to that request. Each thread is independent of the others.
Server requests will run in parallel and your code might access/edit the same data concurrently. You should use Datastore transactions to prevent data corruption.
No, requests are independent and they run in parallel.
You could use Task Queues in your code to make updates run sequentially, but I'd advise highly against it: first Task Queue will double your requests, second it will force a distributed parallel system to run sequentially, basically negating the whole purpose of AppEngine.
Parallel processing is essential in server programming: it enables servers to process a large number of requests. You should write code that takes this into account, and use datastore transactions to prevent possible data corruption in those cases.
In the servlet lifecycle, the init() and destroy() methods are called only once, but service() is called each time a new request hits the application, and the same servlet instance is shared by every request, each on a different thread. Therefore one of the most basic rules for writing a servlet is not to create global variables (mutable instance or static fields) in the servlet class.
Your variable is readable/writeable by any other class. You have no control to ensure that they all do sensible things with it. One of them could overwrite it/incorrectly increment it, etc
There is one instance of a servlet per JVM, so many threads may try to access it concurrently. Because the variable is global, and you are not providing any synchronization or access control, it will not be thread-safe. Also, if you ever run the servlet in some kind of cluster with different JVMs, the variable will not be shared between them and you will have multiple loginAttempt variables.
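To make the shared-instance problem concrete, here is a small simulation in plain Java SE. It does not use the servlet API: `LoginHandler` and `handleRequest` are hypothetical stand-ins for a servlet and its service() method, and the counter plays the role of the loginAttempt variable mentioned above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a servlet: one shared instance, many request threads.
class LoginHandler {
    private int unsafeAttempts;                                     // plain field: updates can be lost
    private final AtomicInteger safeAttempts = new AtomicInteger(); // atomic: no lost updates

    // Plays the role of service(): called concurrently on the same instance.
    void handleRequest() {
        unsafeAttempts++;               // read-modify-write, not atomic
        safeAttempts.incrementAndGet(); // atomic increment
    }

    int unsafeAttempts() { return unsafeAttempts; }

    int safeAttempts() { return safeAttempts.get(); }

    // Simulates `threads` request threads, each sending `requestsPerThread` requests.
    static LoginHandler simulate(int threads, int requestsPerThread) throws InterruptedException {
        LoginHandler handler = new LoginHandler();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    handler.handleRequest();
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("simulation timed out");
        }
        return handler;
    }
}
```

After `simulate(8, 10000)`, the atomic counter always reads 80000, while the plain field may read less because concurrent increments can overwrite each other. Neither version helps in a cluster, where each JVM has its own copy.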
Context
I'm in the process of drawing a solution to migrate a huge PL/SQL system to Java. The initial step is migrating some ETL jobs that:
Read CSV, XML, (XLS, which is a new requirement) and positional files from several ftp / sftp sources
Process the files according to rules stored in the database and write the results to a database table.
Currently this is done by several stored procedures and jobs.
My company is open to suggestions (if it can run in GlassFish 4 and share its logging and connection pool mechanisms, as well as the admin console, it is a plus).
I've done a little bit of research and the following options caught my eye:
Java EE 7 Batch Processing, sounds simple and particularly well fitted for GlassFish 4.
Spring Batch, somewhat more mature and very similar to the Java EE 7 standard (which was probably based on it).
Apache Camel, sounds powerful and would spare us from a lot of fiddling with libraries such as Apache POI, but it also looks somewhat complex. Also I'm not sure if it is the best fit for the job (ETL over huge files).
Cook everything by myself. I could create an Application Client to run a Quartz / Spring Scheduler or even EJB Timers.
While I'm still open to suggestions (recommendations would be nice), the best fit so far seems to be Java EE 7 Batch Processing.
One more thing: the infrastructure team has a solution to move files from every ftp source to a local directory, so FTP is really not an issue.
Problem
I've read several tutorials about Java EE Batch Processing and, in all of them, some kind of Servlet or EJB Timer is responsible for starting the Jobs:
JobOperator jobOperator = BatchRuntime.getJobOperator();
jobOperator.start("job", properties);
I could easily deploy a web / ejb project and keep polling for changes. But I was thinking about a push model:
Application client console application
Main class watches directories for new files
When there is a new file it would start a new job.
My doubts are:
Is this strategy possible/ advisable?
Will I need a JMS queue or some kind of producer / consumer strategy in the middle, or should I just call jobOperator.start for every file and trust the batch processing layer to manage the application resources? In other words, if a thousand files are delivered to my folder at once and I call jobOperator.start a thousand times, will GlassFish 4 do some kind of smart enqueuing, or should I create some kind of gate so that no more than n jobs run simultaneously?
I've already implemented a project with Batch Processing in WildFly (JBoss AS). I'm not familiar with configuration details on GlassFish (I'm not using it anymore because they've dropped enterprise support), but I can give you some insights and guidelines from my experience. Also, please note that Spring Batch and the Batch spec in EE 7 are quite similar, and your decision to use either technology should depend on what else you want to achieve with your application besides the batching. Do you want an easily maintained web interface? Do you want to develop a REST API? And so on.
The ETL jobs you're describing fit perfectly with the steps and chunks model in the EE 7 spec. If you've already tried to develop some tests, you may have noticed that you still need to code the file readers and mappers for each file specification. Your reading sources are quite standard, and you will easily find a library to read/stream them and process their data.
The project I've implemented is quite simple. Customers upload files that need to be processed in order to feed a data warehouse. This service is in the "cloud". Files have a defined spec and must be in CSV format. Most processing results are dimensional "upserts" and fact "erase before insert" operations. The user has a web interface on which files and batch processing metadata must be shown (processing state, dates, rejected items, etc.). Because it is a cloud service, the files must not reside locally on each server (we use S3).
So the first thing to design is the chunk steps. I didn't want an implementation for each file spec, so I designed a "fit all cases" implementation that processes files according to the metadata contained in them and the job configuration itself. This is the easy part. The second thing to think about is the processing and metadata administration. Here, I developed a REST API and a web interface that uses it.
After all this, will it scale? WildFly has thread configuration parameters for batch processing, and you can increase or decrease the threads available to the JobOperator. Jobs are not submitted if there are not enough threads available. So what happens to those requests? Well, they can reside in memory, you can develop a backed-up stateful session, or you can implement an MQ listener for queued processing requests. What I did was much simpler. The company doesn't have the resources to maintain a cluster, so we set up an elastic configuration that expands according to CPU consumption and request volume. So far, the application has processed 10 TB of data from 15 customers, and at peak request/processing load, 3 elastic instances have fired up.
A file listener is an interesting idea. You can listen to a directory and drop a processing request onto a queue or hand it immediately to the BatchRuntime. It will depend on how you want to scale it, the response time you need, the available resources, etc.
Feel free to ask me anything.
Regards.
EDIT: forgot to mention. I don't really recommend using the Application Client unless you've already got something deployed in your organization. The recent security constraints and the Java SE update mechanism have made it a real hassle to maintain those kinds of deployments. Think web.
I would approach it this way.
My hammers for this use case would be the Java Watch Service, a Servlet, a JMS queue, and the Batch service.
First, the Watch Service is the Java 7 go to place to handle the file system monitoring.
I would write a Watch Service implementation, and I would run it on a thread.
Where does the thread run you ask?
Officially, you should probably be using JCA for this. But, JCA is flat out a pain to work with, underutilized, thus under documented. There are solid examples, but it's simply not a common technology in the Java EE stack.
Another place is an asynchronous Session Bean invocation. There's nothing that suggests these can not be long lived invocations. You could stand up a @Singleton Session Bean, with @Startup, call the async method from a @PostConstruct method, and let it go. Then, in @PreDestroy signal the long running method to stop, so it can cleanly shut down. This should all be to spec, portable, and according to Hoyle.
The third place is to use a ServletContextListener, which is the pre-Java EE 6 go-to place for tying code in to the life cycle of the application. Here, you would create the thread yourself in the contextInitialized method, and then tear it down in the contextDestroyed method.
Creating threads here is "less defined", but I've done it for years and never had a problem.
Now that you have your service running, the service (IMHO), will do two things.
1) It'll sense when a new file has arrived in the directory, and when it does, it will MOVE (mv, rename) the file to a parallel "processing" directory. The reason is that this tells you that a file has moved from incoming to processing, that the file is a work in progress. It's obvious from a directory listing, regardless of what the backend thinks it's doing. Remember, the system can go down mid way through a file.
2) Once moved, post the file name and any other metadata onto a JMS queue and have an MDB tool up the batch job.
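The two steps above might look roughly like this with the Java 7 Watch Service. This is a sketch, not a hardened implementation: `IncomingWatcher`, `claim`, and the `enqueue` callback (which stands in for posting to the JMS queue) are hypothetical names. It also assumes the incoming and processing directories are on the same file system, since it asks for an atomic move.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.function.Consumer;

class IncomingWatcher {
    // Step 1: move the file into the "processing" directory so that a plain
    // directory listing shows it is a work in progress.
    static Path claim(Path file, Path processingDir) throws IOException {
        Files.createDirectories(processingDir);
        Path target = processingDir.resolve(file.getFileName());
        // ATOMIC_MOVE fails if the two directories are on different file systems.
        return Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Blocks until the thread is interrupted. For each new file it claims the
    // file and hands the new path to `enqueue` (step 2, e.g. a JMS send).
    static void watch(Path incomingDir, Path processingDir, Consumer<Path> enqueue)
            throws IOException, InterruptedException {
        try (WatchService watcher = incomingDir.getFileSystem().newWatchService()) {
            incomingDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take(); // throws InterruptedException on shutdown
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were dropped; a real service would rescan the directory
                    }
                    Path created = incomingDir.resolve((Path) event.context());
                    enqueue.accept(claim(created, processingDir));
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        }
    }
}
```

One caveat: ENTRY_CREATE can fire while the file is still being written, so a real deployment needs some convention for "file complete", such as writing under a temporary name and renaming when done.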
Why add the JMS queue? It brings a couple of features to the party. First, it's great way to get stuff "from outside" the happy transactional context that EJB likes, to inside one. Second, it's transactional. You can, depending on your ETL use case, have the MDB directly process the job. And by doing so, you simply do not acknowledge the message from the queue until the processing is done (and the file is deleted or moved from the "processing" directory). In an ideal world, the message queue has messages matching the files in the processing directory. When the processing is done, the method returns, the message fetch "commits", and you're done. If the system crashes, this will restart from the beginning automatically (since the message is still on the queue and was never removed).
The MDB, by configuring its instances, can also gate the number of simultaneous jobs. Configure 10 instances, and only 10 files can be processed at the same time. But this can be a little too simple, too coarse. There's no priority, for example (first come, first served). But it might work for you.
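MDB instance limits are configured in the container, so they are not shown here. As a plain Java SE analogy for the same gating idea, a fixed-size thread pool also caps how many jobs run at once and queues the rest in arrival order. `JobGate` and `runAll` are hypothetical names for this sketch, and the peak counter is there only to make the cap observable.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class JobGate {
    // Runs every job, never more than maxConcurrent at a time, and returns
    // the highest number of jobs that were observed running simultaneously.
    static int runAll(List<Runnable> jobs, int maxConcurrent) throws InterruptedException {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // The pool size is the gate; extra jobs wait in the pool's internal queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        for (Runnable job : jobs) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    job.run();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("jobs did not finish in time");
        }
        return peak.get();
    }
}
```

Unlike the JMS queue, the pool's internal queue lives in memory, so pending jobs are lost if the process dies. That is one reason to keep the queue in front of it.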
But either way, the MDB is a great gateway into the system, since each one starts with its own little bit of transactional context, unlike the long-running servlet thread or the long-running async thread. The servlet thread has a questionable (if any) transactional status; the long-running thread inherits its state from the @Startup method and retains it for its lifetime. The MDB gets a new one each time. Much of this can be shenaniganed away by calling methods with new transactions.
But I like the demarcation of the MDB. Even if its entire task is to create the Batch entry for a file name, the MDB is a good gatekeeper.
And that's pretty much it.
The key parts are being a good citizen and tearing down your thread properly tied to the lifecycle of the application, understanding your transactional state at the various components, and understanding how all the moving parts fit together.
If you use the @Startup technique, make sure you invoke your async method via an injected reference to your session bean. Otherwise the invocation will be a plain local call, and not asynchronous. You'll stare at it wondering why your server is hanging and not starting up. All of the EJB annotations only work when invoked through an injected or looked up proxy.
Have fun, share and enjoy.
Addenda to the question:
There's really no value in having an external process manage the watch service. One tied to the lifecycle of the server is easier to maintain. Two things come to mind. If the server is down, files will simply stack up in the file system until the server is started again, so you don't lose data. If you have an external service, then you either have it sending messages to a dead server, or you have to stage and manage the JMS server separately from the app server. In that case you now have 3 processes to manage: watch service, JMS server, and app server, rather than just the app server.
I agree with the other poster that, should you decide to go with an external service anyway, a simple Java SE app posting simple messages to a JAX-RS REST service on the server, or even a trivial Servlet, is much, MUCH easier to maintain, stage, and deploy than an app client. If you do it that way, you could write the watch service in something completely different.
But since the server (ostensibly) has direct access to the file system with the file, there's really no motivation to break this service outside of the container. Put the whole kit in to an EAR and have at it. Just flat easier management.
I have a webapp with an architecture I'm not thrilled with. In particular, I have a servlet that handles a very large file upload (via commons-fileupload), then processes the file, passing it to a service/repository layer.
What has been suggested to me is that I simply have my servlet upload the file, and have a service on the backend do the processing. I like the idea, but I have no idea how to go about it. I do not know JMS.
Other details:
- App is a GWT app split into the recommended client/server/shared subpackages, using an MVP architecture.
- Currently, I am only running in GWT hosted mode, but am planning to move to Tomcat in the very near future.
I'm perfectly willing to learn whatever I need to in order to get this working (in fact, that's the point of writing the app). I'm not expecting anyone to write code for me, but can someone point me in the right direction to get started?
There are many options for this scenario, but the simplest may be just copying the uploaded file to a known location on the file system, and have a background daemon monitor the location and process when it finds it.
@Jason, there are many ways to solve your problem.
i) Dump your file data into a database column of type BLOB, and have a DB polling thread check the table at a fixed interval for newly inserted files.
ii) Dump the file to the file system and have a file monitoring process.
The benefit of i) over ii) is that a DB is a centralized and fast resource, whereas file systems are generally slow and non-centralized in nature.
So basically the servlet would dump either to the DB or to the file system. As for who processes the dumped file: a) it could be a monitoring process as discussed above, or b) you can use JMS, which is asynchronous in nature: the servlet puts a trigger event on a queue, which asynchronously triggers a new processing thread.
Well don't introduce JMS in your system unnecessarily if you are ok with monitoring process.
This sounds interesting and familiar to me :). We do it in a similar way.
We have four projects, all of which include file upload and file processing (image/video/PDF/docs, etc.). So we created a single project to handle all file processing; it works something like this:
All four projects and the File Processor use Amazon S3 or our own file storage, so file storage is shared among all five projects.
We make an HTTP request to the File Processor with the details in XML: the file path on S3/storage, AWS authentication details, and file conversion/processing parameters. The File Processor does the processing, puts the processed files on S3/storage, builds an XML document describing the processed files, and sends it back in the response.
We use Spring Framework and Tomcat.
Since this is foremost a learning exercise, you need to pick an easy to use JMS provider. This discussion suggested FFMQ just one year ago.
Since you are starting with a simple processor, you can keep it simple and use a JMS Queue.
In the simplest form, each message sent by the servlet has to correspond to a single job. You can either put the entire payload of the upload in the message, or just send a filename as a reference to the content. These are details you can refactor later.
On the processor side, if you are using Java EE, you can use a MessageBean. If you are not, then I would suggest a 3 JVM solution -- one each for Tomcat, the JMS server, and the message processor. This article includes the basics of a message consuming client.
I have a local web app that is installed on a desktop PC, and it needs to regularly sync with a remote server through web services.
I have a "transactions" table that stores transactions that have been processed locally and need to be sent to the remote server. This table also contains transactions that have been retrieved from the remote server through a web service call (that is, processed remotely) and need to be performed locally. The transactions are performed in time order to ensure they are processed in the right order.
An example of the type of transactions are "loans" and "returns" of items from a store, for example a video rental store. For example something may have been loaned locally and returned remotely or vice versa, or any sequence of loan/return events.
There is also other information that is retrieved from the remote server to update the local records.
When the user performs the tasks locally, I update the local db in real time and add the transaction to the table for background processing with the remote server.
What is the best approach for processing the background tasks? I have tried using a Thread that is created in an HttpSessionListener, and using interrupt() when the session is removed, but I don't think that this is the safest approach. I have also tried using a session attribute as a locking mechanism, but this also isn't the best approach.
I was also wondering how you know when a thread has completed its run, so as to avoid launching another thread at the same time, or whether a thread has died before completing.
I have come across another suggestion, using the Quartz scheduler, but I haven't read up on this approach in detail yet. I am going to purchase a copy of Java Concurrency in Practice, but I wanted some help with ideas for the best approach before I get stuck into it.
BTW I'm not using a web app framework.
Thanks.
Safest would be to create an application-wide thread pool which is managed by the container. How to do that depends on the container used. If your container doesn't support it (e.g. Tomcat) or you want to be container-independent, then the basic approach would be to implement ServletContextListener, create the thread pool on startup using the ExecutorService API provided since Java 1.5, and shut the thread pool down on shutdown. If you aren't on Java 1.5 yet or want more abstraction, then you can also use Spring's TaskExecutor.
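A minimal sketch of that lifecycle, kept free of the servlet API so it stands alone: `BackgroundTasks` is a hypothetical holder whose `start` would be called from a ServletContextListener's contextInitialized method and whose `stop` would be called from contextDestroyed. The returned Future also answers the earlier question of how to tell whether a task has finished.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical application-wide pool. A ServletContextListener would call
// start() from contextInitialized and stop() from contextDestroyed.
class BackgroundTasks {
    private ExecutorService pool;

    synchronized void start(int threads) {
        if (pool != null) throw new IllegalStateException("already started");
        pool = Executors.newFixedThreadPool(threads);
    }

    // The Future reports completion (isDone) and failure (get throws ExecutionException).
    synchronized <T> Future<T> submit(Callable<T> task) {
        if (pool == null) throw new IllegalStateException("not started");
        return pool.submit(task);
    }

    // Stops accepting work and waits for running tasks.
    // Returns true if everything finished within the timeout.
    synchronized boolean stop(long timeoutSeconds) throws InterruptedException {
        if (pool == null) return true;
        pool.shutdown();
        boolean finished = pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            pool.shutdownNow(); // interrupt whatever is still running
        }
        pool = null;
        return finished;
    }
}
```

Using a pool of size 1 is one simple way to guarantee that two sync runs never overlap, since a second submission just waits its turn.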
There was once a Java EE proposal for concurrency utilities, but it has not yet made it into Java EE 6.
Related questions:
What is the recommend way of spawning threads from a servlet?
Background timer task in a JSP web application
It's better to go with the Quartz scheduling framework, because it has most of the features related to scheduling. It can store jobs in a database, handle concurrency, etc.
Please try this solution:
Create a table that stores a flag ('Y' or 'N') mapped to some identifying field, with a default value of 'N'.
Schedule a job for each return at the time the loan itself is made; the job executes if the flag is 'Y'.
On return, change the flag to 'N', which then fires the process you wanted to run.
I currently have a tomcat container -- servlet running on it listening for requests. I need the result of an HTTP request to be a submission to a job queue which will then be processed asynchronously. I want each "job" to be persisted in a row in a DB for tracking and for recovery in case of failure. I've been doing a lot of reading. Here are my options (note I have to use open-source stuff for everything).
1) JMS -- use ActiveMQ (but who is the consumer of the job in this case: another servlet?)
2) Have my request create a row in the DB. Have a separate servlet inside my Tomcat container that always runs -- it uses Quartz Scheduler or the utilities provided in java.util.concurrent to continuously process the rows as jobs (uses thread pooling).
I am leaning towards the latter, because looking at the JMS documentation gives me a headache, and while I know it's a more robust solution, I need to implement this relatively quickly. I'm not anticipating huge amounts of load in the early days of deploying this server in any case.
A lot of people say Spring might be good for either 1 or 2. However I've never used Spring and I wouldn't even know how to start using it to solve this problem. Any pointers on how to dive in without having to re-write my entire project would be useful.
Otherwise if you could weigh in on option 1 or 2 that would also be useful.
Clarification: The asynchronous process would be to screen scrape a third-party web site and send a message notification to the original requester. The third-party web site is a bit flaky and slow, and that's why it will be handled as an asynchronous process (with several retry attempts built in). I will also be pulling files from that site and storing them in S3.
Your Quartz Job doesn't need to be a Servlet! You can persist incoming Jobs in the DB and have Quartz started when your main Servlet starts up. The Quartz Job can be a simple POJO and check the DB for any jobs periodically.
However, I would suggest taking a look at Spring. It's not hard to learn and is easy to set up within Tomcat. You can find a lot of good information in the Spring reference documentation. It has Quartz integration, which is much easier than doing it manually.
A suitable solution which will not require you to do a lot of design and programming is to create the object you will need later in the servlet, and serialize it to a byte array. Then put that in a BLOB field in the database and be done with it.
Then your processing thread can just read the contents, deserialize them, and work with the resurrected object.
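The serialize-to-BLOB round trip could look something like this. The database access is left out, and `ScrapeJob` with its fields is a hypothetical job description made up for the example, loosely following the clarification in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class JobPayloads {
    // Hypothetical job description; anything stored this way must be Serializable.
    static final class ScrapeJob implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;
        final String notifyAddress;
        final int retriesLeft;

        ScrapeJob(String url, String notifyAddress, int retriesLeft) {
            this.url = url;
            this.notifyAddress = notifyAddress;
            this.retriesLeft = retriesLeft;
        }
    }

    // Servlet side: turn the object into bytes suitable for a BLOB column.
    static byte[] toBytes(Serializable job) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(job);
        }
        return buffer.toByteArray();
    }

    // Worker side: rebuild the object from the bytes read back from the BLOB.
    static Object fromBytes(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        }
    }
}
```

The trade-off is that rows written by one version of the class may not deserialize after the class changes, and the BLOB cannot be queried from SQL. Storing the same fields as plain columns avoids both problems at the cost of a little more mapping code.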
But, you may get better answers by describing what you need your system to actually DO :)