I'd like for incoming Java servlet web requests to invoke RabbitMQ using the RPC approach as described here.
However, I'm not sure how to properly reuse callback queues between requests. According to the RabbitMQ tutorial linked above, creating a new callback queue for every request is inefficient (RabbitMQ may not cope, even when using the Queue TTL feature).
There would generally be only 1-2 RPC calls per servlet request, but obviously a lot of servlet requests per second.
I don't think I can share the callback queues between threads, so I'd want at least one per web worker thread.
My first idea was to store the callback queue in a ThreadLocal, but that can lead to memory leaks.
My second idea was to store them in a session, but I am not sure they will serialize properly and my sessions are currently not replicated/shared between web servers, so it is IMHO not a good solution.
My infrastructure is Tomcat / Guice / Stripes Framework.
Any ideas what the most robust/simple solution is?
Am I missing something in this whole approach, and thus over-complicating things?
Note 1 - This question relates to the overall business case described here - see option 1.
Note 2 - There is a seemingly related question How to setup RabbitMQ RPC in a web context, but it is mostly concerned with proper shutdown of threads created by the RabbitMQ client.
I apologize in advance if this is a bad question.
I'm new to backend development and I'm trying to build an instant messaging service with GAE using java servlets.
And I assume the process for sending a message will be like this:
1. Client sends a JSON file to the servlet.
2. Servlet parses the JSON file and archives the message to the database.
So my question is:
What's going to happen if the next user attempts to send another message while the servlet is in the middle of saving the previous message to the database?
Because the arrival of user requests is not synchronized with the servlet cycle, will the new request just get lost?
Is there some mechanism that queues the request, or is it something that I'll have to implement myself?
I think I'm really confused about how asynchronous requests between different functions in a distributed system work.
Also, are there any readings you would recommend on backend design patterns, or just a general introduction?
Thanks a lot!
Please read the official tutorial on the subject, which discusses Java web technologies, web containers and servlets in depth:
http://docs.oracle.com/javaee/6/tutorial/doc/bnafd.html
But to answer your questions:
When another HTTP request comes in, the web container handles it on a separate thread and runs your servlet concurrently.
The new request will not be lost; it is processed concurrently.
The answer depends on your specific problem, performance and SLA requirements. The simplest solution would be to parse and write each request to the database. If you are dealing with a very large number of simultaneous requests coming in, I'd suggest starting a whole new discussion on the subject.
You need to understand exactly what a thread is. When another request is sent to the servlet, the container (Tomcat, for example) assigns another thread to that request. Each thread runs independently of the others.
Server requests will run in parallel and your code might access/edit the same data concurrently. You should use Datastore transactions to prevent data corruption.
No, requests are independent and they run in parallel.
You could use Task Queues in your code to make updates run sequentially, but I'd strongly advise against it: first, a Task Queue will double your requests; second, it will force a distributed parallel system to run sequentially, basically negating the whole purpose of App Engine.
Parallel processing is essential in server programming: it enables servers to handle a large number of requests. You should write code that takes this into account, and use datastore transactions to prevent possible data corruption in those cases.
In a servlet's lifecycle, the init() and destroy() methods are called only once, but service() is called each time a new request hits the application, and the same servlet instance is shared by all requests, each running on a different thread. Therefore one of the most basic rules for writing a servlet is not to keep a global (instance or static) variable holding mutable state in the servlet class.
Your variable is readable/writeable by any other class. You have no control to ensure that they all do sensible things with it. One of them could overwrite it/incorrectly increment it, etc
There is one instance of a servlet per JVM, so many threads may try to access it concurrently. Because the variable is global, and you are not providing any synchronization/access control, it will not be thread-safe. Also, if you ever run the servlet in some kind of cluster with different JVMs, then the variable will not be shared between them and you will have multiple loginAttempt variables.
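To make the shared-instance point concrete, here is a minimal sketch in plain Java (no servlet API). LoginServletSketch, service() and simulate() are hypothetical names standing in for a real servlet and container; the only point is that a counter shared by all request threads needs an atomic (or otherwise synchronized) update.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a servlet: the container keeps ONE instance
// and calls service() from many request threads at once.
class LoginServletSketch {
    // Shared by all request threads, so it must be thread-safe.
    // A plain "int loginAttempts" with ++ could lose updates here.
    private final AtomicInteger loginAttempts = new AtomicInteger();

    // Called once per simulated request.
    void service() {
        loginAttempts.incrementAndGet(); // atomic read-modify-write
    }

    int attempts() {
        return loginAttempts.get();
    }

    // Fires `requests` simulated requests from a pool of `threads` workers
    // and returns the final counter value.
    static int simulate(int threads, int requests) {
        LoginServletSketch servlet = new LoginServletSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.execute(servlet::service);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulated requests did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return servlet.attempts();
    }
}
```

Note that this only protects the counter inside one JVM. As said above, a cluster of JVMs would still have one counter per JVM, so shared state of that kind belongs in a database or similar store.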
The scope/context of this question:
I am to develop a Java/Java EE based distributed server-side application that is scalable (scale-up, rather than scale-out).
My application consists of servlets utilizing multiple instances of distributed back-end services for processing client requests. If I need to achieve more throughput, I want to be able to just add more instances of these distributed services (JVMs on the same or another machine) and (expect to) see an increase in throughput.
To achieve this, I was thinking of a loosely-coupled asynchronous system.
I thought I would use Async Servlets (servlet 3.0) and an application-managed thread-pool that places client requests on JMS queues, which would be picked by one of the distributed service instances and processed. The responses can be relayed back to the client using JMS, from the service instances to a response-thread in the servlet container.
However, an asynchronous system seems to be (obviously) more complex than a synchronous one (ex: error-handling and error-relaying to the client, request tracking etc). I am also worried about the future maintainability of the design/code.
So, a question arises: does it make sense to do this synchronously, while still remaining distributed, scalable and loosely coupled?
If the answer is yes, then please also share possible ways of achieving this (while remaining 'constructive').
If I can do this well in a synchronous way, then it will simplify the entire system.
I don't want to add complexity to the system unnecessarily.
(Assuming it makes sense) One possible implementation I could think of is using RMI.
For example: a service registry with which the distributed service instances register, and a load balancer that distributes the RMI calls across all the available instances. But it feels like an old-generation solution. Are there any better options available?
Edit:
Other details about the scope of this question:
The client side is browser-based and does not demand an asynchronous server side.
I don't need server push.
At any time, I won't have more outstanding requests than the max-worker-threads of the popular web servers (even Apache).
For the above reasons, the use cases mentioned in a related question don't seem to apply to my scenario.
Loose coupling and distribution are independent of whether processing is synchronous or asynchronous.
With scalability, the matter is more complex. In a synchronous model, you will need one thread per pending request. If you need to scale to really high load (say, thousands of concurrent requests per server), an asynchronous model may scale better. To reap that benefit, however, the entire processing, starting from the handling of incoming connections, needs to be done in an asynchronous way. There is little point in having a synchronous request processing thread delegate to an asynchronous thread pool and then block until that thread pool has computed the result; after all, the request thread could just as well have done the work itself.
If you need to return a response, I'd therefore go for synchronous request processing whenever scalability permits (which it usually does).
Edit:
There are numerous ways to talk to the distributed backend servers. You might simply use EJB (which, if I recall correctly, uses RMI under the hood). Or, you might use webservices behind a load balancer.
Before we develop our custom solution, I'm looking for some kind of library, which provides:
Non-blocking queue of HTTP requests
with these attributes:
Persisting requests to avoid their loss in case of:
network connectivity interruption
application quit, forced GC on background app
etc..
Possibility of putting out all these fields:
Address
Headers
POST data
So please, is there anything usable right now that could save us a whole day of development?
Right now we don't need any callbacks on completed requests, nor do we need to save result data, as there won't be any.
In my humble opinion, a good and straightforward solution would be to develop your own layer (which shouldn't be so complicated) using a sophisticated framework for connection handling, such as Netty https://netty.io/ , together with a sophisticated framework for asynchronous processing, such as Akka http://akka.io/
Let's first look inside Netty support for http at http://static.netty.io/3.5/guide/#architecture.8 :
4.3. HTTP Implementation
HTTP is definitely the most popular protocol in the Internet. There are already a number of HTTP implementations such as a Servlet container. Then why does Netty have HTTP on top of its core?
Netty's HTTP support is very different from the existing HTTP libraries. It gives you complete control over how HTTP messages are exchanged at a low level. Because it is basically the combination of an HTTP codec and HTTP message classes, there is no restriction such as an enforced thread model. That is, you can write your own HTTP client or server that works exactly the way you want. You have full control over everything that's in the HTTP specification, including the thread model, connection life cycle, and chunked encoding.
And now let's dig inside Akka. Akka is a framework that provides an excellent abstraction on top of the Java concurrency API, and it comes with APIs for both Java and Scala.
It provides you a clear way to structure your application as a hierarchy of actors:
Actors communicate through message passing, using immutable messages so that you do not have to care about thread-safety
Actor messages are stored in mailboxes, which can be durable
Actors are responsible for supervising their children
Actors can be run on one or more JVMs and can communicate using a wide number of protocols
It provides a lightweight abstraction for asynchronous processing, Future, which is easier to use than Java Futures.
It provides other fancy stuff such as Event Bus, ZeroMQ adapter, Remoting support, Dataflow concurrency, Scheduler
Once you become familiar with the two frameworks, it turns out that what you need can easily be coded through them.
In fact, what you need is an HTTP proxy coded in Netty that, upon receiving a request, immediately sends a message to an Akka actor of type FSM (http://doc.akka.io/docs/akka/2.0.2/java/fsm.html) that uses a durable mailbox (http://doc.akka.io/docs/akka/2.0.2/modules/durable-mailbox.html).
Here is a link to an open-source library that was the Master's thesis of a student at Czech Technical University in Prague. It is a very large and powerful library and mainly focuses on location. The good thing about it, though, is that it omitted the headers and other -ish that REST has.
It is the latest fork and hopefully it will at least give you inspiration for your "own" solution.
How about these concurrent collections:
http://mcg.cs.tau.ac.il/projects/concurrent-data-structures
I hope that the license is OK.
You'll want to have a look at these two posts (added at the end of the document).
Very basically, an approach that works well for me is to separate the requests from the queue and the executor.
Requests are executed as Runnables or Callables. Inherit from them to create the different kinds of requests to your API or service. Set them up there, adding headers and/or a body prior to executing them.
Enqueue those requests in a queue (choose whichever fits you best; I'd say LinkedBlockingQueue will do the job) linked to an executor inside a bound service, and call them from your activity or any other scope. If you don't need to get responses and callbacks, you can avoid using Guava to listen to futures or creating your own callbacks.
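A minimal sketch of that separation in plain Java, under a few assumptions: QueuedRequest and RequestQueueSketch are hypothetical names, the HTTP call itself is replaced by a placeholder string, and the Android bound service is left out. It only shows a Callable request type feeding a single-threaded executor backed by a LinkedBlockingQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical request type: carries everything needed to perform the call.
class QueuedRequest implements Callable<String> {
    private final String address;
    private final String body;

    QueuedRequest(String address, String body) {
        this.address = address;
        this.body = body;
    }

    @Override
    public String call() {
        // Placeholder for the real HTTP call; here we only describe it.
        return "POST " + address + " (" + body.length() + " chars)";
    }
}

class RequestQueueSketch {
    // One worker thread draining a LinkedBlockingQueue, so requests run
    // one at a time in submission order.
    static ExecutorService newExecutor() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Submits all requests, waits for them, and returns their results.
    static List<String> runAll(List<QueuedRequest> requests) {
        ExecutorService executor = newExecutor();
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (QueuedRequest request : requests) {
                futures.add(executor.submit(request));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

If no responses are needed, the Future handling in runAll can be dropped and the requests submitted with execute() as plain Runnables. Note that this queue lives in memory only, so it does not cover the persistence requirement by itself.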
I'll stay tuned. If you need more depth I can post some specific pieces of code. There's the source of a basic example in the first link though.
http://ugiagonzalez.com/2012/08/03/using-runnables-queues-and-executors-to-perform-tasks-in-background-threads-in-android/
http://ugiagonzalez.com/2012/07/02/theres-life-after-asynctasks-in-android/
Update:
You can create another queue for those requests that were impossible to execute.
One approach that comes to mind would be to add all your failed requests to a retry queue. The retry queue would keep trying to re-run these tasks while the phone still thinks that some kind of internet connection is available. In the request object you can set a maximum number of retries and compare it to a currentRetry number, increasing it on every retry.
Mmm this might be interesting. I'll definitely think about including that in my library.
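As a rough sketch of the retry idea (not the library's actual code): RetryableRequest, RetryQueueSketch and retryPass() are hypothetical names, and the request itself is modelled as a BooleanSupplier that returns true on success. The class is not thread-safe; it assumes a single worker thread drives it.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.BooleanSupplier;

// Hypothetical request wrapper that tracks how often it has been retried.
class RetryableRequest {
    final String name;
    final BooleanSupplier action; // returns true when the request succeeded
    final int maxRetries;
    int currentRetry = 0;

    RetryableRequest(String name, BooleanSupplier action, int maxRetries) {
        this.name = name;
        this.action = action;
        this.maxRetries = maxRetries;
    }
}

// Not thread-safe: meant to be driven by one worker thread.
class RetryQueueSketch {
    private final Queue<RetryableRequest> retryQueue = new ArrayDeque<>();
    private int dropped = 0;

    // First attempt; a failure parks the request on the retry queue.
    void execute(RetryableRequest request) {
        if (!request.action.getAsBoolean()) {
            retryQueue.add(request);
        }
    }

    // Call this while a connection appears to be available.
    // Every parked request gets exactly one more attempt per pass.
    void retryPass() {
        int pending = retryQueue.size();
        for (int i = 0; i < pending; i++) {
            RetryableRequest request = retryQueue.poll();
            request.currentRetry++;
            if (request.action.getAsBoolean()) {
                continue; // succeeded, nothing more to do
            }
            if (request.currentRetry < request.maxRetries) {
                retryQueue.add(request); // try again on a later pass
            } else {
                dropped++; // retry budget exhausted, give up
            }
        }
    }

    int pending() { return retryQueue.size(); }

    int dropped() { return dropped; }
}
```

A connectivity listener or a timer could call retryPass(); which of those fits depends on the app and is left open here.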
I have a web service, that takes an input xml message, transforms it, and then forwards it to another web service.
The application is deployed to two web logic app servers for performance, and resilience reasons.
I would like a single website monitoring page that allows two things
ability to stop/ start forwarding of messages
ability to monitor throughput of number of messages in the last hour etc. Number of different senders into the webservice etc.
I was wondering what the best way to implement this was.
My current idea is to have an in-memory database (e.g. Derby or HSQL) replicating data to share the information between the two (or more) instances of my application that are running in different instances of the app server. I imagine I would have to set up some sort of master/slave configuration.
I would love a link to an article that discusses how to solve this problem.
(Note, this is a simple spring application using spring MVC)
thanks,
David.
This sounds like a good match for Java Management Extensions (JMX)
JMX allows you to expose certain operations (eg: start/stop forwarding messages)
JMX allows you to monitor certain performance indicators (eg: moving average of messages processed)
Spring has good support for exposing beans as JMX MBeans. See here for more information.
Then you could use an open-source web-based JMX console, such as jManage
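For illustration, a minimal standard MBean using only the JDK (no Spring). ForwardingControl, its attributes and the object name passed to register() are hypothetical; the sketch only shows the shape of a start/stop switch plus a counter that a JMX console can read.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSketch {
    // Standard MBean convention: the management interface is public and
    // named <ImplementationClass>MBean.
    public interface ForwardingControlMBean {
        boolean isForwarding();        // attribute "Forwarding"
        long getMessagesForwarded();   // attribute "MessagesForwarded"
        void start();                  // operation
        void stop();                   // operation
    }

    public static class ForwardingControl implements ForwardingControlMBean {
        private final AtomicBoolean forwarding = new AtomicBoolean(true);
        private final AtomicLong messagesForwarded = new AtomicLong();

        @Override public boolean isForwarding() { return forwarding.get(); }
        @Override public long getMessagesForwarded() { return messagesForwarded.get(); }
        @Override public void start() { forwarding.set(true); }
        @Override public void stop() { forwarding.set(false); }

        // Called by the (hypothetical) web service for each incoming message.
        public boolean forward(String message) {
            if (!forwarding.get()) return false; // forwarding is switched off
            messagesForwarded.incrementAndGet();
            return true;
        }
    }

    // Registers a control bean with this JVM's platform MBean server.
    static ForwardingControl register(String objectName) {
        try {
            ForwardingControl control = new ForwardingControl();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(control, new ObjectName(objectName));
            return control;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // What a console does under the hood: read an attribute by name.
    static Object readAttribute(String objectName, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // ...and invoke a no-argument operation by name.
    static void invoke(String objectName, String operation) {
        try {
            ManagementFactory.getPlatformMBeanServer()
                    .invoke(new ObjectName(objectName), operation, null, null);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each app server JVM has its own MBean server, so a single monitoring page would have to connect to both instances and combine the numbers.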
Hope this helps.
Sounds like you are looking for a Message Queue, some MDBs and a configurable design would let you do all these. Spring has support for JMS Queues if I'm not wrong
I think you are looking for a message queue. If you need additional monitoring, using a web service as the end point may not suffice - with regards to stop/start or forwarding of messages; monitoring http requests to web service is more cumbersome than tracking messages to a queue (even though you can do it).
If you are exposing this service to a third party, then the web service will sit on top of the message queue and delegate to it.
In my experience, RabbitMQ is a fine messaging queue service with a relatively simple learning curve.
I currently have a tomcat container -- servlet running on it listening for requests. I need the result of an HTTP request to be a submission to a job queue which will then be processed asynchronously. I want each "job" to be persisted in a row in a DB for tracking and for recovery in case of failure. I've been doing a lot of reading. Here are my options (note I have to use open-source stuff for everything).
1) JMS -- use ActiveMQ (but who is the consumer of the job in this case another servlet?)
2) Have my request create a row in the DB. Have a separate servlet inside my Tomcat container that always runs -- it uses Quartz Scheduler or utilities provided in java.util.concurrent to continuously process the rows as jobs (uses thread pooling).
I am leaning towards the latter because looking at the JMS documentation gives me a headache, and while I know it's a more robust solution, I need to implement this relatively quickly. I'm not anticipating huge amounts of load in the early days of deploying this server in any case.
A lot of people say Spring might be good for either 1 or 2. However I've never used Spring and I wouldn't even know how to start using it to solve this problem. Any pointers on how to dive in without having to re-write my entire project would be useful.
Otherwise if you could weigh in on option 1 or 2 that would also be useful.
Clarification: The asynchronous process would be to screen scrape a third-party web site, and send a message notification to the original requester. The third-party web site is a bit flaky and slow and that's why it will be handled as an asynchronous process (several retry attempts built in). I will also be pulling files from that site and storing them in S3.
Your Quartz Job doesn't need to be a Servlet! You can persist incoming Jobs in the DB and have Quartz started when your main Servlet starts up. The Quartz Job can be a simple POJO and check the DB for any jobs periodically.
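A rough sketch of that "POJO that polls for jobs" shape. It swaps Quartz for the java.util.concurrent scheduler the question also mentions, and it uses in-memory maps as a stand-in for the job table; JobPollerSketch, Status and process() are hypothetical names, and a real version would read and update rows via JDBC.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class JobPollerSketch {
    enum Status { PENDING, DONE, FAILED }

    // In-memory stand-in for the job table (id -> payload, id -> status).
    private final Map<Long, String> payloads = new ConcurrentHashMap<>();
    private final Map<Long, Status> statuses = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // What the servlet would do per request: insert a row and return.
    long submit(String payload) {
        long id = nextId.getAndIncrement();
        payloads.put(id, payload);
        statuses.put(id, Status.PENDING);
        return id;
    }

    Status status(long id) {
        return statuses.get(id);
    }

    // One polling pass: process every PENDING row and record the outcome.
    // Returns how many rows were picked up.
    int pollOnce() {
        int processed = 0;
        for (Long id : statuses.keySet()) {
            if (statuses.get(id) != Status.PENDING) continue;
            try {
                process(payloads.get(id));
                statuses.put(id, Status.DONE);
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED);
            }
            processed++;
        }
        return processed;
    }

    // Hypothetical job body; the real one would do the actual work.
    private void process(String payload) {
        if (payload.isEmpty()) {
            throw new IllegalArgumentException("empty payload");
        }
    }

    // Start once at application startup (for example from a
    // ServletContextListener) and shut the scheduler down on undeploy.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::pollOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Because the job state lives in the table rather than in memory, a restart only loses the pass that was in flight; rows still marked PENDING are picked up on the next poll.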
However, I would suggest taking a look at Spring. It's not hard to learn and is easy to set up within Tomcat. You can find a lot of good information in the Spring reference documentation. It has Quartz integration, which is much easier than doing it manually.
A suitable solution which will not require you to do a lot of design and programming is to create the object you will need later in the servlet, and serialize it to a byte array. Then put that in a BLOB field in the database and be done with it.
Then your processing thread can just read the contents, deserialize them and work with the resurrected object.
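The round trip through a byte array could look like this, using standard Java serialization. ScrapeJob and its fields are hypothetical, and the database part is left out; the byte array is what would go into the BLOB column.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical job object; the fields are illustrative only.
class ScrapeJob implements Serializable {
    private static final long serialVersionUID = 1L;

    final String targetUrl;
    final String notifyAddress;

    ScrapeJob(String targetUrl, String notifyAddress) {
        this.targetUrl = targetUrl;
        this.notifyAddress = notifyAddress;
    }
}

class JobBlobSketch {
    // Servlet side: turn the job into bytes suitable for a BLOB column.
    static byte[] toBytes(Serializable job) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(job);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) at this point.
        return buffer.toByteArray();
    }

    // Worker side: rebuild the object from the bytes read back from the DB.
    static Object fromBytes(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With JDBC, the array can be written with PreparedStatement.setBytes and read back with ResultSet.getBytes. One caveat: stored blobs are tied to the class definition, so changing ScrapeJob later can make old rows unreadable unless the change is serialization-compatible.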
But, you may get better answers by describing what you need your system to actually DO :)