The application we are building has a very simple concept: it receives incoming events from a database, and for each event it opens an interactive session with the client named in the event by showing a menu. Based on the client's response, we move to the next state or take some concrete action (e.g. transferring funds).
Sessions are independent of one another. For example, suppose we get two events from the database saying that clients A and B have reached a zero account balance. In response, we establish two connections, one to A and one to B, and show each a menu that looks like the following:
Please select an option:
1. Get $5
2. Get $10
3. Ignore
For options 1 and 2, we ask for confirmation in the form of a second menu:
Are you sure?
1. yes
2. no
In this case, we'll have two sessions. Client A might choose option 1 (1. Get $5), whereas client B chooses option 3 in the first menu. For client A, we'll present the second menu (above), and if the response is 1. yes, we'll take some concrete action such as transferring funds and then close the session.
All client communication is done by a third-party system that takes JSON containing the client address and the menu text, and returns the client's response to us. It takes care of actually maintaining the session on the wire, whereas we only need to correlate responses and manage session states.
We're expected to handle 50,000 such sessions in parallel.
We previously designed the system in Java using the SEDA model. Having heard of actors, we'd like to try them out and write a quick PoC project (Java/Akka). My questions are:
Has anyone had experience building this kind of application? Are 50,000 simultaneous sessions too many for Akka to handle? (Note that we are mostly just waiting for a response. When the response comes, we jump to the next stage based on the answer, so it should be possible.)
Which architectural style/paradigm best suits this problem in Akka? Are there any frameworks out there for this kind of problem?
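The per-session flow described above is essentially a small state machine. Here is a minimal sketch in Java, assuming the menus and choices exactly as listed; the enum, the class, and the `transferFunds` callback are hypothetical names, and because the text does not say what happens on "2. no", closing the session in that case is an assumption.

```java
import java.util.function.IntConsumer;

enum MenuState { MAIN_MENU, CONFIRM, CLOSED }

/** One independent session: current state plus the amount chosen in the first menu. */
final class MenuSession {
    private MenuState state = MenuState.MAIN_MENU;
    private int chosenAmount;                 // dollars selected in the first menu
    private final IntConsumer transferFunds;  // hypothetical "concrete action"

    MenuSession(IntConsumer transferFunds) {
        this.transferFunds = transferFunds;
    }

    MenuState state() {
        return state;
    }

    /** Menu text to send for the current state (null once the session is closed). */
    String currentMenu() {
        switch (state) {
            case MAIN_MENU: return "Please select an option:\n1. Get $5\n2. Get $10\n3. Ignore";
            case CONFIRM:   return "Are you sure?\n1. yes\n2. no";
            default:        return null;
        }
    }

    /** Applies the client's numeric choice and returns the new state. */
    MenuState onResponse(int choice) {
        switch (state) {
            case MAIN_MENU:
                if (choice == 1 || choice == 2) {
                    chosenAmount = (choice == 1) ? 5 : 10;
                    state = MenuState.CONFIRM;
                } else if (choice == 3) {
                    state = MenuState.CLOSED;      // client ignored the offer
                }                                  // any other input: stay and re-show the menu
                break;
            case CONFIRM:
                if (choice == 1) {
                    transferFunds.accept(chosenAmount);
                    state = MenuState.CLOSED;
                } else if (choice == 2) {
                    state = MenuState.CLOSED;      // assumption: "no" ends the session
                }
                break;
            default:
                break;                             // closed sessions ignore late responses
        }
        return state;
    }
}
```

One such object per session (or per actor, in an actor-based design) keeps sessions independent as required; each of the 50,000 parallel sessions is then just a small object waiting for its next response.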
This is actually a reasonably easy use case with Akka's clustering. 50K sessions, each represented as an actor instance, is not a very high load. The only reason to use clustering is fault tolerance.
The idea behind the architecture would be to have a web tier that handles the RESTful requests corresponding to the sessions. These requests would be sent to the Akka cluster and routed by session ID to the appropriate session actor, or to a newly created one. When a session is done, you stop the actor associated with it.
Note that the session actors should send themselves timeout messages via the scheduler. After handling each message, the actor should schedule a message to itself via the ActorSystem scheduler for 15 minutes later (or whatever your timeout is). When a new session message is received, the scheduled task should be cancelled, the new update handled, and then a new timeout scheduled. There is a plausible race condition here: a timeout message may sit in your session actor's mailbox AFTER a session message. If your timeout message includes the time at which it was scheduled, you can compare it with the time of the last session activity; if there has been activity since then, ignore the timeout and reschedule another (just as a safety mechanism to avoid a memory leak). If the last activity was more than 15 minutes ago, stop the actor.
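The staleness check is the subtle part of that paragraph. Below is a minimal, framework-free sketch of just that decision, assuming the actor records the time of its last session message and each timeout message carries the time it was scheduled; all names are hypothetical. In an actual Akka actor, `IGNORE_AND_RESCHEDULE` would correspond to scheduling a fresh timeout (e.g. with the scheduler's `scheduleOnce`) and `STOP_SESSION` to `getContext().stop(getSelf())`; the exact scheduler signature depends on the Akka version.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical timeout message that records when it was scheduled. */
final class SessionTimeout {
    final Instant scheduledAt;

    SessionTimeout(Instant scheduledAt) {
        this.scheduledAt = scheduledAt;
    }
}

enum TimeoutDecision { IGNORE_AND_RESCHEDULE, STOP_SESSION }

/** Per-session bookkeeping that a session actor would keep as internal state. */
final class SessionActivity {
    private final Duration idleTimeout; // e.g. 15 minutes
    private Instant lastActivity;

    SessionActivity(Duration idleTimeout, Instant createdAt) {
        this.idleTimeout = idleTimeout;
        this.lastActivity = createdAt;
    }

    /** Record a session message (the pending timeout is cancelled and a new one scheduled). */
    void touch(Instant now) {
        lastActivity = now;
    }

    /** Decide what to do with a timeout message taken from the mailbox. */
    TimeoutDecision onTimeout(SessionTimeout timeout, Instant now) {
        // Activity after the timeout was scheduled means it raced with a session message.
        boolean staleTimeout = lastActivity.isAfter(timeout.scheduledAt);
        boolean idleTooLong = Duration.between(lastActivity, now).compareTo(idleTimeout) >= 0;
        return (!staleTimeout && idleTooLong)
                ? TimeoutDecision.STOP_SESSION
                : TimeoutDecision.IGNORE_AND_RESCHEDULE;
    }
}
```

Keeping the decision in a plain class like this also makes the race easy to unit-test with fixed instants, independently of the actor machinery.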
To see how the distribution of work to the session actors would be implemented, please see the "Distributed Workers with Akka and Java" template in Typesafe's Activator. You will have a fully running clustered Akka application that you can tailor to do the session management as I've described above. You can then export the project and work on it in Eclipse/IntelliJ/Sublime/TextMate/etc. To download Activator, see here.
Related
Our project consists of multiple microservices. Together they form a bounded context whose entry point is not strictly defined, meaning each microservice can receive requests and can also call other services.
The situation we need to handle in this bounded microservice context is the following:
a client (another application) makes a request to perform some logic and change data (PATCH);
the request times out;
while the request is still being processed, the client fires the same request again to repeat the operation;
the operation completes successfully;
the second request is processed the same way, completes within its time limit, and the client gets a response.
As a result, the same operation was processed twice because of the first timeout.
We need to make sure the same request isn't processed again and that the application responds with the original response and status code.
A repeated request is identified by the same UUID.
Now, I understand that it's the client who should issue requests more carefully, or that we should have a single request entry point in our microservices' bounded context, but in enterprise projects the team doesn't own the whole system, so we are somewhat constrained in the solutions we can propose. With this in mind, and while trying not to reinvent the wheel, this comes to mind:
The microservices should use some kind of session sharing (spring-session?) with the ability to look up a request by its ID before it gets processed. In the case described, when the first request is still being processed and the second arrives, the service would wait for the first to complete and respond to the second with the data of the first, which timed out for the client.
What I'm struggling with is how to handle replying to the second request asynchronously, and how to listen for the session state of the first request.
If spring-session were used (for example with Hazelcast), I'm missing some kind of concrete session state handler that fires when a request ends. Is there something like this to listen for?
No code written yet. It's an architectural thought experiment that I want to discuss.
If anything is unclear, please read it a second time; I'm then happy to expand.
EDIT: first idea:
The process would be as follows (numbers refer to the image):
(1) the first request is fired;
(3) processing starts; (2) meanwhile, the request times out;
(4) the client repeats the same request; the program knows it has received this request before because it knows the request ID;
the program checks the cache, finds that the state of that request ID is 'pending', and so it WAITS (asynchronously);
the computed result of the first request is saved into the cache (orange square);
(5) the program responds to the repeated request with the data that was meant for the first one.
The idea is that checking for a result and responding to the repeated request would happen in the filter chain, so the second request never actually hits the controller. Instead, it waits asynchronously for the operation triggered by the first request to finish (I see Hazelcast has events for when entries are added/updated/evicted from the cache; I don't know yet whether that works), and when the operation completes, the filter simply responds (somehow writing to the HttpServletResponse). The result would be saved into the cache in a post-handling filter.
Thanks for any insights.
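Within a single JVM, the "wait for the pending first request" idea can be sketched with a map from request UUID to a `CompletableFuture`: the first request registers a future and does the work, and a repeat simply attaches to that same future. This is a simplified stand-in under stated assumptions: across several services the map would have to live in a shared store (such as Hazelcast), and completion would be signalled by that store's entry events rather than by a local future. `StoredResponse` and `IdempotentExecutor` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical cached result: status code plus body, replayed to repeated requests. */
final class StoredResponse {
    final int status;
    final String body;

    StoredResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

/** Runs each request ID at most once; repeats wait for (or reuse) the first result. */
final class IdempotentExecutor {
    private final Map<String, CompletableFuture<StoredResponse>> inFlightOrDone =
            new ConcurrentHashMap<>();

    CompletableFuture<StoredResponse> execute(String requestId, Supplier<StoredResponse> work) {
        CompletableFuture<StoredResponse> mine = new CompletableFuture<>();
        CompletableFuture<StoredResponse> existing = inFlightOrDone.putIfAbsent(requestId, mine);
        if (existing != null) {
            // Repeated request: attach to the first request's pending or completed result.
            return existing;
        }
        // First request with this ID: do the work and publish the outcome.
        try {
            mine.complete(work.get());
        } catch (RuntimeException e) {
            inFlightOrDone.remove(requestId, mine); // let a later retry run the work again
            mine.completeExceptionally(e);
        }
        return mine;
    }
}
```

Completed entries stay in the map forever in this sketch; a real version would evict them after some time. A filter holding the returned future could write the stored status and body to the `HttpServletResponse` once it completes, without blocking a thread while it waits.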
I'd consider this more of a caching paradigm. Put your requests/responses into an external cache provider (Redis or similar), indexed by UUID. A TTL lets responses for requests that never come back be cleaned up automatically, and the high-speed O(1) lookups should let this scale nicely. It will also give you an asynchronous model out of the box (not a stated goal, but always a nice option).
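To illustrate the TTL idea without a Redis dependency, here is a minimal in-memory sketch that stores a response per request UUID and treats entries older than the TTL as absent. With Redis itself, the equivalent would typically be a `SET` with an expiry (`EX`) and the `NX` flag so that only the first request claims the key; the Java class below is a hypothetical stand-in, not a Redis client API.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** In-memory stand-in for an external cache: one response per request UUID, with a TTL. */
final class TtlResponseCache {
    private static final class Entry {
        final String response;
        final long expiresAtMillis;

        Entry(String response, long expiresAtMillis) {
            this.response = response;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected for testability, e.g. System::currentTimeMillis

    TtlResponseCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Stores the response unless a live entry exists (similar in spirit to SET ... NX EX). */
    boolean putIfAbsent(String requestId, String response) {
        long now = clock.getAsLong();
        boolean[] stored = {false};
        entries.compute(requestId, (id, old) -> {
            if (old != null && old.expiresAtMillis > now) {
                return old; // the first response wins while it is still live
            }
            stored[0] = true;
            return new Entry(response, now + ttlMillis);
        });
        return stored[0];
    }

    /** Returns the cached response, treating expired entries as absent. */
    Optional<String> get(String requestId) {
        Entry e = entries.get(requestId);
        if (e == null) {
            return Optional.empty();
        }
        if (e.expiresAtMillis <= clock.getAsLong()) {
            entries.remove(requestId, e); // lazy cleanup of the expired entry
            return Optional.empty();
        }
        return Optional.of(e.response);
    }
}
```

Unlike Redis, which expires keys on its own, this sketch only removes expired entries when they are read; it is meant to show the lookup-by-UUID and TTL semantics, not to replace an external cache.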
I have an integration with a video stream provider. The flow is as follows: the user requests a stream URL, we request it from the stream provider on the user's behalf, and we return it to the user. After that, the stream ID (session) must be prolonged every 10 seconds. To minimize interaction with the client, and because of the slow network, we want to do this session prolongation on the user's behalf. So, say the user triggers one request every 2-5 minutes, while the server triggers session prolongation requests every 10 seconds.
The question is about a possible design for such a service. I haven't found a better solution than simply iterating over all available session keys periodically and calling the prolongation service.
But this approach has disadvantages: when the number of users gets really big, it could slow down processing. It is also hard to scale.
Do you have ideas on how to overcome this, or can you propose a better solution?
I would write the keep-alive as a single self-contained piece of code that calls the keep-alive endpoint every x seconds for y amount of time before ending itself, where x, y and the keep-alive endpoint are startup parameters.
Each time the user triggers a request, kick one of these off in the background. How you package it depends on your deployment environment and how you intend to manage scaling out (background thread, new process, serverless function, etc.).
You may need to maintain some state in a cache for management purposes (don't start a new one if one is already running, handle hung processes, etc.).
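A minimal sketch of that self-contained keep-alive, assuming a single JVM and a shared `ScheduledExecutorService` as the "background thread" packaging option. The `keepAlive` callback stands in for the call to the stream provider's prolongation endpoint (hypothetical here), x and y become `period` and `lifetime`, and the registry map covers the "don't start a new one if one is already running" note. `KeepAliveManager` is a hypothetical name.

```java
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Runs one self-ending keep-alive loop per stream ID on a shared scheduler. */
final class KeepAliveManager {
    private final ScheduledExecutorService scheduler;
    private final ConcurrentHashMap<String, ScheduledFuture<?>> running = new ConcurrentHashMap<>();

    KeepAliveManager(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    /**
     * Calls keepAlive every `period` (x) until `lifetime` (y) has elapsed, then stops itself.
     * Returns false, without starting anything, if this stream already has a running loop.
     */
    boolean start(String streamId, Runnable keepAlive, Duration period, Duration lifetime) {
        // A failing call must not escape the periodic task, or the scheduler
        // would silently suppress all later runs.
        Runnable safeKeepAlive = () -> {
            try {
                keepAlive.run();
            } catch (RuntimeException e) {
                // log and keep going; the next tick tries again
            }
        };
        boolean[] created = {false};
        ScheduledFuture<?> task = running.computeIfAbsent(streamId, id -> {
            created[0] = true;
            long p = period.toMillis();
            return scheduler.scheduleAtFixedRate(safeKeepAlive, p, p, TimeUnit.MILLISECONDS);
        });
        if (created[0]) {
            // End the loop by itself once its lifetime is over.
            scheduler.schedule(() -> {
                task.cancel(false);
                running.remove(streamId, task);
            }, lifetime.toMillis(), TimeUnit.MILLISECONDS);
        }
        return created[0];
    }

    boolean isRunning(String streamId) {
        return running.containsKey(streamId);
    }
}
```

Each user request would call `start`, so prolongation work tracks active streams instead of requiring a periodic sweep over all session keys. A fuller version would extend the lifetime of an already-running loop rather than ignoring the repeated call.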
I have a service, living on Server 1. Let's call it PDFService. PDFService takes documents and stitches them together in a single PDF.
However, PDFService only knows about document ids. It relies on Server 2 to get the actual content of the documents.
At the start of PDFService's process, it will collect document ids in batches. When it has a batch, it will send an async request for each id in the batch to a queue on Server 2 (getting back a 204). It will then continue collecting more batches and repeat.
Once all the batches have been collected and sent off, PDFService will start the stitching process.
In the meantime, none, some, or all of the documents may have been processed by Server 2 and returned to Server 1. Server 2 may return documents in a different order than it received them in. (Each document will take a different amount of time to compile and return.)
Server 1 must stitch them in the same order they were sent off. So, it must wait for document 1, stitch it, wait for document 2, stitch it, etc.
As of now, I have a DocumentManager class that keeps all the document IDs in a Map with null values. When a completed document comes back from Server 2, the Map is updated with an actual value (an object holding the document's contents). This is obviously wrong, as PDFService would then have to poll in a loop (while the value is null, sleep), which is bad.
My question is: How do I let PDFService "wait" for each document, if it needs to? Adding CompletableFuture objects to my Map seems promising, but I can't figure out how to use it or if that is even the correct approach.
(This is one of my first questions, please provide constructive feedback!)
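Putting `CompletableFuture` objects in the map does fit this problem. A minimal sketch, assuming Server 2's replies arrive on some other thread carrying the document ID and contents; `DocumentManager` is the asker's class name, but the methods below and the `Document` type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical document payload returned by Server 2. */
final class Document {
    final String id;
    final String content;

    Document(String id, String content) {
        this.id = id;
        this.content = content;
    }
}

final class DocumentManager {
    // One future per document ID, created before the request is sent to Server 2.
    private final ConcurrentHashMap<String, CompletableFuture<Document>> pending =
            new ConcurrentHashMap<>();
    // Send order; only touched by the PDFService thread.
    private final List<String> sendOrder = new ArrayList<>();

    /** Call just before sending the async request for this ID. */
    void register(String id) {
        pending.computeIfAbsent(id, k -> new CompletableFuture<>());
        sendOrder.add(id);
    }

    /** Called from whichever thread receives Server 2's reply. */
    void onDocumentReceived(Document doc) {
        pending.computeIfAbsent(doc.id, k -> new CompletableFuture<>()).complete(doc);
    }

    /** Stitches in send order, blocking only while the next document is still missing. */
    void stitchInOrder(Consumer<Document> stitcher) {
        for (String id : sendOrder) {
            Document doc = pending.get(id).join(); // waits without polling or sleeping
            stitcher.accept(doc);
        }
    }
}
```

`join()` blocks the stitching thread only while the next document in order is still missing. If Server 2 might never reply, calling `get(timeout, TimeUnit)` on the future instead of `join()` bounds the wait.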
H-m-m...
I'd recommend looking at some enterprise integration frameworks such as Spring Integration, Apache Camel, MuleSoft and others. Such a framework can take care of all the waiting, asynchrony, parallelism, aggregation, etc., and it will be much easier for you.
In general:
it will send an async request for each id in the batch to a queue on Server 2
You already mentioned a queue, so using JMS queue(s) is one possible solution:
Server1 sends the document ID for Server2 to a JMS queue
Server2 listens on the queue and responds with the actual document
(there are a number of ways a server can reply to a JMS message)
Server1 listens for the responses, then stitches them all together once all are received
But with an EIP framework, JMS is not the only possibility; for example, the calls to Server2 for a batch could be synchronous but parallel...
BTW: building such a thing from scratch without any frameworks (EIP and/or JMS) is very painful and makes no sense.
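Stitching "in send order, as documents arrive" is the Resequencer pattern from Enterprise Integration Patterns, which Apache Camel and Spring Integration both provide as a component. To show what such a component does under the hood, here is a minimal framework-free sketch (the `Resequencer` class and its methods are hypothetical, not either framework's API): each reply is tagged with the sequence number assigned when its request was sent, buffered, and released downstream only once every earlier reply has been released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

/** Releases items downstream in sequence order, whatever order they arrive in. */
final class Resequencer<T> {
    private final Map<Long, T> buffer = new HashMap<>();
    private final Consumer<T> downstream; // e.g. the PDF stitching step
    private long nextExpected = 0;

    Resequencer(Consumer<T> downstream) {
        this.downstream = downstream;
    }

    /** Accepts a reply carrying the sequence number assigned at send time (0, 1, 2, ...). */
    synchronized void onReply(long sequence, T item) {
        Objects.requireNonNull(item, "item");
        if (sequence < nextExpected || buffer.containsKey(sequence)) {
            return; // duplicate reply: already released or already buffered
        }
        buffer.put(sequence, item);
        // Release the contiguous run that is now complete.
        T next;
        while ((next = buffer.remove(nextExpected)) != null) {
            downstream.accept(next); // runs under the lock, so stitching is serialized
            nextExpected++;
        }
    }

    /** Number of items released so far. */
    synchronized long released() {
        return nextExpected;
    }
}
```

Compared with waiting on each document in turn, this variant lets stitching start as soon as the leading documents are in, and it needs no dedicated waiting thread.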
I have a piece of middleware that sits between two JMS queues. It reads from one, processes some data into the database, and writes to the other.
Here is a small diagram to depict the design:
With that in mind, I have some interesting logic that I would like to integrate into the service.
Scenario 1: Say the middleware service receives a message from Queue 1, and hits the database to store portions of that message. If all goes well, it constructs a new message with some data, and writes it to Queue 2.
Scenario 2: Say that the database complains about something when the service attempts to perform some logic after getting a message from Queue 1. In this case, instead of writing a message to Queue 2, I would retry the database operation with incremental timeouts, i.e. try again in 5 sec, then 30 sec, then 1 minute if it's still down. The catch, of course, is being able to read other messages independently of this retry, i.e. retry processing this one request while listening for other requests.
With that in mind, what is the correct and most modern way to build a future-proof solution?
After reading some posts on the net, it seems that I have several options.
One, I could spin off a new thread once a new message is received, so that I can both perform the retry functionality and listen for new requests.
Two, I could send the message back to the queue with a delay, i.e. if the process failed to execute in the DB, write the message back to the JMS queue with some amount of delay added to it.
I am more fond of the first solution; however, I wanted to get the community's opinion on whether there is a newer/better way to implement this functionality in Java 7. Is there something built into JMS to support this sort of "send the message back for reprocessing at a specific time"?
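A minimal sketch of option one, assuming plain Java with no JMS dependency and that the database write can be wrapped in a `Runnable` that throws on failure. Instead of spawning a raw thread per message, the listener hands the failed message to a shared `ScheduledExecutorService`, which retries after each configured delay (5 s, 30 s, 60 s in the question) and gives up after the last one, so the listener thread is never blocked. `BackoffRetrier` is a hypothetical name, and the sketch uses Java 8 types (`CompletableFuture`, `Duration`) even though the question mentions Java 7.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Retries a failed task after each configured delay without blocking the caller. */
final class BackoffRetrier {
    private final ScheduledExecutorService scheduler;
    private final List<Duration> delays; // e.g. 5 s, 30 s, 60 s

    BackoffRetrier(ScheduledExecutorService scheduler, List<Duration> delays) {
        if (delays.isEmpty()) {
            throw new IllegalArgumentException("at least one retry delay is required");
        }
        this.scheduler = scheduler;
        this.delays = new ArrayList<>(delays);
    }

    /**
     * Call after the first attempt has failed. The returned future completes normally on the
     * first successful retry, or exceptionally with the last failure once all delays are used.
     */
    CompletableFuture<Void> retry(Runnable task) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        scheduleAttempt(task, 0, result);
        return result;
    }

    private void scheduleAttempt(Runnable task, int index, CompletableFuture<Void> result) {
        scheduler.schedule(() -> {
            try {
                task.run();
                result.complete(null);
            } catch (RuntimeException e) {
                if (index + 1 < delays.size()) {
                    scheduleAttempt(task, index + 1, result); // wait for the next, longer delay
                } else {
                    result.completeExceptionally(e); // give up, e.g. route to an error queue
                }
            }
        }, delays.get(index).toMillis(), TimeUnit.MILLISECONDS);
    }
}
```

One design caveat: if the JMS message is acknowledged when the listener returns, retries that live only in this scheduler are lost if the process stops, which is an argument for option two or for broker-side redelivery.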
The JMS 2.0 specification describes the concept of delayed delivery of messages. See the "What's new" section of https://java.net/projects/jms-spec/pages/JMS20FinalRelease. Many JMS providers have implemented the delayed delivery feature.
But I wonder how delayed delivery will help in your scenario. Since the database writes are failing, processing subsequent messages and attempting to write them to the database might end up in the same situation. It might be better to sort out the issues with the database updates first and then pick up messages from the queue.
Let me try explaining the situation:
We are going to incorporate a messaging system, which could use either a Queue or a Topic (JMS terms).
1) Producer/Publisher: there is a service A. A produces messages and writes them to a Queue/Topic.
2) Consumer/Subscriber: there is a service B. B asynchronously reads messages from the Queue/Topic, then calls a web service and passes the message to it. The web service takes a significant amount of time to process the message. (This action need not be processed in real time.)
The message broker is TIBCO.
My intention is: not to miss processing any message from A, and to re-process it at a later point in time (perhaps as a batch) if processing fails the first time.
Question:
I was thinking of writing each message to a DB before making the web service call. If the call succeeds, I would mark the message as processed; otherwise, as failed. Later, a cron job would process all the requests that initially failed.
Is writing to a DB a typical way of doing this?
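For reference, the DB-backed flow described in the question can be sketched as follows. To keep it self-contained, in-memory maps stand in for the database table and the `WebServiceClient` interface stands in for the real web service call; both, like `MessageLog` and `DeliveryStatus`, are hypothetical. Each message is recorded as `PENDING` before the call and marked `PROCESSED` or `FAILED` afterwards, and `retryFailed()` is what the cron job would run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

enum DeliveryStatus { PENDING, PROCESSED, FAILED }

/** Hypothetical stand-in for the slow web service. */
interface WebServiceClient {
    void send(String payload) throws Exception;
}

final class MessageLog {
    // Stand-in for a DB table: message ID -> payload and status.
    private final Map<String, String> payloads = new ConcurrentHashMap<>();
    private final Map<String, DeliveryStatus> statuses = new ConcurrentHashMap<>();
    private final WebServiceClient client;

    MessageLog(WebServiceClient client) {
        this.client = client;
    }

    /** Consumer path: persist first, then call the web service. */
    void handle(String messageId, String payload) {
        payloads.put(messageId, payload);
        statuses.put(messageId, DeliveryStatus.PENDING);
        deliver(messageId);
    }

    /** Cron-job path: re-attempt every message whose previous call failed. */
    int retryFailed() {
        int retried = 0;
        for (Map.Entry<String, DeliveryStatus> e : statuses.entrySet()) {
            if (e.getValue() == DeliveryStatus.FAILED) {
                deliver(e.getKey()); // updating the map while iterating is safe for ConcurrentHashMap
                retried++;
            }
        }
        return retried;
    }

    DeliveryStatus statusOf(String messageId) {
        return statuses.get(messageId);
    }

    private void deliver(String messageId) {
        try {
            client.send(payloads.get(messageId));
            statuses.put(messageId, DeliveryStatus.PROCESSED);
        } catch (Exception ex) {
            statuses.put(messageId, DeliveryStatus.FAILED);
        }
    }
}
```

A real version would also have the cron job pick up rows stuck in `PENDING`, since a crash between the insert and the status update would otherwise leave them untouched forever.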
Since you have a failure callback, you can just requeue your Message and have your Consumer/Subscriber pick it up and try again. If it failed because of a problem in the web service and you want to wait X time before trying again, you can either schedule the web service call for that specific Message at a later time (look into ScheduledExecutorService) or do as you described and use a cron job with some database entries.
If you only want it to try again once per message, keep an internal counter for each Message, either with the Message itself or in a Map<Message, Integer>.
Crudely put, that is the technique, although there may be out-of-the-box solutions available that you can use. Typical ESB solutions support reliable messaging. Have a look at Mule ESB or Apache ActiveMQ as well.
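A sketch of the per-message retry counter mentioned above, with one labeled adjustment: it is keyed by a stable application-level message ID string rather than by the `Message` object, since the `Message` instance seen on a retry is generally not the same object that was counted the first time, and a re-sent message may get a new JMSMessageID. `RetryBudget` and `maxRetries` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Counts failures per message ID and says whether another retry is allowed. */
final class RetryBudget {
    private final ConcurrentHashMap<String, Integer> failures = new ConcurrentHashMap<>();
    private final int maxRetries; // 1 = "try again once per message"

    RetryBudget(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Records a failure; returns true if the message may be retried again. */
    boolean recordFailureAndCheck(String messageId) {
        int count = failures.merge(messageId, 1, Integer::sum); // atomic increment
        return count <= maxRetries;
    }

    /** Forget the counter once the message is finally processed or given up on. */
    void clear(String messageId) {
        failures.remove(messageId);
    }
}
```

Calling `clear` matters, otherwise the map grows with every failed message. Because the counter is in memory, it is also lost on restart; broker-side settings such as a maximum redelivery count keep that state on the broker instead.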
It might be interesting to take advantage of the EMS platform you already have (example 1) instead of building a custom solution (example 2).
But it all depends on the implementation language:
Example 1 - EMS is the "keeper": if I were to solve such a problem with TIBCO BusinessWorks, I would use BW's "JMS transaction" feature. By enclosing the EMS read and the WS call within the same "group", you ask for both to be applied, or neither. If the call failed for some reason, the message would be returned to EMS.
Two problems with this solution: you might not have BW, and the first failed operation would block all the rest of the batch process (which may be the desired behavior).
FYI, I understand it is possible to use such a feature in "pure Java", but I have never tried it: http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html
Example 2 - a DB is the "keeper": if you go with your "DB" method, your queue/topic consumer continuously inserts data into a DB, and every record represents a task to be executed. This feels an awful lot like the simple "mapping engine" problem every integration middleware aims to make easier. You could solve this with anything from custom Java code with multiple threads (DB inserter, WS job handlers, etc.) to an EAI middleware (like BW) or even a BPM engine (TIBCO has many solutions for that).
Of course, there are also other vendors... EMS is an implementation of the JMS standard, as you know.
I would recommend using the built-in EMS (& JMS) features, as "guaranteed delivery" is what they're built for ;) - no DB needed at all...
You need to be aware of the first decisions you'll have to make:
do you need in-order delivery? (then only 1 JMS Session and Client Acknowledge mode should be used)
how often, and at what intervals, do you want to retry? (so that a message that couldn't be processed by that web service doesn't end up in an infinite loop)
This is independent of the kind of client you use (TIBCO BW or e.g. Java onMessage() in an MDB).
For in-order delivery: make sure only 1 JMS Session processes the messages and that it uses Client Acknowledge mode. After you process a message successfully, you need to acknowledge it, either by calling the JMS API's acknowledge() method or, in TIBCO BW, by executing the "commit" activity.
In case of an error, you don't acknowledge the message, so it will be put back in the queue for redelivery (you can see how many times it was redelivered in the JMS header).
EMS's Explicit Client Acknowledge mode also lets you do the same if order is not important and you need a few client threads to process the messages.
To control how often a message gets processed, use:
the max redelivery property of the EMS queue (e.g. you could move the message to the dead letter queue after x redeliveries so as not to hold up other messages)
a redelivery delay, to put a "pause" between redeliveries. This is useful if the web service needs to recover after a crash and shouldn't be stormed by the same message again and again at a high rate through redelivery.
Hope that helps
Cheers
Seb