At our company we have a server that is distributed across a few instances. The server handles user requests. Requests from different users can be processed in parallel, but requests from the same user must be executed strictly sequentially. However, they can arrive at different instances due to load balancing. Currently we use Redis-based distributed locks, but this is error-prone and requires more work around concurrency than around business logic.
What I want is something like this (more like a concept):
A distinct queue for each user
The queue is named after the user id
Each request is identified by a request id
Imagine two requests from the same user arriving at two different instances concurrently:
1. Each instance puts its request id into this user's queue.
2. Additionally, both instances store their request ids locally.
3. Then some broker takes the request id from the head of "some_user_queue" and moves it into "some_user_queue_processing".
4. Both instances listen on "some_user_queue_processing". They peek into it and check whether it holds the request id they stored locally. If yes, they do the processing; if not, they ignore it and wait.
5. When the work is done, the server deletes this id from "some_user_queue_processing".
6. Then step 3 again.
And all of this happens concurrently for a lot of different users (thousands of them) and their queues. Concretely, it could look like the sketch below.
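A minimal sketch of that flow with the Jedis client (the queue names follow the question; localPending, requestId, and process() are assumptions; RPOPLPUSH provides the atomic "move to processing" step):

import java.util.Set;
import redis.clients.jedis.Jedis;

Jedis jedis = new Jedis("localhost");
String queue = "some_user_queue";
String processing = queue + "_processing";

// steps 1-2: each instance enqueues its request id and remembers it locally
jedis.lpush(queue, requestId);              // LPUSH + RPOPLPUSH below = FIFO
localPending.add(requestId);                // assumed per-instance Set<String>

// step 3 (the "broker"): atomically move the oldest id into the processing
// list; RPOPLPUSH is atomic, so any instance may safely play the broker
String inFlight = jedis.rpoplpush(queue, processing);

// step 4: only the instance that owns the in-flight id processes it
if (inFlight != null && localPending.contains(inFlight)) {
    process(inFlight);                      // assumed business logic
    jedis.lrem(processing, 1, inFlight);    // step 5: remove the id when done
    localPending.remove(inFlight);
}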
Now, I know this sounds a lot like actors, but:
We need a solution that requires as few changes as possible, to make a fast transition away from locks. Akka would force us to rewrite almost everything from scratch.
We need a production-ready solution. Quasar sounds good, but it is not production-ready yet (more precisely, its Galaxy cluster is not).
The higher-ups at my work are very conservative; they simply don't want another dependency that we'd need to support. But we already use Redis (for distributed locks), so I thought maybe it could help with this too.
Thanks
The best solution that matches the description of your problem is Redis Cluster.
Basically, the cluster solves your concurrency problem in the following way:
Two (or more) requests from the same user will always go to the same instance, assuming that you use the user id as the key and the request as the value. The value must actually be a list of requests: when you receive one, you append it to that list. In other words, that list is your queue of requests (a single one for every user).
That mapping is made possible by the design of the cluster implementation: it is based on a range of hash slots spread over all the instances.
When a set command is executed, the cluster hashes the key, which yields the hash slot we are going to write to, and that slot is located on a specific instance. The cluster finds the instance that owns the right slot range and then performs the write.
Likewise, when a get is performed, the cluster follows the same procedure: it finds the instance that holds the key's slot, and then it fetches the value.
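As a hedged sketch of this idea with the Jedis cluster client (the host/port and variable names are assumptions), the user id keys the list, so all of a user's requests land on the node that owns that hash slot:

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

JedisCluster cluster = new JedisCluster(new HostAndPort("127.0.0.1", 7000));

// any application instance appends an incoming request to the user's list;
// the cluster routes the command to the node owning the user id's hash slot
cluster.rpush(userId, requestJson);

// a consumer pops requests for that user strictly in arrival order
String next = cluster.lpop(userId);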
The transition from locks is very easy to perform, because you only need to have the instances ready (with the cluster-enabled directive set to "yes") and then run the create command of the redis-trib.rb script.
I worked with the cluster in a production environment last summer, and it behaved very well.
Related
We have web service APIs to support clients running on tens of millions of devices. Normally a client calls the server once a day, which is about 116 clients seen per second. For each client (each with a unique ID), several API calls may be made concurrently. However, the server can only process those API calls one by one for the same client, because they all update the same document of that client in the backend MongoDB database: for example, updating the last-seen time and other embedded documents in that client's document.
One solution I have is to put a synchronized block on an "interned" object representing this client's unique ID. That allows only one request from the same client to obtain the lock and be processed at a time, while requests from other clients can still be processed in parallel. But this solution requires turning on the load balancer's "stickiness", meaning the load balancer will route all requests from the same IP address to a specific server within a preset time interval (e.g. 15 minutes). I am not sure what impact this has on the robustness of the whole system design. One thing I can think of is that some clients may make more requests than others and make the load unbalanced (create hotspots).
Solution #1:
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

// Key must implement equals() and hashCode(), so that equal client ids
// intern to the same canonical instance to synchronize on.
private final Interner<Key> myIdInterner = Interners.newWeakInterner();

public ResponseType1 processApi1(String clientUniqueId, RequestType1 request) {
    synchronized (myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}

public ResponseType2 processApi2(String clientUniqueId, RequestType2 request) {
    synchronized (myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}
You can see my other question for this solution in detail: Should I use Java String Pool for synchronization based on unique customer id?
The second solution I am considering is to somehow lock the document (in MongoDB) of that client (I have not found a good example of how to do that yet). Then I wouldn't need to touch the load balancer settings. But I have concerns about this approach, as I think the performance (round trips to the MongoDB server, and busy waiting?) will be much worse compared to solution #1.
Solution #2:
public ResponseType1 processApi1(String clientUniqueId, RequestType1 request) {
    Key key = new Key(clientUniqueId);
    obtainDocumentLock(key);   // acquire before try, so finally never releases a lock we don't hold
    try {
        // code to process request
    } finally {
        releaseDocumentLock(key);
    }
}

public ResponseType2 processApi2(String clientUniqueId, RequestType2 request) {
    Key key = new Key(clientUniqueId);
    obtainDocumentLock(key);
    try {
        // code to process request
    } finally {
        releaseDocumentLock(key);
    }
}
I believe this is a very common issue in scalable, highly concurrent systems. How do you solve it? Are there any other options? What I want to achieve is to process only one request at a time among the requests from the same client. Please note that just controlling read/write access to the database does not work; the solution needs to control exclusive processing of the whole request.
For example, there are two requests: request #1 and request #2. Request #1 reads the document of the client, updates one field of sub-document #5, and saves the whole document back. Request #2 reads the same document, updates one field of sub-document #8, and saves the whole document back. At this moment we will get an OptimisticLockingFailureException, because we use the @Version annotation from spring-data-mongodb to detect version conflicts. So it is imperative to process only one request from the same client at any time.
P.S. Any suggestions on selecting solution #1 (locking within a single process/instance, with load balancer stickiness turned on) or solution #2 (distributed locks) for a scalable, highly concurrent system design? The goal is to support tens of millions of clients, with hundreds of clients accessing the system concurrently every second.
In your solution, you are splitting the lock based on customer id, so two customers can be served at the same time. The only problem is the sticky session. One solution is to use a distributed lock, so you can dispatch any request to any server, and the server that obtains the lock processes it. The one consideration is that it involves remote calls. We are using Hazelcast/Ignite and it is working very well for an average number of nodes.
Hazelcast
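For reference, a minimal sketch of such a distributed lock with Hazelcast's CP subsystem (4.x API; the key naming is an assumption, and requestHandler stands in for the business logic):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;

HazelcastInstance hz = Hazelcast.newHazelcastInstance();

void process(String clientUniqueId, Runnable requestHandler) {
    // one distributed lock per client id: any node may accept the request,
    // but only one node at a time processes a given client
    FencedLock lock = hz.getCPSubsystem().getLock("client:" + clientUniqueId);
    lock.lock();
    try {
        requestHandler.run();
    } finally {
        lock.unlock();
    }
}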
Why not just create a processing queue in MongoDB: you submit client request documents to it, and another server process consumes them and produces a resulting document that the client waits for. Synchronize the data on clientId, and avoid that activity in the API submission step. The second part of the client submission activity (when finished) just polls MongoDB for consumed records, looking for their API / clientId and some job tag. That way, you can scale out the API submission and, separately, the API consumption activities on separate servers, etc.
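A hedged sketch of the consumer side with the MongoDB Java driver (the collection name, the status/submittedAt fields, and handle() are assumptions); findOneAndUpdate claims a document atomically, so no two workers pick up the same request:

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.Sorts;
import com.mongodb.client.model.Updates;
import org.bson.Document;

MongoCollection<Document> queue = MongoClients.create("mongodb://localhost")
        .getDatabase("app").getCollection("requestQueue");

// atomically claim the oldest pending request document
Document claimed = queue.findOneAndUpdate(
        Filters.eq("status", "PENDING"),
        Updates.set("status", "PROCESSING"),
        new FindOneAndUpdateOptions().sort(Sorts.ascending("submittedAt")));

if (claimed != null) {
    Document result = handle(claimed);   // assumed business logic
    // publish the result; the client polls for status = DONE
    queue.updateOne(Filters.eq("_id", claimed.get("_id")),
            Updates.combine(Updates.set("status", "DONE"),
                            Updates.set("result", result)));
}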
One obvious approach is simply to implement the full optimistic locking algorithm on your end.
That is, you sometimes get an OptimisticLockingFailureException when there are concurrent modifications, but that's fine: just re-read the document and start the modification that failed over again. You'll get the same effect as if you had used locking. Essentially, you are leveraging the concurrency control already built into MongoDB. This also has the advantage of letting several transactions from the same client go through if they don't conflict (e.g., one is a read, or they write to different documents), potentially increasing the concurrency of your system. On the other hand, you have to implement the retry logic.
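A hypothetical sketch of that retry loop with spring-data-mongodb (ClientDoc, repo, and applyUpdate are illustrative names; the entity is assumed to carry an @Version field):

import org.springframework.dao.OptimisticLockingFailureException;

static final int MAX_RETRIES = 5;   // assumed bound

void updateWithRetry(String clientId, Request request) {
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        ClientDoc doc = repo.findById(clientId).orElseThrow();  // re-read each attempt
        applyUpdate(doc, request);      // modify the relevant sub-document
        try {
            repo.save(doc);             // throws on a version conflict
            return;                     // success
        } catch (OptimisticLockingFailureException e) {
            // another request won the race; the loop re-reads and retries
        }
    }
    throw new IllegalStateException("update failed after " + MAX_RETRIES + " retries");
}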
If you do want to lock on a per-client basis (or per-document, or whatever else) and your server is a single process (which is implied by your suggested approach), you just need a lock manager that works on arbitrary String keys; that has several reasonable solutions, including the Interner one you mentioned.
I have a cluster of web applications (Java + Tomcat), and the apps generate events. The volume is not that high, somewhere under 10 million events per day (unevenly distributed, with peaks and valleys).
We need to display calculated aggregates of events on the user interface. Currently, this is done by running DB queries against a large table with many indexes on each page display.
Is there a good architectural approach to keeping a flow of events and also calculating (on the fly) and keeping aggregate numbers, like Average, Mean, Min, Max, etc?
Real time is not important, but near-real time is a must. For instance, a latency of under 1 minute is acceptable.
You can go with a push model or a pull model. (Or proactive/reactive, if you like those terms better.) In both cases you've got a centralized records keeper that must aggregate the data you want. In the push model, your decentralized services/servers/applications periodically push updates to your records keeper. In the pull model, your records keeper periodically queries your decentralized services and requests updates.
In a push scenario, each independent service/server/application keeps a log of its own event counter. Once the event counter ticks over a certain threshold, it notifies the records keeper of the new status. For example, they could push an update every 100 or 1000 or delta events. Thus (assuming there are no undetectable failures) the records keeper always knows how many events have occurred in the system, plus or minus your delta. This gives great performance, since whenever someone wants to access the event records, all of the data is already aggregated. One downside is that there's a low but persistent overhead imposed on the system. Another is that you never know whether a service has failed or whether it just hasn't had many events recently (plus/minus delta).
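A minimal sketch of that push-model counter (RecordsKeeperClient and its pushUpdate method are hypothetical, as is the delta of 1000):

import java.util.concurrent.atomic.AtomicLong;

class EventCounter {
    private static final long DELTA = 1000;     // assumed push threshold
    private final AtomicLong count = new AtomicLong();
    private final RecordsKeeperClient keeper;   // hypothetical client

    EventCounter(RecordsKeeperClient keeper) {
        this.keeper = keeper;
    }

    void onEvent() {
        long c = count.incrementAndGet();
        if (c % DELTA == 0) {
            // the records keeper is now at most DELTA events behind
            keeper.pushUpdate(c);
        }
    }
}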
In the pull scenario your decentralized services still keep logs, but they don't do anything until the records keeper requests an update. When you want to know the state of the system the records keeper must query everyone in the system, get their responses, and assemble the results. This is probably the easiest thing to implement, and one positive aspect is that there is zero system overhead until you actually request an update. The downside is that update requests can cause a big drag on the system when they occur (since everyone drops everything and you generate traffic throughout the entire system). For this same reason it'll take a while to generate updates when the request comes in.
Now, both of these approaches are independent of the implementation methodology. Either one might be implemented with a completely flat topology, where every service communicates directly with your records keeper. Alternatively, you might form a hierarchy of services, so that each parent in the hierarchy is responsible for aggregating the data of its children. What you want to do in this respect really depends on exactly how fast and efficient the system needs to be.
I have a JVM process that wakes a thread every X minutes.
If a condition is true, it starts a job (JobA).
Another JVM process does almost the same, but if the condition is true, it sends a message to a message broker, which triggers the job on another server (JobB).
Now, to avoid a SPOF, I want to add another instance of this machine in my cloud.
But then I want to ensure that only a single instance of JobA runs each time.
What are my options?
There are a number of patterns that solve this common problem. You need to choose based on your exact situation, depending on which factor has more weight in your case (performance, correctness, fault tolerance, whether misfires are allowed, etc.). The two solution groups are:
The "Quartz" way: you can use a JDBC JobStore from the Quartz library, which was (partially) designed for this very reason. It allows multiple nodes to communicate, and to share state and workload between each other. This gives you a probably perfect solution, at the cost of some extra coding and setting up a shared DB (9 tables, I think) between the nodes.
Alternatively, your nodes can take care of the distribution themselves: locking on a resource (a single record in a DB, for example) can be enough to decide who is in charge of that iteration of the execution. Sharing previous states, however, will require a bit more work.
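A hedged sketch of that second option: every node tries to lock the same row before running JobA, and only the one that gets the lock runs it (the job_lock table and runJobA are assumptions; FOR UPDATE NOWAIT is supported by e.g. PostgreSQL and Oracle):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

void tryRunJobA(DataSource dataSource) throws SQLException {
    try (Connection con = dataSource.getConnection()) {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT id FROM job_lock WHERE id = 1 FOR UPDATE NOWAIT")) {
            ps.executeQuery();  // throws immediately if another node holds the row lock
            runJobA();          // we are the only node running JobA this iteration
            con.commit();       // committing releases the row lock
        } catch (SQLException lockBusy) {
            con.rollback();     // another node is in charge this time
        }
    }
}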
We have an application with multiple threads that reuse one KDB connection.
From a performance perspective, would it be good to open multiple connections to a multithreaded KDB instance to speed up the process? Also of interest: is there any potential downside if we publish from multiple threads to a single connection? We have a Java app and use the exxeleron Java library.
Aside from the fact that a single socket connection to KDB isn't very resource hungry by itself, in the end I think you'll find that disk seeks and memory allocation are by far the largest bottlenecks, not how many connections you have to a database. That said, since you ask...
Let's go with simple assumptions:
The KDB database is a historical database. The multithreading options on that side are a negative port number and -s, which can't be set simultaneously.
You have a single process, let's call it A, that accesses it.
With a negative port number, you get a multi-threaded input queue. So if A has the ability to issue multiple queries, they can be dispatched simultaneously and KDB+ won't block on each call. However, A would somehow need to identify the incoming stream of results as the responses to particular queries. You could query it like (<queryId>;<actualQuery>) and parse the first element for identification, I suppose. In this use case, however, it sounds like you should have multiple A's.
With -s you get multi-threaded queries, so your q queries have to be written as such (sometimes you get it for free, though, like querying across partitions). You'll block on every call, so there's no real advantage in having multiple A's.
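To illustrate the identification problem in the negative-port case, here is a hypothetical correlation scheme (KdbConnection and sendAsync are invented stand-ins, not the exxeleron API; a single reader thread is assumed to call onReply for each tagged response):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class QueryCorrelator {
    private final ConcurrentHashMap<Long, CompletableFuture<Object>> pending =
            new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    // ship (queryId; actualQuery) so the reply can be matched to its caller
    CompletableFuture<Object> send(KdbConnection conn, String query) {
        long id = ids.incrementAndGet();
        CompletableFuture<Object> future = new CompletableFuture<>();
        pending.put(id, future);
        conn.sendAsync(new Object[] { id, query });  // hypothetical call
        return future;
    }

    // called by the reader thread whenever a tagged reply arrives
    void onReply(long id, Object result) {
        CompletableFuture<Object> future = pending.remove(id);
        if (future != null) {
            future.complete(result);
        }
    }
}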
We currently have a distributed setup where we publish events to SQS, and we have an application with multiple hosts that drains messages from the queue, does some transformation on them, and transmits them to interested parties. I have a use case where the receiving endpoint has scalability concerns with the message volume, and hence we would like to batch these messages periodically (say, every 15 minutes) in the application before sending them.
The incoming message rate is around 200 messages per second, and each message is no more than 10 KB. This system need not be real time, though that would definitely be good to have; also, the order is not important (it's okay if a batch containing older messages gets sent first).
One approach I can think of is maintaining an embedded database within the application (on each host) that batches the events, with another thread that runs periodically and clears the data.
Another approach could be to create timestamped buckets in a distributed key-value store (S3, Dynamo, etc.), where we write each message to the correct bucket based on the message's timestamp, and we periodically clear the buckets.
We can run into several issues here: since the messages would be out of order, a bucket might have already been cleared (this can be solved by having a default bucket, though), and we would need to decide accurately when to clear a bucket, etc.
The way I see it, at least two components would be required: one that does the batching into temporary storage, and another that clears it. Roughly, the bucket variant could look like the sketch below.
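A rough sketch of the bucket scheme (Message, store, sendBatch, and the 15-minute window are assumptions; the two methods correspond to the two components above):

import java.util.concurrent.TimeUnit;

static final long WINDOW_MILLIS = TimeUnit.MINUTES.toMillis(15);  // assumed batch window

// every message maps to the bucket of the window its timestamp falls into
static String bucketKey(long epochMillis) {
    long bucketStart = (epochMillis / WINDOW_MILLIS) * WINDOW_MILLIS;
    return "batch:" + bucketStart;
}

// component 1: batching into temporary storage
void onMessage(Message m) {
    store.append(bucketKey(m.getTimestampMillis()), m);  // assumed store API
}

// component 2: periodically drain every bucket whose window has closed
void sweep() {
    for (String key : store.closedBuckets(System.currentTimeMillis())) {
        sendBatch(store.drain(key));   // assumed downstream call
    }
}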
Any feedback on the above approaches would help. Also, this looks like a common problem; are there any existing solutions that I can leverage?
Thanks