apache ignite compute service with affinity - java

I would like to run a service/compute job on Ignite, but run the job where the data is.
From the client I will call either a compute or a service proxy, but I need the service to run near the cache data.
I noticed you can use a service from a compute job:
compute.run(new IgniteRunnable() {
    @ServiceResource(serviceName = "myCounterService")
    private MyCounterService counterSvc;
    // ...
});
If I deploy the service on every node in the cluster, can I use compute to run near the cache data, like this?
compute.affinityRun(CACHE_NAME, key, () -> {
    // call my service here...
});
Maybe there is a way to directly call a service proxy with affinity, to avoid using compute?
P.S. The reason is that the service produces more cache data, and I would like to avoid transferring large amounts of data back and forth between nodes and clients.

Unfortunately, there is no way to tell Ignite which service instance to use based on the argument of the method being called.
Services are good when you need to store some state in them; otherwise compute jobs are just as effective. Moreover, compute jobs are used internally to call services' methods.
So, if you don't have any state, just use compute jobs. Otherwise you can try injecting the needed resources into compute jobs: https://apacheignite.readme.io/docs/resource-injection
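For example, a minimal sketch combining affinityRun with service injection, per the answer above; "myCounterService" follows the question, while increment() and the CACHE_NAME/key values are illustrative assumptions:

ignite.compute().affinityRun(CACHE_NAME, key, new IgniteRunnable() {
    @ServiceResource(serviceName = "myCounterService")
    private MyCounterService counterSvc;

    @Override
    public void run() {
        // Executes on the node that owns 'key', so the injected proxy
        // resolves to the locally deployed service instance.
        counterSvc.increment(); // hypothetical service method
    }
});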

You can use affinity-key-based deployment: Ignite can deploy a service on the node where a given affinity key is stored (see IgniteServices#deployKeyAffinitySingleton).

Related

how to maintain author name per request in spring and vertx application?

I have a few microservices written mainly in Java (with Spring and Hibernate). I use Vert.x for REST APIs and for non-blocking events.
Every API call carries a JWT token which contains very basic user info, like the username. I can extract the username from the JWT token via the Vert.x RoutingContext.
Now I want to audit the username in the database whenever a Hibernate entity changes or a new entity is saved. But after the event is passed to the worker verticles that handle the DB operations, I no longer have the username unless I pass it on the event bus with every request.
I want to know if there is any context in which I can persist the username and access it within worker verticles during the lifecycle of a request.
You can use a cluster-wide AsyncMap to store your data between requests and nodes.
Something like:
vertx.sharedData().<String, String>getAsyncMap("mymap", res -> {
    if (res.succeeded()) {
        res.result().put("foo", "bar", resPut -> { /* ... */ });
    }
});
More info in ref-doc
Vert.x is built with an async, non-blocking paradigm in mind; the separation of contexts exists so that no thread synchronization is necessary, avoiding that overhead.
As such, sharing state between workers is not recommended.
I suggest leaning into the message-driven domain and keeping the functions (workers) as pure as possible, i.e. pass each worker all the data it needs to do its job: send the username and/or any other data the worker will need.
You don't need to worry much about event-bus performance; it's very efficient and built for high load.
Another option is a shared data store that all workers know about and that shares state between them in an efficient, async way. Vert.x has an internal sharedData implementation you can use, although it is very basic.
So if you need a data store with more capabilities, take a look at the Hazelcast in-memory data grid, which gives you a lot more capabilities and still acts as a distributed, fast and async data store for all nodes (it already plays nicely with Vert.x as its cluster manager).
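As a sketch of that message-driven approach: the event-bus address and handler body are assumptions, but DeliveryOptions headers are a standard way to carry per-request metadata such as the username along with each message:

// In the REST verticle, after extracting the username from the JWT:
DeliveryOptions options = new DeliveryOptions().addHeader("username", username);
vertx.eventBus().send("db.save-entity", entityJson, options);

// In the worker verticle, the username travels with every message:
vertx.eventBus().consumer("db.save-entity", msg -> {
    String username = msg.headers().get("username");
    // perform the DB operation and audit it with 'username'
});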

Could I use igniteQueue in another cache.invoke() function?

In Ignite Service A's execute function:

cacheA.invoke(key, (entry, args) -> {
    // process the record
    igniteQueue.put(processedRecord);
    return null;
});

In Ignite Service B's execute function:

savedProcessedRecord = igniteQueue.take();
It runs smoothly when TPS is low, but when running with high TPS, I sometimes get "Possible starvation in striped pool" in the log.
See my previous post:
Ignite service hangs when call cache remove in another cache's invoke processor, " Possible starvation in striped pool"?
It seems that using IgniteQueue inside cache.invoke() is also incorrect, just like using an Ignite cache inside cache.invoke()?
So if I cannot use an Ignite queue in cache.invoke(), is there a better way to do this? I have tried using another message queue (Kafka or Redis) instead of the Ignite queue inside the invoke, but Ignite advertises itself as a message queue too, and using Kafka inside an Ignite invoke seems very strange. How could I achieve this with pure Ignite?
You should not issue any blocking operations from the invoke(..) method, as it executes within a lock on the key. Instead, create another thread pool and make it responsible for adding and taking objects from the IgniteQueue. Then you can simply submit a task to that thread pool from the invoke(..) method and, inside that task, enqueue the object.
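A minimal sketch of that suggestion; cacheA, igniteQueue, Record and the process() step are assumptions carried over from the question:

// A pool living on each server node (e.g. a static field), so it is not
// captured and serialized along with the entry processor.
static ExecutorService queueExecutor = Executors.newFixedThreadPool(4);

cacheA.invoke(key, (entry, args) -> {
    Record processed = process(entry.getValue()); // hypothetical processing step
    // Hand the blocking put() off to the separate pool; invoke() returns
    // immediately, so the striped-pool thread holding the key lock never blocks.
    queueExecutor.submit(() -> igniteQueue.put(processed));
    return null;
});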

how to process multiple API calls from the same client one by one in a scalable, highly concurrent and fault tolerant system

We have web service APIs supporting clients running on ten million devices. Normally clients call the server once a day, which works out to about 116 client calls seen per second. For each client (each with a unique ID), several API calls may arrive concurrently. However, the server can only process API calls from the same client one by one, because those calls update the same document for that client in the backend MongoDB database (for example, the last-seen time and other embedded documents in that client's document).
One solution I have is to put a synchronized block on an "interned" object representing the client's unique ID. That allows only one request from a given client to obtain the lock and be processed at a time, while requests from other clients can still be processed concurrently. But this solution requires turning on the load balancer's "stickiness": the load balancer will route all requests from the same IP address to a specific server within a preset time interval (e.g. 15 minutes). I am not sure whether this affects the robustness of the whole system design. One thing I can think of is that some clients may make more requests than others and skew the load (creating hotspots).
Solution #1:

Interner<Key> myIdInterner = Interners.newWeakInterner();

public ResponseType1 processApi1(String clientUniqueId, RequestType1 request) {
    synchronized (myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}

public ResponseType2 processApi2(String clientUniqueId, RequestType2 request) {
    synchronized (myIdInterner.intern(new Key(clientUniqueId))) {
        // code to process request
    }
}
You can see my other question for this solution in detail: Should I use Java String Pool for synchronization based on unique customer id?
The second solution I am thinking of is to somehow lock the client's document in MongoDB (I have not found a good example of doing that yet). Then I don't need to touch the load balancer settings. But I have concerns about this approach, as I think the performance (round trips to the MongoDB server and busy waiting?) will be much worse than solution #1.
Solution #2 (acquiring the lock before the try block, so the finally never releases a lock that was not obtained):

public ResponseType1 processApi1(String clientUniqueId, RequestType1 request) {
    obtainDocumentLock(new Key(clientUniqueId));
    try {
        // code to process request
    } finally {
        releaseDocumentLock(new Key(clientUniqueId));
    }
}

public ResponseType2 processApi2(String clientUniqueId, RequestType2 request) {
    obtainDocumentLock(new Key(clientUniqueId));
    try {
        // code to process request
    } finally {
        releaseDocumentLock(new Key(clientUniqueId));
    }
}
I believe this is a very common issue in scalable, highly concurrent systems. How do you solve it? Is there any other option? What I want to achieve is to process only one request at a time among the requests from the same client. Note that just controlling read/write access to the database does not work; the solution needs to ensure exclusive processing of the whole request.
For example, there are two requests: request #1 and request #2. Request #1 reads the client's document, updates one field of sub-document #5, and saves the whole document back. Request #2 reads the same document, updates one field of sub-document #8, and saves the whole document back. At this moment we get an OptimisticLockingFailureException, because we use the @Version annotation from spring-data-mongodb to detect version conflicts. So it is imperative to process only one request from the same client at any time.
P.S. Any suggestion on choosing between solution #1 (locking within a single process/instance, with load balancer stickiness turned on) and solution #2 (a distributed lock) for a scalable, highly concurrent system design? The goal is to support tens of millions of clients, with hundreds of clients accessing the system concurrently each second.
In your solution you are splitting the lock based on customer ID, so two different customers can be processed at the same time. The only problem is the sticky session. One alternative is to use a distributed lock, so you can dispatch any request to any server and the server acquires the lock before processing. The only consideration is that it involves remote calls. We are using Hazelcast/Ignite and it works very well for an average number of nodes.
Hazelcast
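A sketch of the distributed-lock idea with Hazelcast (assuming the 4.x CP subsystem API; the lock-name scheme is an assumption):

HazelcastInstance hz = Hazelcast.newHazelcastInstance();
Lock lock = hz.getCPSubsystem().getLock("client-" + clientUniqueId);
lock.lock();
try {
    // code to process request; any node can handle it, since the lock is cluster-wide
} finally {
    lock.unlock();
}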
Why not just create a processing queue in MongoDB: you submit client request documents, and another server process consumes them and produces a resulting document that the client waits for. Synchronize the data by clientId, and keep that activity out of the API submission step. The second part of the client submission activity (when finished) just polls MongoDB for consumed records, looking for their API/clientId and some job tag. That way you can scale out the API submission and, separately, the API consumption activities on different servers.
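A rough sketch of that work-queue idea with the MongoDB Java driver; the db handle, collection name, field names and status values are assumptions:

MongoCollection<Document> queue = db.getCollection("api_requests");

// Submission side: enqueue the client's request and return immediately.
String jobTag = UUID.randomUUID().toString();
queue.insertOne(new Document("clientId", clientUniqueId)
        .append("jobTag", jobTag)
        .append("status", "pending")
        .append("payload", requestJson));

// Client side: poll for the processed record by clientId + jobTag.
Document result = queue.find(Filters.and(
        Filters.eq("clientId", clientUniqueId),
        Filters.eq("jobTag", jobTag),
        Filters.eq("status", "done"))).first();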
One obvious approach is simply to implement the full optimistic-locking algorithm on your end.
That is, you sometimes get an OptimisticLockingFailureException when there are concurrent modifications, but that's fine: just re-read the document and start the failed modification over again. You get the same effect as if you had used locking; essentially you are leveraging the concurrency control already built into MongoDB. This also has the advantage of letting several transactions from the same client go through if they don't conflict (e.g., one is a read, or they write to different documents), potentially increasing the concurrency of your system. On the other hand, you have to implement the retry logic, as sketched below.
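A minimal retry sketch, assuming a Spring Data repository and a hypothetical applyUpdate step; MAX_RETRIES and the other names are illustrative:

for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
        ClientDoc doc = repository.findById(clientId).orElseThrow();
        applyUpdate(doc, request);  // hypothetical mutation of a sub-document
        repository.save(doc);       // throws on @Version conflict
        return;
    } catch (OptimisticLockingFailureException e) {
        // Another request modified the document concurrently: re-read and retry.
    }
}
throw new IllegalStateException("Still conflicting after " + MAX_RETRIES + " attempts");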
If you do want to lock on a per-client basis (or per-document, or whatever else) and your server is a single process (which your suggested approach implies), you just need a lock manager that works on arbitrary String keys; this has several reasonable solutions, including the Interner one you mentioned.

Does Jedis support async operations

I am using Jedis (the Java client) to communicate with a Redis server. I have 3 Redis instances running on three different nodes. I want to "get" (read) some records from the 3 Redis instances. I want to issue these gets in parallel, then do some processing on the received data and form a final output.
What is the best way to do this in Java?
One way is to create 3 threads and issue a get in each of them (synchronously), wait for all 3 commands to complete, and then combine the results.
Does Jedis have a mechanism for issuing 3 "gets" (any command for that matter) asynchronously, with a callback feature?
I have 3 different Redis instances. So do you suggest using "ShardedJedisPipeline" (jedis/tests/ShardedJedisPipelineTest.java) to interact with these Redis instances in parallel?
A normal Jedis pipeline (jedis/tests/PipeliningTest.java) just sends multiple commands to a single Redis instance, so they are executed one after another on the Redis server (with all responses available at the end).
So I assume I have to use "ShardedJedisPipeline". But there are two limitations to this:
1. I want to execute a Lua script, i.e. "eval", on the 3 Redis instances in parallel.
2. I don't want sharding (the hash algorithm used by Jedis) to distribute data or read data from instances on its own. We have a different strategy for distributing data, so I want to be able to specify which Redis instance a record should be stored in and, accordingly, which it should be read from. keyTags seem to provide this mechanism, but I am not sure how to use them with "eval".
You can use a pipeline, as mentioned.
AsyncJedis is a work in progress and will be released with the next version of Jedis. It will be based on Netty and will be compatible with Vert.x.
Until then you can roll it yourself with an ExecutorService and three Jedis instances, then await the futures that are returned.
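A sketch of that interim approach; the host names, the key and the pool size are assumptions:

ExecutorService pool = Executors.newFixedThreadPool(3);
List<Jedis> clients = Arrays.asList(
        new Jedis("redis-node-1"), new Jedis("redis-node-2"), new Jedis("redis-node-3"));

// Issue the three GETs in parallel, one per instance.
List<Future<String>> futures = new ArrayList<>();
for (Jedis jedis : clients) {
    futures.add(pool.submit(() -> jedis.get("mykey")));
}

// Await all three, then combine the results.
List<String> results = new ArrayList<>();
for (Future<String> f : futures) {
    results.add(f.get());
}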
As of Feb 2015, Jedis apparently does not support async operations against a single Redis instance as you need: https://github.com/xetorthio/jedis/issues/241
What I would do in your case is go ahead with 3 threads and use Futures and an ExecutorService, as @Xorlev suggested above.

long processing jobs in a java web app

What is the best way to perform long tasks (triggered by a user, and for that user only) in a Java web app? I tried using EJB @Asynchronous and JAX-WS asynchronous (polling) calls, but the Future<?> they return is not serializable and cannot be stored in the HttpSession (to retrieve the result later when it's done). Is there a simple way to use a concurrent Future<?> in a Java web environment, or do I have to go with a full-blown job management framework?
The best solution so far was to use an application-scoped Map<SessionId, List<Future<?>>>. This works in a cluster with sticky sessions and does not require JMS queues or storing results in a database.
The best option is to use JMS: implement an asynchronous messaging solution that sends a message to a queue/topic, where an MDB listening on that queue/topic is triggered on message arrival and performs the long task offline.
http://www.javablogging.com/simple-guide-to-java-message-service-jms-using-activemq/
http://docs.oracle.com/javaee/1.3/jms/tutorial/
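A minimal sketch of the MDB side (assuming a JMS 2.0 / Java EE container; the queue name is an assumption):

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "jms/longTaskQueue"),
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue")
})
public class LongTaskMDB implements MessageListener {
    @Override
    public void onMessage(Message message) {
        // Triggered on message arrival; perform the long task offline here.
    }
}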
If your process is supposed to generate a result and you expect it to take a long time, probably the best way is to have two separate calls:
First one to trigger the process, which returns a unique process identifier
Second one to retrieve the result using the process identifier
So your overall process flow will be (see the sketch after this list):
Client calls the back-end service.
Back-end service starts the async process with a unique ID and returns the unique ID to the client right away.
Async process persists the result in the session or some more durable mechanism (DB, file, etc.).
Client polls the server with the unique ID.
The retrieval method returns the result when it exists, otherwise a not-done message.
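A minimal sketch of that two-call pattern; the class shape, doLongRunningWork() and the in-memory job map are assumptions (a real deployment would persist results as described above):

private final ExecutorService executor = Executors.newCachedThreadPool();
private final ConcurrentMap<String, Future<Result>> jobs = new ConcurrentHashMap<>();

// Call 1: start the job and return a unique process identifier right away.
public String startJob(Request request) {
    String jobId = UUID.randomUUID().toString();
    jobs.put(jobId, executor.submit(() -> doLongRunningWork(request)));
    return jobId;
}

// Call 2: poll for the result using the identifier.
public Optional<Result> pollJob(String jobId) {
    Future<Result> f = jobs.get(jobId);
    if (f == null || !f.isDone()) {
        return Optional.empty(); // "not done" response
    }
    try {
        return Optional.of(f.get());
    } catch (InterruptedException | ExecutionException e) {
        throw new IllegalStateException("Job " + jobId + " failed", e);
    }
}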
