Could I use igniteQueue in another cache.invoke() function? - java

In Ignite Service A's execute() method:
cacheA.invoke(key, (entry, args) -> {
    // process the record
    igniteQueue.put(processedRecord);
    return null;
});
In Ignite Service B's execute() method:
savedProcessedRecord = igniteQueue.take();
It runs smoothly when TPS is low, but under high TPS I sometimes get "Possible starvation in striped pool" in the log.
See my previous post:
Ignite service hangs when call cache remove in another cache's invoke processor, " Possible starvation in striped pool"?
It seems that using IgniteQueue inside cache.invoke() is just as incorrect as using an Ignite cache inside cache.invoke()?
So if I cannot use an Ignite queue in cache.invoke(), is there a better way to do this? I have tried using another message queue (Kafka or Redis) instead of the Ignite queue, but Ignite presents itself as a message queue too, and using Kafka inside an Ignite invoke seems very strange. How can I achieve this with pure Ignite?

You should not issue any blocking operations from the invoke(..) method, as it executes within a lock on the key. Instead, create another thread pool and make it responsible for adding objects to and taking them from the IgniteQueue. Then you can simply submit a task to that thread pool from the invoke(..) method and enqueue the object inside that task.
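The suggested hand-off can be sketched with plain JDK types (a LinkedBlockingQueue stands in for the IgniteQueue here, since the point is only the decoupling; class and method names are illustrative):

```java
import java.util.concurrent.*;

public class InvokeOffload {
    // Dedicated pool for queue operations, as the answer suggests;
    // a plain BlockingQueue stands in for the IgniteQueue.
    static final ExecutorService offloadPool = Executors.newFixedThreadPool(4);
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Called from inside invoke(..): does not block on the queue itself,
    // it only hands the enqueue off to the pool and returns immediately.
    static Future<?> enqueueLater(String processedRecord) {
        return offloadPool.submit(() -> {
            queue.put(processedRecord); // may block, but off the striped pool
            return null;
        });
    }
}
```

With this shape, the entry processor never blocks while holding the key lock, which is what triggers the striped-pool starvation warning.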

Related

Redis Redisson - Strategies for workers

I am new to Redis and Redisson, but kind of up to date with what is available now.
Mostly from here: https://github.com/redisson/redisson/wiki/9.-distributed-services#91-remote-service
The case here involves a worker on only one server out of, say, many. The worker gets images which can be downloaded later on. They can be pushed to an executor to download later; however, that is not persistable, so we would lose them.
Redisson offers an ExecutorService. But I was wondering, do all Redis nodes share or chip in to perform the work by default? Is there a way to ensure that only one gets to do the work? As for the objects accessed inside the runnable/callable, I am guessing there have to be restrictions on what can be used, since it is a closure with access to the environment? No access at all?
Redisson also offers something called distributed remote services. How are they different from an ExecutorService in this regard?
Another option is to push these to a Redis list/queue/deque and work off the "messages", although the executor service, I think, would let me keep all that logic in the same place.
What is the best approach?
What are the rules for objects inside the closure supplied in a runnable/callable? Must everything be fully serializable?
How do I manage the case where a worker is working and suddenly dies (nuclear)? Can I ensure that someone else gets to do the work?

Is there any way we can pause a Kafka stream for a certain period and resume it later?

We have a requirement where we are using Kafka Streams to read from a Kafka topic and then send the data over the network through a pool of sessions. However, the network calls are sometimes a bit slow, and we need to frequently pause the stream to ensure we are not overloading the network. Currently, we capture the data into a stream, hand it to an executor service, and then send it over the network through the session pool.
If the amount of data in the executor service is too high, we need to pause the stream for some time and then resume it once the backlog in the executor service is cleared. To achieve this pause mechanism, we are currently closing the stream and starting it again once the backlog is cleared.
Is there any way we can pause the Kafka stream?
If I understand you correctly, there is nothing special you need to do. You are talking about "back pressure" and Kafka Streams can handle it out of the box.
What can be done is to put this data into a queue with some maximum size and to feed the executor service from that queue. Whenever the queue reaches the threshold, there are two cases:
If your call to put data into the queue blocks with no time-out, there is nothing more you need to do. Just wait until the system is back online; your call returns, and processing resumes.
If your call to put data into the queue blocks with a time-out, check the size of the queue and retry. Repeat this until the system is back online and your call succeeds.
The only caveat is that as long as your Streams application blocks, the internally used Kafka consumer client will not send any heartbeats to Kafka and might time out. Thus, you need to set the time-out configuration parameter higher than the expected maximum downtime of your external system.
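The bounded-queue hand-off described above can be sketched with plain JDK types (class and method names are illustrative; the Consumer stands in for the slow network call through the session pool):

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

public class BackpressureBridge {
    private final BlockingQueue<String> backlog;
    private final ExecutorService pool;

    public BackpressureBridge(int capacity, int workers, Consumer<String> sender) {
        backlog = new ArrayBlockingQueue<>(capacity);
        pool = Executors.newFixedThreadPool(workers);
        Runnable worker = () -> {
            try {
                while (true) {
                    sender.accept(backlog.take()); // wait for work, then send
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // allow clean shutdown
            }
        };
        for (int i = 0; i < workers; i++) {
            pool.submit(worker);
        }
    }

    // Called from the Streams path (e.g. inside foreach()); blocks the
    // Streams thread once `capacity` records are pending -- the back pressure.
    public void handoff(String record) throws InterruptedException {
        backlog.put(record);
    }

    public void shutdown() {
        pool.shutdownNow(); // interrupts blocked workers so they exit
    }
}
```

Because handoff() blocks the Streams thread when the backlog is full, the consumer time-out caveat above still applies.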
Another approach is to use the Processor API available in Kafka Streams, though it is not usually a recommended pattern.
Let me know if it helps!!

Can we use a ForkJoinPool implementation with an unknown batch size on which it should "join"?

This is my first encounter with ForkJoinPool, and I am trying to understand whether I can convert my existing ExecutorService implementation into a ForkJoinPool implementation.
Scenario:
My application needs to perform two operations for multiple threads:
First operation does some data copying, which can occur in parallel threads without breaking functionality. [looks eligible to be a FORK-like sub-task]
Second operation performs a server restart, which can be requested only after the last thread has finished copying the data. Currently, the underlying server app handles the logic of rejecting duplicate restart requests if a previously requested restart is still happening. [I am trying to decide if this can be implemented as a JOIN sub-task?]
In the current implementation, we perform both steps independently for each thread, but frequent restart requests carry a certain risk.
Questions:
I am unable to understand whether a ForkJoinPool implementation is possible if we don't know how many active/running threads we will need to combine.
On what condition should we "await" the join if we don't know how many such requests will come?
Should I be looking at a CountDownLatch implementation instead?
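For an unknown number of parties, java.util.concurrent.Phaser is worth considering: unlike CountDownLatch, whose count is fixed at construction, a Phaser allows parties to register and deregister dynamically. A minimal sketch, where the class and method names are illustrative:

```java
import java.util.concurrent.Phaser;

public class CopyThenRestart {
    // Phaser supports dynamic registration, so the number of copy tasks
    // need not be known up front (CountDownLatch's count is fixed).
    private final Phaser phaser = new Phaser(1); // party 1 = the coordinator

    public void startCopy(Runnable copyWork) {
        phaser.register();                    // one more party to wait for
        new Thread(() -> {
            try {
                copyWork.run();               // parallel data copy
            } finally {
                phaser.arriveAndDeregister(); // this copy is done
            }
        }).start();
    }

    public void restartWhenAllCopiesDone(Runnable restart) {
        phaser.arriveAndAwaitAdvance();       // blocks until all copies arrive
        restart.run();                        // single restart request
    }
}
```

The coordinator issues exactly one restart once every registered copy has finished, regardless of how many copies were started.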

apache ignite compute service with affinity

I would like to run a service/compute job on Ignite, but run the job where the data is.
From the client I will call either a compute or a service proxy, but I need the service to run near the cache data.
I noticed you can use a service from a compute job:
compute.run(new IgniteRunnable() {
    @ServiceResource(serviceName = "myCounterService")
    private MyCounterService counterSvc;
    @Override public void run() { /* use counterSvc */ }
});
If I deploy the service on every node in the cluster, can I use affinity compute to do this?
compute.affinityRun(CACHE_NAME, key, () -> {
    // call my service here...
});
Maybe there is a way to directly call a service proxy with affinity, to avoid using compute?
P.S. The reason is that the service produces more cache data, and I would like to avoid transferring large amounts of data back and forth between nodes and clients.
Unfortunately, there is no way to tell Ignite which service instance to use based on the argument of the method being called.
Services are good when you need to store some state in them. Otherwise compute jobs are just as effective. Moreover, compute jobs are used internally to call services' methods.
So, if you don't have any state, then just use the compute jobs. Otherwise you can try injecting needed resources into compute jobs: https://apacheignite.readme.io/docs/resource-injection
You can use Affinity key based deployment
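Combining the two suggestions above, a sketch (not tested against a live cluster; the service name and increment() method are illustrative) of an affinity-collocated job that uses an injected service:

```java
ignite.compute().affinityRun(CACHE_NAME, key, new IgniteRunnable() {
    @ServiceResource(serviceName = "myCounterService")
    private MyCounterService counterSvc;

    @Override public void run() {
        // Runs on the node that owns `key`; Ignite injects the service
        // instance deployed on that node, so no data leaves the node.
        counterSvc.increment();
    }
});
```

With the service deployed as a node singleton on every server node, the injected instance is always the local one.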

Multi Threading with datastax java driver 2.0

My data model is based on time series (inserting feeds from various sources into Cassandra CFs). Can anyone suggest how to do inserts with multi-threading? Is executing a query with the executeAsync method similar to multi-threading? Is there any property in cassandra.yaml that I need to set to achieve multi-threading? Or any other prerequisites?
The driver is safe for multi-threaded use. What you will typically do is build your Cluster and get a Session instance during application startup, and then share the Session among all threads.
How you handle multi-threading is specific to your code. I don't know SQS either, but I imagine you'd either have multiple consumers that poll from the queue and process the messages themselves, or maybe dispatch the messages to a pool of workers.
Regarding executeAsync, the returned ResultSetFuture implements Guava's ListenableFuture, so you can register a success callback with addListener. But you'll have to provide an Executor to run that callback on (I don't recommend MoreExecutors#sameThreadExecutor as mentioned in the Javadoc, because your callback would end up running on one of the driver's I/O threads).
As mentioned by Carlo, a simple approach is to use the synchronous execute, and have your worker block until it gets a response from Cassandra, and then acknowledge the message.
executeAsync() executes the statement asynchronously and immediately returns control to the caller -- a Future<ResultSet> will hold your result. When working with this approach, you won't know if an exception occurred until you inspect the Future.
In Cassandra you don't have to set anything. Just keep the thread count in your application under control and initialize the Java driver properly, providing a PoolingOptions object that matches your needs.
HTH, Carlo
If you are executing the query in a multi-threaded environment, then make sure you wait for executeAsync(statement) to complete.
session.executeAsync(statement) will return immediately; it does not guarantee that the query is valid or was submitted successfully. So if you're using a thread pool, then always use
ResultSetFuture future = session.executeAsync(statement);
future.getUninterruptibly();
This will wait for the query to complete and will keep pending futures from piling up in memory.
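Since both styles (blocking vs. callback) come up in the answers above, here is a pure-JDK analog of the pattern, with CompletableFuture standing in for the driver's ResultSetFuture (the 2.0 driver actually returns Guava futures; this is only an illustration, and all names are made up):

```java
import java.util.concurrent.*;

public class AsyncQueryPattern {
    // Stand-ins for the driver's internal I/O threads and for a separate
    // executor to run callbacks on (as recommended above).
    static final ExecutorService ioPool = Executors.newFixedThreadPool(2);
    static final ExecutorService callbackPool = Executors.newFixedThreadPool(2);

    // Mimics session.executeAsync(statement): returns a future immediately.
    static CompletableFuture<String> executeAsync(String statement) {
        return CompletableFuture.supplyAsync(() -> "rows-for:" + statement, ioPool);
    }

    public static void main(String[] args) throws Exception {
        // Callback style: like addListener(..), but run the callback on our
        // own executor, never on the I/O threads.
        executeAsync("stmt-1").thenAcceptAsync(System.out::println, callbackPool);

        // Blocking style: like future.getUninterruptibly() in the answer above.
        String rows = executeAsync("stmt-2").get();
        System.out.println(rows);

        ioPool.shutdown();
        callbackPool.shutdown();
    }
}
```

Either style is safe from multiple threads; the key point is bounding how many futures are in flight at once.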
