Redis Redisson - Strategies for workers - java

I am new to Redis and Redisson, but reasonably up to date with what is available now.
Mostly from here: https://github.com/redisson/redisson/wiki/9.-distributed-services#91-remote-service
The case here involves a worker that runs on only one server out of many. The worker finds images that can be downloaded later. They could be pushed to an executor to download later; however, that executor is not persistent, so we would lose out.
Redisson offers an executor service. But I was wondering: do all nodes share or chip in to perform the work by default? Is there a way to ensure that only one node does the work? As for what the runnable / callable accesses, I am guessing there have to be restrictions on what can be used, since it is a closure with access to its environment. Or does it get no access at all?
Redisson also offers something called distributed remote services. How are they different from an executor service in this regard?
Another option is to push these to a Redis list / queue / deque and work off the "messages", although I think the executor service would allow me to keep all that logic in the same place.
What is the best approach?
What are the rules for objects inside the closure supplied in a runnable / callable? Does everything have to be fully serializable?
How do I handle a worker that is working and suddenly dies (nuclear)? Can I ensure that someone else picks up the work?

Related

How does Apache Curator DistributedQueue's lockPath work?

I have a bunch of independent pieces of work that I need processes to perform. These pieces of work can be performed in any order, and they last long enough that processes sometimes fail when work is being performed.
I need to coordinate the assignment of these pieces of work, and Curator's DistributedQueue seems like it is almost what I want. I don't need the ordering it provides, though, so I am curious what level of overhead I am paying for that assuming I decline to have a single consumer (ie each process just consumes from the queue).
My main concern is how the lockPath() function on the queue builder actually works. I need the functionality it provides, because it is possible for processes to fail and I need to not be dropping the jobs they were supposed to be doing. But what I don't want is for only one process to be able to do any work at a time. If I use lockPath(), will the queue block for other processes while a process is consuming a message?
Also, if the queue seems like an unreasonable approach, is there another tool available to achieve what I want, or would I have to roll my own? I want to stay within the Curator / ZK environment but am open to alternatives within that.
(Note: I'm the main author of Apache Curator)
The documentation needs to be improved. The lock is used to make the queue entry retry-able. i.e. the entry in the queue is not removed until the consumer finishes. The lock assures that only 1 process is acting on the entry. If you don't care about dropping queue entries on failure you don't need to use the lock. With or without the lock, though, each consumer that you run processes queue entries. So, if you want to have concurrent processing of the queue you'd run multiple consumers (in the same JVM or in separate JVMs - it doesn't matter).
Here's a workflow engine I wrote that uses the Curator queue to do distributed work. Feel free to use it as it is open source: http://nirmataoss.github.io/workflow/

Processing methods/threads in correct order

I have an application that is quite complex (it's a command and control center spring + angular based application that is meant to be used by police and other emergency center controllers).
The main component of the application (let's call it the backbone, a Spring web app) communicates with different applications and hardware. Most of that communication is done using RabbitMQ messages (let's call them telegrams, or TMs for short).
When the backbone receives one of those TMs, a new thread is created and one or more methods are executed in it.
The problem is that the backbone can receive two or more TMs at almost the same time, and since they are executed in different threads, they may not finish in the same order as they arrived, so wrong information is presented to the user.
Usually I handle problems like this with Redis. I have a Redis lock that basically looks like this:
distributedRedisLocker.lock(() -> {
executeSomeMethod();
}, howLongIsLockKept, howLongDoWeWaitForItToFinnish);
But in this case I would like to avoid using Redis. Is there any other Java/Spring-based solution to this?
It does not need to be the same as the Redis lock that I have; all I want is that TMs are processed in the order they arrive, and that if one of them fails somewhere during method execution, it does not block the next one forever.
Reposting as an answer.
One way to solve your problem is to avoid concurrency. You could use Executors.newSingleThreadExecutor() which only uses one thread:
ExecutorService executor = Executors.newSingleThreadExecutor();
and then
executor.execute(...);
or
executor.submit(...);
This would help you avoid races: if some TM A is added to the execution queue defined by this executor before some TM B, then A will be executed as a whole before B.
Also, no explicit locks are involved (apart from implicit ones that may be used inside the executor implementation, but they are encapsulated and will not stay hanging forever in case of an error).
There is one subtle point: if two TMs arrive at the same time, it's impossible to predict which will be added earlier and which later.
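To make the ordering and the "a failure does not block the next one" points concrete, here is a minimal sketch. The class name, the telegram ids and the "bad" prefix that triggers a failure are made up for illustration; only the java.util.concurrent calls are real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class OrderedTelegramDemo {

    // Handles telegrams one at a time on a single thread and returns the ids
    // that were handled successfully, in the order they completed.
    static List<String> processInOrder(List<String> telegrams) {
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (String tm : telegrams) {
            executor.submit(() -> {
                if (tm.startsWith("bad")) {
                    // Simulated failure (hypothetical rule). submit() captures the
                    // exception in the returned Future, so the worker thread simply
                    // moves on to the next telegram.
                    throw new IllegalStateException("cannot handle " + tm);
                }
                handled.add(tm);
            });
        }
        executor.shutdown(); // telegrams that are already queued still run
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("telegrams did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(handled);
    }

    public static void main(String[] args) {
        // prints [TM-1, TM-3]
        System.out.println(processInOrder(Arrays.asList("TM-1", "bad-TM-2", "TM-3")));
    }
}
```

In a real application the executor would be a long-lived bean rather than something created and shut down per batch, and you would probably want to inspect the returned Future (or log inside the task) so that failures are not silently dropped.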

Using Java thread pool, how to process some messages serially and others in parallel depending on message characteristic?

This is more of a Java concurrency design question. I'm working on an application that needs to process many messages for many different clients. If two messages have different client names, then they can be processed in parallel. However, if they have the same client name, then they need to be processed serially, in order.
What’s the best way to implement this?
My current implementation is pretty simple: I wrote a wrapper class called OrderedExecutorPool. It has a list of single-threaded executors. In its submit method, it does the following to figure out which executor to submit the task to:
int executorNum = Math.floorMod(clientName.hashCode(), numExecutors); // Math.abs(Integer.MIN_VALUE) is negative, floorMod never is
executorList.get(executorNum).submit(task);
This ensures that all messages for the same client go to the same executor, while still allowing messages for different clients to be processed in parallel.
There are a couple of problems with this design:
1.) If most client names have the same hash code, then only a few executors are doing work
2.) If one client has MANY messages, a single executor may not keep up
Is there an elegant solution to this problem that can fix the shortcomings above?
Edit
clientName is just a String. I'm just invoking the String.hashCode() method on it.
There is no JDK built-in solution that I know of. I've implemented a custom executor solution to this at my current job using this basic logic:
1.) Keep an internal map of client name to work queue (each client has its own queue).
2.) When work comes in for a client, add it to that client's queue.
3.) If this is the first job on the queue, create a Runnable for this client name/queue and push it into the "real" executor (a standard JDK thread pool).
4.) The Runnable implementation just consumes tasks from a single client queue until it is empty, and then exits.
This simple implementation is the "greedy" approach (a client keeps working until its queue is empty). If you have more clients than underlying threads, you may want a "fairer" approach, where a client executes some number of tasks and then re-queues itself in the underlying executor (thus allowing other clients to get some work done).
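The logic above can be sketched roughly as follows. This is not the code from my job, just a minimal illustration of the greedy variant; the class name PerClientExecutor, the demo method and the client names are all made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PerClientExecutor {
    private final Executor pool;
    // Guarded by "queues". A client has an entry only while a drainer
    // is scheduled or running for it.
    private final Map<String, Queue<Runnable>> queues = new HashMap<>();

    PerClientExecutor(Executor pool) {
        this.pool = pool;
    }

    void submit(String clientName, Runnable task) {
        boolean first;
        synchronized (queues) {
            Queue<Runnable> queue = queues.get(clientName);
            first = (queue == null);
            if (first) {
                queue = new ArrayDeque<>();
                queues.put(clientName, queue);
            }
            queue.add(task);
        }
        if (first) {
            // First job for this client: hand one drainer to the real pool.
            pool.execute(() -> drain(clientName));
        }
    }

    // Greedy drainer: runs this client's tasks one by one until none are left.
    private void drain(String clientName) {
        while (true) {
            Runnable next;
            synchronized (queues) {
                next = queues.get(clientName).poll();
                if (next == null) {
                    queues.remove(clientName); // the next submit starts a new drainer
                    return;
                }
            }
            try {
                next.run();
            } catch (RuntimeException e) {
                // A failing task must not stop later tasks of the same client.
            }
        }
    }

    // Demo: three hypothetical clients share a pool of four threads; each
    // client's list records the order in which its tasks actually ran.
    static Map<String, List<Integer>> demo(int tasksPerClient) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        PerClientExecutor executor = new PerClientExecutor(pool);
        String[] clients = {"alice", "bob", "carol"};
        Map<String, List<Integer>> seen = new ConcurrentHashMap<>();
        for (String client : clients) {
            seen.put(client, Collections.synchronizedList(new ArrayList<>()));
        }
        CountDownLatch done = new CountDownLatch(clients.length * tasksPerClient);
        for (int i = 0; i < tasksPerClient; i++) {
            final int n = i;
            for (String client : clients) {
                executor.submit(client, () -> {
                    seen.get(client).add(n);
                    done.countDown();
                });
            }
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return seen;
    }
}
```

The important detail is that "queue is empty, so remove it and exit" happens under the same lock as "add a task and decide whether to start a drainer". That way a task can never be left sitting in a queue that nobody is draining. A fairer variant would count the tasks run in drain() and, after some limit, re-submit the drainer to the pool instead of looping.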

Two threads reading from the same table: how do I make both threads not read the same set of data from the TASKS table?

I have a task thread running in each of two separate instances of Tomcat.
The task threads concurrently read (using SELECT) the TASKS table with a certain WHERE condition and then do some processing.
The issue is that sometimes both threads pick the same task, so the task is executed twice.
My question is: how do I make both threads not read the same set of data from the TASKS table?
It is just because your code (the DAO function that accesses the database) is not synchronized. Make it synchronized and I think your problem will be solved.
If the TASKS table you mention is a database table then I would use Transaction isolation.
As a suggestion: within a transaction, set an attribute of the TASKS table row to some uniquely identifiable value if it is not already set. Commit the transaction. If all is OK, then the task has been selected by the thread.
I haven't come across this use case, so treat my suggestion with caution.
I think you should look at how an enterprise job scheduler, for example Quartz, handles this.
For your use case there is a better tool for the job - and that's messaging. You are persisting items that need to be worked on, and then attempting to synchronise access between workers. There are a number of issues that you would need to resolve in making this work - in general updating a table and selecting from it should not be mixed (it locks), so storing state there doesn't work; neither would synchronization in your Java code, as that wouldn't survive a server restart.
Using the JMS API with a message broker like ActiveMQ, you would publish a message to a queue. This message would contain the details of the task to be executed. The message broker would persist this somewhere (either in its own message store, or a database). Worker threads would then subscribe to the queue on the message broker, and each message would only be handed off to one of them. This is quite a powerful model, as you can have hundreds of message consumers all acting on tasks so it scales nicely. You can also make this as resilient as it needs to be, so tasks can survive both Tomcat and broker restarts.
Whether the database can provide graceful management of this will depend largely on whether it is using strict two-phase locking (S2PL) or multi-version concurrency control (MVCC) techniques to manage concurrency. Under MVCC reads don't block writes, and vice versa, so it is very possible to manage this with relatively simple logic. Under S2PL you would spend too much time blocking for the database to be a good mechanism for managing this, so you would probably want to look at external mechanisms. Of course, an external mechanism can work regardless of the database, it's just not really necessary with MVCC.
Databases using MVCC are PostgreSQL, Oracle, MS SQL Server (in certain configurations), InnoDB (except at the SERIALIZABLE isolation level), and probably many others. (These are the ones I know of off-hand.)
I didn't pick up any clues in the question as to which database product you are using, but if it is PostgreSQL you might want to consider using advisory locks. http://www.postgresql.org/docs/current/interactive/explicit-locking.html#ADVISORY-LOCKS I suspect many of the other products have some similar mechanism.
I think you need a column that stores the last-modified date of each row. Your threads can then read the same set of data with the same modified-date limitation.
Edit:
I did not see the "not to read" part.
In this case you need another table, TaskExecutor (taskId, executorId). When a thread runs a task, it puts a row into TaskExecutor; when another thread starts, it just checks whether the task is already executing (SELECT ... FROM TaskExecutor WHERE taskId = ...).
You also need to take care of the isolation level for transactions.

Patterns/Principles for thread-safe queues and "master/worker" program in Java

I have a problem which I believe is the classic master/worker pattern, and I'm seeking advice on implementation. Here's what I currently am thinking about the problem:
There's a global "queue" of some sort, and it is a central place where "the work to be done" is kept. Presumably this queue will be managed by a kind of "master" object. Threads will be spawned to go find work to do, and when they find work to do, they'll tell the master thing (whatever that is) to "add this to the queue of work to be done".
The master, perhaps on an interval, will spawn other threads that actually perform the work to be done. Once a thread completes its work, I'd like it to notify the master that the work is finished. Then, the master can remove this work from the queue.
I've done a fair amount of thread programming in Java in the past, but it's all been prior to JDK 1.5 and consequently I am not familiar with the appropriate new APIs for handling this case. I understand that JDK7 will have fork-join, and that that might be a solution for me, but I am not able to use an early-access product in this project.
The problems, as I see them, are:
1) how to have the "threads doing the work" communicate back to the master telling them that their work is complete and that the master can now remove the work from the queue
2) how to efficiently have the master guarantee that work is only ever scheduled once. For example, let's say this queue has a million items, and it wants to tell a worker to "go do these 100 things". What's the most efficient way of guaranteeing that when it schedules work to the next worker, it gets "the next 100 things" and not "the 100 things I've already scheduled"?
3) choosing an appropriate data structure for the queue. My thinking here is that the "threads finding work to do" could potentially find the same work to do more than once, and they'd send a message to the master saying "here's work", and the master would realize that the work has already been scheduled and consequently should ignore the message. I want to ensure that I choose the right data structure such that this computation is as cheap as possible.
Traditionally, I would have done this in a database, in sort of a finite-state-machine manner, working "tasks" through from start to complete. However, in this problem, I don't want to use a database because of the high volume and volatility of the queue. In addition, I'd like to keep this as light-weight as possible. I don't want to use any app server if that can be avoided.
It is quite likely that this problem I'm describing is a common problem with a well-known name and accepted set of solutions, but I, with my lowly non-CS degree, do not know what this is called (i.e. please be gentle).
Thanks for any and all pointers.
As far as I understand your requirements, you need ExecutorService. ExecutorService has a
submit(Callable task)
method whose return value is a Future. Future is a blocking way to communicate back from worker to master. You could easily extend this mechanism to work in an asynchronous manner. And yes, an ExecutorService implementation such as ThreadPoolExecutor also maintains a work queue, so in most cases you don't need to bother about scheduling. The java.util.concurrent package already has efficient implementations of thread-safe queues (ConcurrentLinkedQueue, which is non-blocking, and LinkedBlockingQueue, which is blocking).
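A minimal sketch of that submit/Future round trip, with a made-up class name and squaring a number standing in for the real work:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureResultsDemo {

    // The "master" submits one Callable per input and then collects the
    // results through the returned Futures, in submission order.
    static List<Integer> squareAll(List<Integer> inputs) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer n : inputs) {
                Callable<Integer> job = () -> n * n; // stand-in for real work
                futures.add(workers.submit(job));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that worker is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // The worker threw; the original exception is the cause.
            throw new IllegalStateException(e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [1, 4, 9]
        System.out.println(squareAll(Arrays.asList(1, 2, 3)));
    }
}
```

Each task is handed to exactly one worker thread by the executor's internal queue, which covers the "only ever scheduled once" concern without any extra bookkeeping in the master.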
Check out java.util.concurrent in the Java library.
Depending on your application it might be as simple as cobbling together some blocking queue and a ThreadPoolExecutor.
Also, the book Java Concurrency in Practice by Brian Goetz might be helpful.
First, why do you want to hold the items after a worker started doing them? Normally, you would have a queue of work and a worker takes items out of this queue. This would also solve the "how can I prevent workers from getting the same item"-problem.
To your questions:
1) how to have the "threads doing the work" communicate back to the master telling them that their work is complete and that the master can now remove the work from the queue
The master could listen to the workers using the listener/observer pattern
2) how to efficiently have the master guarantee that work is only ever scheduled once. For example, let's say this queue has a million items, and it wants to tell a worker to "go do these 100 things". What's the most efficient way of guaranteeing that when it schedules work to the next worker, it gets "the next 100 things" and not "the 100 things I've already scheduled"?
See above. I would let the workers pull the items out of the queue.
3) choosing an appropriate data structure for the queue. My thinking here is that the "threads finding work to do" could potentially find the same work to do more than once, and they'd send a message to the master saying "here's work", and the master would realize that the work has already been scheduled and consequently should ignore the message. I want to ensure that I choose the right data structure such that this computation is as cheap as possible.
Implementations of a blocking queue have been available since Java 5.
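As a rough sketch of "let the workers pull the items out of the queue": the class name, the STOP marker and the upper-casing "work" below are made up for illustration. Only BlockingQueue and LinkedBlockingQueue are the actual Java 5 classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class PullingWorkersDemo {
    // Hypothetical end-of-work marker ("poison pill"); one is queued per worker.
    private static final String STOP = "__STOP__";

    static List<String> runWorkers(List<String> items, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(items);
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP);
        }
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        // take() removes the item atomically, so no two
                        // workers can ever receive the same one.
                        String item = queue.take();
                        if (STOP.equals(item)) {
                            return;
                        }
                        processed.add(item.toUpperCase()); // stand-in for real work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        List<String> result = new ArrayList<>(processed);
        Collections.sort(result); // completion order is not deterministic
        return result;
    }

    public static void main(String[] args) {
        // prints [A, B, C]
        System.out.println(runWorkers(Arrays.asList("c", "a", "b"), 3));
    }
}
```

Note that an item taken by a worker that then crashes is gone. If that matters, you need some kind of acknowledgement or re-queue step on top of this.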
Don't forget Jini and JavaSpaces. What you're describing sounds very like the classic producer/consumer pattern that space-based architectures excel at.
A producer will write the jobs into the space. One or more consumers will take out jobs (under a transaction), work on them in parallel, and then write the results back. Since it's under a transaction, if a problem occurs the job is made available again for another consumer.
You can scale this trivially by adding more consumers. This works especially well when the consumers are separate VMs and you scale across the network.
If you are open to the idea of Spring, then check out their Spring Integration project. It gives you all the queue/thread-pool boilerplate out of the box and leaves you to focus on the business logic. Configuration is kept to a minimum using annotations.
btw, the Goetz is very good.
This doesn't sound like a master-worker problem, but a specialized client above a thread pool. Given that you have a lot of scavenging threads and not a lot of processing units, it may be worthwhile simply doing a scavenging pass and then a computing pass. By storing the work items in a Set, the uniqueness constraint will remove duplicates. The second pass can submit all of the work to an ExecutorService to perform the processing in parallel.
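A minimal sketch of the two passes, with a made-up class name and the string length standing in for the actual computation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class TwoPassDemo {

    static Map<String, Integer> run(List<String> scavenged) {
        // Pass 1: the Set drops work items that were found more than once.
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(scavenged));

        // Pass 2: one job per unique item, all executed by the pool.
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String item : unique) {
            jobs.add(() -> item.length()); // stand-in for the real computation
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // invokeAll blocks until every job is done and returns the
            // Futures in the same order as the jobs.
            List<Future<Integer>> futures = pool.invokeAll(jobs);
            Map<String, Integer> results = new TreeMap<>();
            for (int i = 0; i < unique.size(); i++) {
                results.put(unique.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints {a=1, bb=2, ccc=3}
        System.out.println(run(Arrays.asList("a", "bb", "a", "ccc", "bb")));
    }
}
```

For the Set to remove duplicates, the work item type needs sensible equals() and hashCode() implementations; String has them, a custom work-item class has to provide its own.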
A master-worker model generally assumes that the data provider has all of the work and supplies it to the master to manage. The master controls the work execution and deals with distributed computation, time-outs, failures, retries, etc. A fork-join abstraction is a recursive rather than iterative data provider. A map-reduce abstraction is a multi-step master-worker that is useful in certain scenarios.
A good example of master-worker is for trivially parallel problems, such as finding prime numbers. Another is a data load where each entry is independent (validate, transform, stage). The need to process a known working set, handle failures, etc. is what makes a master-worker model different from a thread pool. This is why a master must be in control and push the work units out, whereas a thread pool allows workers to pull work from a shared queue.
