Quartz clustering: scheduler actions visible on all nodes - Java

I'm having an issue; maybe you can help me.
Basically, I would like to know:
whether Quartz clustering allows a trigger to be changed dynamically (i.e. same config on all servers, but at a given point in time I want to change the cron expression ON A SINGLE SERVER and see this change propagated to ALL servers);
more generally, whether changes made on a single server are propagated to all other servers (for example, if I stop a particular scheduler on a single node, do all nodes stop the scheduler?).

Unless you're going for TerracottaJobStore, you're probably clustering through the database (the JDBC job store). The way it works is that scheduling data, such as Triggers and JobDetails, is saved to the database. All scheduler nodes synchronize on that persisted data, so a change made to that data from one node is reflected on all nodes.
OTOH, stopping / starting / standby etc. are management operations (as opposed to Triggers and JobDetails). Management state is considered node-specific and does not propagate to other nodes. According to this post it might in the future...
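For illustration (not part of the original answer), here is a minimal sketch of changing a cron expression at runtime with the Quartz 2.x API; the trigger and group names are hypothetical. Because the replacement trigger is written to the shared job store, all clustered nodes pick up the new schedule:

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.TriggerKey;

public class RescheduleExample {

    // Replaces an existing trigger with one using the new cron expression.
    public static void changeCron(Scheduler scheduler, String newCron) throws SchedulerException {
        TriggerKey key = TriggerKey.triggerKey("myTrigger", "myGroup"); // hypothetical names
        Trigger oldTrigger = scheduler.getTrigger(key);

        Trigger newTrigger = TriggerBuilder.newTrigger()
                .withIdentity(key)
                .forJob(oldTrigger.getJobKey())
                .withSchedule(CronScheduleBuilder.cronSchedule(newCron))
                .build();

        // Persisted in the clustered JDBC job store, so every node sees the change.
        scheduler.rescheduleJob(key, newTrigger);
    }
}
```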

Related

Quartz scheduler maintenance and performance overheads

We are currently evaluating quartz-scheduler for use in our project. For our use case, we only need a one-time trigger to be fired at some point in the future; it need not be a repeating or cron trigger.
So in my POC, I'm creating a new simple one-time trigger when a business event occurs. I can see that in a clustered environment (using Quartz's JDBC store), triggers are being balanced/distributed among multiple nodes.
The desired behaviour is observed in the POC, but from a performance standpoint, how expensive will it be to create a new one-time trigger for each event when we run at scale? From my understanding, one bottleneck would be bloating the database with triggers; a possible solution for database cleanup is to add a background task that cleans up old triggers.
I am interested in hearing about experiences and pain points in maintaining the scheduler with our design, and any input on improving the design.
You can safely use one-time triggers and they will be automatically removed by Quartz after they have fired. What happens is that Quartz checks all triggers and determines whether they are going to fire at some point in the future. If they are not, Quartz simply removes them from the store because it makes no sense to keep them.
A somewhat similar principle applies to jobs. If a job has no associated triggers, Quartz automatically removes it from the store, unless the job has the durability flag set to true.
So in your case, you will probably want to register a bunch of durable jobs, and then your app will create one-time triggers for these jobs on an as-needed basis. The jobs will remain in the store and the triggers will be automatically cleaned up once they have fired.
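For concreteness (my sketch, not the answerer's code), registering a durable job once and then firing it with one-time triggers might look like this with the Quartz 2.x API; MyBusinessJob and the job/group names are hypothetical:

```java
import java.util.Date;

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobKey;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;

public class OneTimeTriggerExample {

    // Hypothetical job implementation.
    public static class MyBusinessJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // business logic for one event
        }
    }

    // Register the job once; 'storeDurably' keeps it in the store even when
    // it has no triggers.
    public static void registerDurableJob(Scheduler scheduler) throws SchedulerException {
        JobDetail job = JobBuilder.newJob(MyBusinessJob.class)
                .withIdentity("businessJob", "events")
                .storeDurably()
                .build();
        scheduler.addJob(job, true); // replace any existing definition
    }

    // On each business event, attach a fire-once trigger to the durable job.
    // Quartz removes the trigger from the store after it has fired.
    public static void fireOnce(Scheduler scheduler, Date when) throws SchedulerException {
        Trigger trigger = TriggerBuilder.newTrigger()
                .forJob(JobKey.jobKey("businessJob", "events"))
                .startAt(when)
                .build();
        scheduler.scheduleJob(trigger);
    }
}
```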

Hazelcast: how to ensure cluster startup is finished

I have a cluster with 3 nodes (on different machines) and some "business logic" that uses a distributed lock at startup.
Sometimes, when there is more latency, every node successfully acquires the exclusive lock, because the cluster hasn't finished starting up and each node does not yet see the others.
Subsequently the nodes see each other and the cluster is correctly configured with 3 nodes. I know there is a MembershipListener to capture the "member added" event, so I could execute the "business logic" again, but I would like to know whether there is a way to ensure that cluster startup has properly finished, so that I can wait to execute the "business logic" until the cluster is up.
I tried to use hazelcast.initial.wait.seconds, but configuring the right number of seconds isn't deterministic, and I don't know whether this also delays the member join operations.
Afaik, there is no such thing in Hazelcast. As the cluster is dynamic, a node can join or leave at any time, so the cluster is never definitively "complete".
You can, however:
Configure an initial wait, like you described, to help with initial latencies
Use hazelcast.initial.min.cluster.size to define the minimum number of members Hazelcast waits for at startup
Define a minimum quorum: the minimum number of nodes for the cluster to be considered usable/healthy (see cluster quorum)
Use the PartitionService to check whether the cluster is safe or whether there are pending migrations
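To make the min-cluster-size and PartitionService options concrete, a minimal sketch (mine, assuming Hazelcast 3.x package names):

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.PartitionService;

public class ClusterStartupCheck {

    public static void main(String[] args) {
        Config config = new Config();
        // Block newHazelcastInstance() until at least 3 members have joined.
        config.setProperty("hazelcast.initial.min.cluster.size", "3");

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // Only run the lock-protected startup logic once all partitions are
        // assigned and no migrations are pending.
        PartitionService partitionService = hz.getPartitionService();
        if (partitionService.isClusterSafe()) {
            // run the "business logic" that needs the distributed lock
        }
    }
}
```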

Large number of single threaded task queues

At our company we have a server which is distributed across a few instances. The server handles user requests. Requests from different users can be processed in parallel, but requests from the same user must be executed strictly sequentially. However, they can arrive at different instances due to load balancing. Currently we use Redis-based distributed locks, but this is error-prone and requires more work on concurrency than on business logic.
What I want is something like this (more like a concept):
Distinct queue for each user
Queue is named after user id
Each request is identified by a request id
Imagine two requests from the same user arriving at two different instances concurrently:
Each instance puts its request id into that user's queue.
Additionally, both store their request ids locally.
Then some broker takes a request id from the top of "some_user_queue" and moves it into "some_user_queue_processing".
Both instances listen on "some_user_queue_processing". They peek into it and check whether it is the request id they stored locally. If yes, they do the processing. If not, they ignore it and wait.
When the work is done, the server deletes this id from "some_user_queue_processing".
Then step 3 again.
And all of this happens concurrently for a lot (thousands of them) of different users (and their queues).
Now, I know this sounds a lot like actors, but:
We need a solution requiring as few changes as possible, to make a fast transition away from the locks. Akka would force us to rewrite almost everything from scratch.
We need a production-ready solution. Quasar sounds good, but it is not production ready yet (more precisely, their Galaxy cluster isn't).
The higher-ups at my work are very conservative; they simply don't want another dependency that we'll need to support. But we already use Redis (for the distributed locks), so I thought maybe it could help with this too.
Thanks
The best solution that matches the description of your problem is Redis Cluster.
Basically, the cluster solves your concurrency problem, in the following way:
Two (or more) requests from the same user will always go to the same instance, assuming that you use the user id as the key and the request as the value. The value actually has to be a list of requests: when you receive one, you append it to that list. In other words, that is your queue of requests (a single one for every user).
That mapping is made possible by the design of the cluster implementation: it is based on a range of hash slots spread over all the instances.
When a set command is executed, the cluster hashes the key, which yields the hash slot we are going to write to; that slot is located on a specific instance. The cluster finds the instance that owns the right range and then performs the write.
Also, when a get is performed, the cluster does the same thing: it finds the instance that holds the key and then gets the value.
The transition from locks is very easy to perform because you only need to have the instances ready (with the cluster-enabled directive set to "yes") and then run the create command from the redis-trib.rb script.
I worked with the cluster in a production environment last summer and it behaved very well.
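As an illustration (my sketch, not the answerer's code), a per-user request queue on Redis Cluster using the Jedis client; the seed node address and key naming are hypothetical:

```java
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class UserQueueExample {

    // Hypothetical seed node; JedisCluster discovers the rest of the cluster.
    private final JedisCluster cluster =
            new JedisCluster(new HostAndPort("127.0.0.1", 7000));

    // All requests for a user go into one list. The key hashes to a single
    // hash slot, so every server instance talks to the same cluster node
    // for that user, which keeps the queue ordered.
    public void enqueue(String userId, String requestId) {
        cluster.rpush("queue:" + userId, requestId);
    }

    // Take the next request for this user, or null if the queue is empty.
    public String nextRequest(String userId) {
        return cluster.lpop("queue:" + userId);
    }
}
```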

Locking JPA entity hierarchy being updated massively from MDB's in cluster

I need some help with the following situation: imagine a hierarchy of Job entities that have a progress attribute. Some jobs consist of multiple subjobs, forming a job tree. The progress of these composite jobs is calculated from their subjobs. The bottom jobs' progresses are updated periodically and then the whole tree's progress is recalculated bottom-up. Job progresses are updated via received JMS messages: the job is fetched from the database via JPA, its progress is modified, and a recursive recalculation is started.
How should I deal with locking if this runs in a cluster? I would like to avoid situations where two subjobs are updated from 0% to 100% but the parent job ends up at 50% instead of 100%, because both updates see one 0% and one 100%, and vice versa.
My first thought was using synchronization on the job objects. But this is not OK, because multiple runtime objects may represent the same database record.
Could you suggest a good, efficient way to handle this situation?
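For reference, a minimal sketch (mine, not something the question prescribes) of the kind of entity described above; the @Version field illustrates one common JPA option, optimistic locking, which makes the later of two concurrent updates fail instead of silently overwriting:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.Version;

@Entity
public class Job {

    @Id
    @GeneratedValue
    private Long id;

    // Progress in percent; for composite jobs it is recalculated from the subjobs.
    private int progress;

    // Parent in the job tree, null for the root job.
    @ManyToOne
    private Job parent;

    // Optimistic-lock version (hypothetical addition): a concurrent update of
    // the same row makes the later commit fail with an OptimisticLockException,
    // so the lost update described above can be detected and retried.
    @Version
    private long version;

    // getters and setters omitted for brevity
}
```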

Two threads reading from the same table: how do I prevent both threads from reading the same set of data from the TASKS table?

I have a task thread running in two separate instances of Tomcat.
The task threads concurrently read (using a select) the TASKS table with a certain where condition and then do some processing.
The issue is that sometimes both threads pick the same task, because of which the task is executed twice.
My question is: how do I prevent both threads from reading the same set of data from the TASKS table?
It is just because your DAO function (the code which is accessing the database) is not synchronized. Make it synchronized; I think your problem will be solved.
If the TASKS table you mention is a database table then I would use Transaction isolation.
As a suggestion, within a transaction, set an attribute of the TASKS table to some uniquely identifiable value if it is not already set. Commit the transaction. If all is OK, then the task has been selected by that thread.
I haven't come across this use case, so treat my suggestion with caution.
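A minimal sketch (mine, with hypothetical column names) of that claim-within-a-transaction idea using plain JDBC; only one of the competing threads or instances sees an update count of 1:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TaskClaimer {

    // Sets the owner column only if it is still unset; run inside a transaction.
    // Exactly one of the competing callers gets an update count of 1.
    public static boolean claim(Connection con, long taskId, String ownerId) throws SQLException {
        String sql = "UPDATE TASKS SET owner = ? WHERE id = ? AND owner IS NULL"; // hypothetical columns
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, ownerId);
            ps.setLong(2, taskId);
            return ps.executeUpdate() == 1; // true: this caller has claimed the task
        }
    }
}
```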
I think you should look at how things work with an enterprise job scheduler, for example Quartz.
For your use case there is a better tool for the job - and that's messaging. You are persisting items that need to be worked on, and then attempting to synchronise access between workers. There are a number of issues that you would need to resolve in making this work - in general updating a table and selecting from it should not be mixed (it locks), so storing state there doesn't work; neither would synchronization in your Java code, as that wouldn't survive a server restart.
Using the JMS API with a message broker like ActiveMQ, you would publish a message to a queue. This message would contain the details of the task to be executed. The message broker would persist this somewhere (either in its own message store, or a database). Worker threads would then subscribe to the queue on the message broker, and each message would only be handed off to one of them. This is quite a powerful model, as you can have hundreds of message consumers all acting on tasks so it scales nicely. You can also make this as resilient as it needs to be, so tasks can survive both Tomcat and broker restarts.
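A minimal sketch of the producer and consumer sides using the JMS API with ActiveMQ (the broker URL, queue name, and message body are hypothetical):

```java
import javax.jms.Connection;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class TaskQueueExample {

    public static void main(String[] args) throws JMSException {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // hypothetical broker URL
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("TASKS");

        // Producer side: publish the task details instead of inserting a row to poll.
        MessageProducer producer = session.createProducer(queue);
        producer.send(session.createTextMessage("taskId=42"));

        // Consumer side (one per Tomcat instance): the broker hands each message
        // to exactly one consumer, so no task is processed twice.
        MessageConsumer consumer = session.createConsumer(queue);
        consumer.setMessageListener(message -> {
            try {
                String body = ((TextMessage) message).getText();
                // process the task described by 'body'
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
    }
}
```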
Whether the database can provide graceful management of this will depend largely on whether it is using strict two-phase locking (S2PL) or multi-version concurrency control (MVCC) techniques to manage concurrency. Under MVCC reads don't block writes, and vice versa, so it is very possible to manage this with relatively simple logic. Under S2PL you would spend too much time blocking for the database to be a good mechanism for managing this, so you would probably want to look at external mechanisms. Of course, an external mechanism can work regardless of the database, it's just not really necessary with MVCC.
Databases using MVCC are PostgreSQL, Oracle, MS SQL Server (in certain configurations), InnoDB (except at the SERIALIZABLE isolation level), and probably many others. (These are the ones I know of off-hand.)
I didn't pick up any clues in the question as to which database product you are using, but if it is PostgreSQL you might want to consider using advisory locks. http://www.postgresql.org/docs/current/interactive/explicit-locking.html#ADVISORY-LOCKS I suspect many of the other products have some similar mechanism.
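For PostgreSQL specifically, advisory locks can be taken from JDBC; a minimal sketch (mine), keyed by the task id:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AdvisoryLockExample {

    // Tries to take a session-level advisory lock keyed by the task id.
    // Only one session can hold it at a time, so the other instance skips the task.
    public static boolean tryLockTask(Connection con, long taskId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement("SELECT pg_try_advisory_lock(?)")) {
            ps.setLong(1, taskId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getBoolean(1);
            }
        }
    }

    // Releases the advisory lock once the task has been processed.
    public static void unlockTask(Connection con, long taskId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement("SELECT pg_advisory_unlock(?)")) {
            ps.setLong(1, taskId);
            ps.execute();
        }
    }
}
```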
I think you need to have a variable (column) where you keep the last-modified date of each row. Your threads can then limit which rows they read using that modified date.
Edit:
I did not see the "not to read" part.
In this case you need to have another table, TaskExecutor (taskId, executorId); when some thread runs a task, you put a row into TaskExecutor, and when another thread starts it simply checks whether the task is already being executed (Select ... from TaskExecutor where taskId = ...).
You also need to take care of the isolation level for transactions.
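A minimal sketch (mine) of that tracking-table idea, assuming a unique constraint on taskId so a second registration attempt fails:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class TaskExecutorRegistry {

    // Records that this executor is running the task. With a unique constraint
    // on taskId, a second instance's insert fails and it skips the task.
    public static boolean tryRegister(Connection con, long taskId, String executorId) throws SQLException {
        String sql = "INSERT INTO TaskExecutor (taskId, executorId) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, taskId);
            ps.setString(2, executorId);
            ps.executeUpdate();
            return true; // this executor owns the task
        } catch (SQLIntegrityConstraintViolationException alreadyRegistered) {
            return false; // another executor already registered this task
        }
    }
}
```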
