watch multiple nodes on zookeeper with multithreaded system

watch multiple nodes on zookeeper with multithreaded system - java

I have a project which I watch several nodes in several different threads. Now, I noticed that when I watch a node, and it changed, and an event is raised, the watch on a certain node (called A for example) blocks all the other watchers. So only after the watcher on A is finished , the other watcher will return to watch the nodes changes. Meaning, if a node is changed (called B for example) while its watcher is blocked, only after the watcher on A is finished, the watcher on node B will raise the event.
This problem causes the application be slower.
So, in order to fix this, I wanted to use a different client connnection for each thread, (using curator), but I have read that one connection should be enough, and if I need more than one, there is something worng with my implementation.
1) I dont understand what is the problem with multiple connection to zookeeper server
2) Is there another solution for my problem?
Edit - more specific about the problem
I have a master which gets requests from clients ( and each clients can save files on my server and we do some process on this file, it is more complex than it sounds, i wont elaborate it), and the master creates a node in /tasks/ for a worker to process the file (without its data of course, the data is in a db). When the worker watches his node, he processes the file, and when he finishes, he creates a node in /status (which has all the files that their process were finished).
The master watches the node /status , and when something was changed , it gets the children, and creates a thread (in order to make all faster, because zookeeper watchers and callbacks are single threaded ), which will release those files (remove some meta from db, return a response to client, remove some variable etc.).
That is one of the main flows, but I have another important parts of the code which listen on nodes, and process their children when there are changes.
Because this thing is in a thread, I created a list of the nodes that were finished already, so I wont do the final process more than once, but it was more complex than that, and that solution caused other problems, some concurrency bugs.
So as i asked
1) What is the problem with multiple connection , for each important flow, so i wont have to create threades inside watches and callback?
2) Is there another solution i can use here?

It's not well documented, but ZooKeeper has a single thread for handling watchers and async callbacks. We wrote a tech note for Curator about it. https://cwiki.apache.org/confluence/display/CURATOR/TN1

Related

AMQ Consumers choose which queue to process

Given the following scenario:
I have a system that creates, updates and deletes records. For each of these actions I need to do something (lets say write the events to a log as a silly example) however I need to process these events for each record in order - Meaning I can't log the delete before I have done the create, or any of the previous updates. I also can't log the update before I have logged the create.
I am investigating Queues in order to preserve sequence. However I don't really want RecordID_2 to be held up behind RecordID_14 The records do not need to be processed in sequence as much as the actions on each record have to. Hence I don't think I can/should use one queue.
As I don't have hundreds of RecordID_XX active at the same time, I was thinking of having a queue for each RecordID_XX so if several updates can in for that one RecordID each event for that record would be added to that same queue and be processed in order (I.e. Create first, Update_1 after Create is complete, Update_2 is processed after Update_1 is complete etc), however if additional events for a different record came in they would be added to their own queue. If the queue is empty for a period of time it simply gets deleted. I realize that this may result in a queue getting one message and then being deleted as there were no updates before the idle timeout expired. (This does not seem at all efficient)
Based on Andres T Finnell's excellent answer to this question.
I was thinking of doing the following
Producer (Web Service) -> Queue_Original <- Dispatcher -> RecordID_14
-> RecordID_2
-> RecordID_8
-> RecordID_15
Some of the "logging" may take long. So I want to be able to have a few consumers listening for these queues.
Lets say I have Consumer_1 and Consumer_2 (I may want to add Consumer_3 later to assist with growing load)
What I would like is Consumer_1 to do a getDistinations()
where the broker will return [RecordID_14, RecordID_2, RecordID_8, RecordID_15]
Questions:
Is it possible for Consumer_1 to iterate through the list of queues returned from the broker looking for the first available queue that does not have a Consumer_X connected to it and begin processing the 1st message on this queue?
And then each subsequent Consumer to do the same until it finds the next queue without a Consumer connected to it?
Would Advisory-Messages be the thing to use here?
Am I going down the wrong path completely? Is there a better approach
to handling this scenario?

Large number of single threaded task queues

At our company we have a server which is distributed into few instances. Server handles users requests. Requests from different users can be processed in parallel. Requests from same users should be executed strongly sequentionally. But they can arrive to different instances due to balancing. Currently we use Redis-based distributed locks but this is error-prone and requires more work around concurrency than business logic.
What I want is something like this (more like a concept):
Distinct queue for each user
Queue is named after user id
Each requests identified by request id
Imagine two requests from the same user arriving at two different instances concurrently:
Each instance put their request id into this user queue.
Additionaly, they both store their request ids locally.
Then some broker takes request id from the top of "some_user_queue" and moves it into "some_user_queue_processing"
Both instances listen for "some_user_queue_processing". They peek into it and see if this is request id they stored locally. If yes, then do processing. If not, then ignore and wait.
When work is done server deletes this id from "some_user_queue_processing".
Then step 3 again.
And all of this happens concurrently for a lot (thousands of them) of different users (and their queues).
Now, I know this sounds a lot like actors, but:
We need solution requiring as small changes as possible to make fast transition from locks. Akka will force us to rewrite almost everything from scratch.
We need production ready solution. Quasar sounds good, but is not production ready yet (more correctly, their Galaxy cluster).
Tops at my work are very conservative, they simply don't want another dependency which we'll need to support. But we already use Redis (for distributed locks), so I thought maybe it could help with this too.
Thanks

The best solution that matches the description of your problem is Redis Cluster.
Basically, the cluster solves your concurrency problem, in the following way:
Two (or more) requests from the same user, will always go to the same instance, assuming that you use the user-id as a key and the request as a value. The value must be actually a list of requests. When you receive one, you will append it to that list. In other words, that is your queue of requests (a single one for every user).
That matching is being possible by the design of the cluster implementation. It is based on a range of hash-slots spread over all the instances.
When a set command is executed, the cluster performs a hashing operation, which results in a value (the hash-slot that we are going to write on), which is located on a specific instance. The cluster finds the instance that contains the right range, and then performs the writing procedure.
Also, when a get is performed, the cluster does the same procedure: it finds the instance that contains the key, and then it gets the value.
The transition from locks is very easy to perform because you only need to have the instances ready (with the cluster-enabled directive set on "yes") and then to run the cluster-create command from redis-trib.rb script.
I've worked last summer with the cluster in a production environment and it behaved very well.

Thinking in node if Java background [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I am a Java Developer where everything is working in sequential way (or concurrently with multiple threads), one after another. And it's logical to place things in a sequential way.
But node works in concurrent order with single thread. How it can be beneficial even if it is working only on single thread?
Frankly telling, I didn't get the concept of single thread in node. Only one thread handle everything?
Any advice would be beneficial on how I can start thinking in node.

Synchronous Programming(Java)
If you are familiar with synchronous programming (writing code that does one thing after the other) like Java or .Net. take the following code,
For example:
var fs = require('fs');
var content = fs.readFileSync('simpleserver1.js','utf-8');
console.log('File content: ');
console.log(content);
It writes out the code for a simple web server to the console. The code works sequentially, executing each line after the next. The next line is not executed until the previous line finishes executing.
Although this works well,
what if the file in this example were really large and took minutes
to read from?
How could other operations be executed while that code or long
operation is running?
These questions will not arise if you are working in java, because you have many threads to work for you(to serve multiple requests)
Asynchronous Programming(Node.Js)
But when you are using Node you just have a single thread, which serves all requests.
So there comes asynchronous programming, to help you in javascript(Node)
To execute operations while other long operations are running, we use function callbacks. The code below shows how to use an asynchronous callback function:
var fs = require('fs');
fs.readFile('simpleserver1.js','utf-8', function(err,data){
if (err) {
throw err;
}
console.log(“executed from the file finishes reading”);
});
//xyz operation
Notice that the line “executed from the file finishes reading” is executed as the file is being read, thus allowing us to perform other operations while the main reading of the file is also being executed.
Now look at the //xyz operation, in the code. when the file is being read, the server will not wait for the file to be read completely. it will just start executing //xyz operation, and will get back to , the callback function provided in fs.readFile(, when the file is ready.
So thats how Asynchronous programming works in Node.
Also if you want to conpare java and Node you can read this Article
EDIT:
How is node.Js single Threaded
lets take a scenario, where clients request server:
Assumptions:
1) there is single server process, say serverProcess,
2) There are 2 clients requesting server, say clientA and clientB.
3) Now, consider clientA, is going to require a file Operation(as one
shown above using fs).
what happens here,
Flow:
1) clientA requests serverProcess, server gets the request, then
it starts performing file operation. Now it waits till the file is
ready to read(callback is not yet invoked yet).
2) clientB requests serverProcess, Now the server is free right
now, as it is not serving clientA, so it servs clientB, in the
mean-time, the callback from fs.read, Notifies the server that file
data is ready, and it can perform operations on it.
3) Now server starts serving 'clientA'.
now you see, there was just one thread of server , which handled both the client requests, right?
Now what would have happened if this was JAVA, you would have created another thread of server for serving clientB, while clientA was being served by first thread, and waiting for file to be read. So this is how Node is single threaded, meaning A single Process Handles all the requests.
Question:
while there is another process invoked who prepared data from file system, how would you say node is single threaded:
See, I/O(files/database), is itself a different process, what difference here is,
1) Node does not wait for everything to be ready(like java), instead it will just start its next work(or serve other requests), but whatever happens, node will not create a different thread to serve rest of the requests(unless explicitly done//not recommended though).
2) while java will create another thread itself for serving new requests.

This has been said million times, but let me give you a short answer with respect to Java.
You create separate Thread in Java if you want to read a long file, without blocking main thread.
In Javascript, you just read the file using callbacks.
Main difference between those two:
It is easier to screw up the code with multiple threads (race condition, etc).
You do not need exactly the power of CPU's second core to read the file, it is a question of slow I/O, not intensive communication.
In callbacks, there is single thread as you said. Though, it just asks underlying system to read the file, and continues executing your code. Once the file is read, then javascript pauses the code it was executing, and will come back to run your Callback.
Sometimes, you also have to do computationally intensive stuff in Javascript. In that case you can spawn a new process - look into cluster module. But usually, computationally, or I/O heavy operations are already done for you, and you just use them using callbacks.

Ok giving you a head start. It is not about threads its about tasks per second. In a thread mode threads block when they wait for something.
In a non-blocking design everytime you wait for something you just give the thread back and be awaken if the event you are waiting for occured. Those events are known as future. So as in the future i want to do this when this and that has happend (or in a failure case do this other thing). Thats basically it.
It is not node or javascript. It is famous for scala too and sure there are plenty of other languages. And if you are a Java guy look for async processing. Jetty provides it. Vertx is famous for a share nothing architecture.
So have fun with this. I use it regularly. I have a server storing 20GB of data in a custom datastore. Wanna know how we scaled? We brought 512GB for the server and did 20 of those stores in parallel sharing nothing. Its like having 20 servers in one machine with no noticable latency and you scale with the cores. Thats how we do business in todays world.
Hardware is cheap so why fiddle with concurrency on the lowest level?

Cassandra - monitoring connectivity between multiple data centers

Let's assume we have 3 geographically distributed data centers A,B,C. In each of these, a Cassandra cluster is up and running. Now assume DC A can no longer gossip with B and C.
Writes to A with LOCAL_QUORUM, would still be satisfied - but they would no longer be propagated to B and C; and vice-versa.
This situation could have some very disastrous consequences...
What I'm looking for are some tips on how to rapidly ascertain that DC A has become 'isolated' from the other data centers (using the Native Java driver).
I remember reading about push notifications, but I seem to recall they referred only to the status of the local cluster. Does anybody have any ideas? Thanks.

First thing to note is that in the event that A can no longer connect to B and C, Hints will be stored and delivered upon the restoration of the network connection. So for outages that do not last for a long period of time there is already a safety mechanism and you don't need to do anything.
For longer outages it has been best practice to use the repair command following such an outage to synchronize the replicas.
That said, if you are looking for way to determine when inter DC communication has been disrupted you have several options.
1) Use a tool like Datastax Opscenter to monitor your cluster state, this tool will automatically discover when these sorts of events happen and log them. I also believe you are able to set up triggered events but i'm not an expert in how Opscenter works.
2) Use the Java driver's public Cluster register(Host.StateListener listener) to register a function to be called on node down events, you can then determine when entire DC's go down.
3) Track via JMX on each of the DCs the current state of gossip, this will allow you to see what each Datacenter thinks about the current availability of all the machines. You could do this directly or via nodetool status.

#RussS .. I dont think point (2) works when all three host are not reachable ..
For example ..I Implemented state listener and i am poining to my cluster from my local machine .. I can see that listener gets invoked when nodes go up/down .. But i dont see this listener being invoked when I unplug my ether

Java Async Processing

I am currently developing a system that uses allot of async processing. The transfer of information is done using Queues. So one process will put info in the Queue (and terminate) and another will pick it up and process it. My implementation leaves me facing a number of challenges and I am interested in what everyone's approach is to these problems (in terms of architecture as well as libraries).
Let me paint the picture. Lets say you have three processes:
Process A -----> Process B
|
Process C <-----------|
So Process A puts a message in a queue and ends, Process B picks up the message, processes it and puts it in a "return" queue. Process C picks up the message and processes it.
How does one handle Process B not listening or processing messages off the Queue? Is there some JMS type method that prevents a Producer from submitting a message when the Consumer is not active? So Process A will submit but throw an exception.
Lets say Process C has to get a reply with in X minutes, but Process B has stopped (for any reason), is there some mechanism that enforces a timeout on a Queue? So guaranteed reply within X minutes which would kick off Process C.
Can all of these matters be handled using a dead letter Queue of some sort? Should I maybe be doing this all manually with timers and check. I have mentioned JMS but I am open to anything, in fact I am using Hazelcast for the Queues.
Please note this is more of a architectural question, in terms of available java technologies and methods, and I do feel this is a proper question.
Any suggestions will be greatly appreciated.
Thanks

IMHO, The simplest solution is to use an ExecutorService, or a solution based on an executor service. This supports a queue of work, scheduled tasks (for timeouts).
It can also work in a single process. (I believe Hazelcast supports distributed ExecutorService)

It seems to me that the type of questions you're asking are "smells" that queues and async processing may not be the best tools for your situation.
1) That defeats a purpose of a queue. Sounds like you need a synchronous request-response process.
2) Process C is not getting a reply generally speaking. It's getting a message from a queue. If there is a message in the queue and the Process C is ready then it will get it. Process C could decide that the message is stale once it gets it, for example.

I think your first question has already been answered adequately by the other posters.
On your second question, what you are trying to do may be possible depending on the messaging engine used by your application. I know this works with IBM MQ. I have seen this being done using the WebSphere MQ Classes for Java but not JMS. The way it works is that when Process A puts a message on a queue, it specifies the time it will wait for a response message. If Process A fails to receive a response message within the specified time, the system throws an appropriate exception.
I do not think there is a standard way in JMS to handle request/response timeouts the way you want so you may have to use platform specific classes like WebSphere MQ Classes for Java.

Well, kind of the point of queues is to keep things pretty isolated.
If you're not stuck on any particular tech, you could use a database for your queues.
But first, a simple mechanism to ensure two processes are coordinated is to use a socket. If practical, simply have process B create an open socket listener on some well know port, and process A will connect to that socket, and monitor it. If process B ever goes away, process A can tell because their socket gets shutdown, and it can use that as an alert of problems with process B.
For the B -> C problem, have a db table:
create table queue (
id integer,
payload varchar(100), // or whatever you can use to indicate a payload
status varchar(1),
updated timestamp
)
Then, Process A puts its entry on the queue, with the current time and a status of "B". B, listens on the queue:
select * from queue where status = 'B' order by updated
When B is done, it updates the queue to set the status to "C".
Meanwhile, "C" is polling the DB with:
select * from queue where status = 'C'
or (status = 'B' and updated < (now - threshold) order by updated
(with the threshold being however long you want things to rot on the queue).
Finally, C updates the queue row to 'D' for done, or deletes it, or whatever you like.
The dark side is there is a bit of a race condition here where C might try and grab an entry while B is just starting up. You can probably get through that with a strict isolation level, and some locking. Something as simply as:
select * from queue where status = 'C'
or (status = 'B' and updated < (now - threshold) order by updated
FOR UPDATE
Also use FOR UPDATE for B's select. This way whoever win the select race will get an exclusive lock on the row.
This will get you pretty far down the road in terms of actual functionality.

You are expecting the semantics of synchronous processing with async (messaging) setup which is not possible. I have worked on WebSphere MQ and normally when the consumer dies, the messages are kept in the queue forever (unless you set the expiry). Once the queue reaches its depth, the subsequent messages are moved to the dead letter queue.

I've used a similar approach to create a queuing and processing system for video transcoding jobs. Basically the way it worked was:
Process A posts a "schedule" message to Arbiter Q, which adds the job into its "waiting" queue.
Process B requests the next job from Arbiter Q, which removes the next item in its "waiting" queue (subject to some custom scheduling logic to ensure that a single user couldn't flood transcode requests and prevent other users from being able to transcode videos) and inserts it into its "processing" set before returning the job back to Process B. The job is timestamped when it goes into the "processing" set.
Process B completes the job and posts a "complete" message to Arbiter Q, which removes the job from the "processing" set and then modifies some state so that Process C knows the job completed.
Arbiter Q periodically inspects the jobs in its "processing" set, and times out any that have been running for an unusually long amount of time. Process A is then free to attempt to queue up the same job again, if it wants.
This was implemented using JMX (JMS would have been much more appropriate, but I digress). Process A was simply the servlet thread which responded to a user-initiated transcode request. Arbiter Q was an MBean singleton (persisted/replicated across all the nodes in a cluster of servers) that received "schedule" and "complete" messages. Its internally managed "queues" were simply List instances, and when a job completed it modified a value in the application's database to refer to the URL of the transcoded video file. Process B was the transcoding thread. Its job was simply to request a job, transcode it, and then report back when it finished. Over and over again until the end of time. Process C was another user/servlet thread. It would see that the URL was available, and present the download link to the user.
In such a case, if Process B were to die then the jobs would sit in the "waiting" queue forever. In practice, however, that never happened. If your Process B is not running/doing what it is supposed to do then I think that suggests a problem in your deployment/configuration/implementation of Process B more than it does a problem in your overall approach.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.