I've got an assignment to find all possible initiator nodes for a state recording algorithm in a distributed system.
The exact question is:
"Write a program to find out all possible initiator nodes for a state recording algorithm in a distributed system."
I should mention that we have studied the Chandy-Lamport global state recording algorithm in our distributed operating systems course, and I wrote code for the Chandy-Lamport algorithm for another assignment.
What does "initiator node" signify? I thought it meant the nodes that have recorded their corresponding states. Am I right? I have to write the code in Java. Please suggest an approach or an algorithm to follow.
According to the Wikipedia page on the Chandy-Lamport algorithm:
The assumptions of the algorithm are as follows:
There are no failures and all messages arrive intact and only once
The communication channels are unidirectional and FIFO ordered
There is a communication path between any two processes in the system
Any process may initiate the snapshot algorithm
The snapshot algorithm does not interfere with the normal execution of the processes
Each process in the system records its local state and the state of its incoming channels
The algorithm works using marker messages. Each process that wants to
initiate a snapshot records its local state and sends a marker on each
of its outgoing channels. All the other processes, upon receiving a
marker, record their local state, the state of the channel from which
the marker just came as empty, and send marker messages on all of
their outgoing channels. If a process receives a marker after having
recorded its local state, it records the state of the incoming channel
from which the marker came as carrying all the messages received since
it first recorded its local state.
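To make the marker rules above concrete, here is a minimal single-JVM simulation in Java. It is a sketch under stated assumptions, not a full implementation: processes and FIFO channels are plain objects, application messages are integer amounts encoded as strings, and "MARKER" is a reserved message value. The names SnapshotProcess, Network, transfer and runToQuiescence are all hypothetical.

```java
import java.util.*;

// Hypothetical single-JVM simulation of the Chandy-Lamport marker rules.
// Application messages are integer amounts sent as strings; "MARKER" is reserved (assumption).
class SnapshotProcess {
    static final String MARKER = "MARKER";

    final int id;
    int localState;                                   // hypothetical application state (a balance)
    Integer recordedState = null;                     // null until this process records its state
    final Map<Integer, List<String>> channelState = new HashMap<>();
    private final Set<Integer> stillRecording = new HashSet<>(); // incoming channels not yet closed by a marker

    SnapshotProcess(int id, int localState) { this.id = id; this.localState = localState; }

    // Initiation, or reaction to the first marker: record local state, start recording
    // every incoming channel, and send a marker on every outgoing channel.
    void recordLocalState(Network net) {
        recordedState = localState;
        for (int from : net.incoming(id)) {
            channelState.put(from, new ArrayList<>());
            stillRecording.add(from);
        }
        net.sendMarkers(id);
    }

    // Application send: update local state and put the amount on the FIFO channel.
    void transfer(int to, int amount, Network net) {
        localState -= amount;
        net.send(id, to, Integer.toString(amount));
    }

    void receive(int from, String msg, Network net) {
        if (MARKER.equals(msg)) {
            if (recordedState == null) recordLocalState(net); // first marker: this channel's state stays empty
            stillRecording.remove(from);                       // a marker closes recording on its channel
        } else {
            localState += Integer.parseInt(msg);
            if (stillRecording.contains(from)) channelState.get(from).add(msg); // message was in flight
        }
    }
}

class Network {
    private static final class Channel {
        final int from, to;
        final Deque<String> queue = new ArrayDeque<>(); // FIFO, as the algorithm assumes
        Channel(int from, int to) { this.from = from; this.to = to; }
    }

    private final Map<Integer, SnapshotProcess> procs = new HashMap<>();
    private final List<Channel> channels = new ArrayList<>();

    void add(SnapshotProcess p) { procs.put(p.id, p); }
    void connect(int from, int to) { channels.add(new Channel(from, to)); }

    List<Integer> incoming(int to) {
        List<Integer> result = new ArrayList<>();
        for (Channel c : channels) if (c.to == to) result.add(c.from);
        return result;
    }

    void send(int from, int to, String msg) {
        for (Channel c : channels) if (c.from == from && c.to == to) c.queue.addLast(msg);
    }

    void sendMarkers(int from) {
        for (Channel c : channels) if (c.from == from) c.queue.addLast(SnapshotProcess.MARKER);
    }

    // Deliver one message per channel per round until every channel is empty.
    void runToQuiescence() {
        boolean delivered = true;
        while (delivered) {
            delivered = false;
            for (Channel c : channels) {
                String msg = c.queue.pollFirst();
                if (msg != null) {
                    procs.get(c.to).receive(c.from, msg, this);
                    delivered = true;
                }
            }
        }
    }
}
```

For example, if P1 (balance 100) initiates while P2 (balance 50) has a transfer of 5 in flight towards P1, the snapshot records P1 = 100, P2 = 45 and the channel from P2 to P1 as carrying 5, so the recorded total is still 150.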
You are using slightly different terminology from the Wikipedia description, but I assume that your "nodes" correspond to the "processes" above. Thus an "initiator node" is simply a node that initiates (requests) a snapshot.
If that is what your terminology means, then with the Chandy-Lamport algorithm, any node could be an initiator node. Hence the answer to the question is "all of them".
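If the assignment wants a program anyway, one possible (hypothetical) reading is to drop the "path between any two processes" assumption and ask which nodes can actually start a snapshot that reaches everyone: a node qualifies if markers sent along its outgoing channels can reach every other node. The sketch below, with a made-up InitiatorFinder class, checks that with a breadth-first search from each node. When the graph is strongly connected, as the Chandy-Lamport assumptions require, it returns every node.

```java
import java.util.*;

// Hypothetical interpretation: a node is a possible initiator if its markers
// can reach every node by following unidirectional channels.
class InitiatorFinder {
    // adjacency: node -> nodes it has an outgoing channel to.
    // Every node must appear as a key, even if it has no outgoing channels.
    static List<Integer> possibleInitiators(Map<Integer, List<Integer>> adjacency) {
        List<Integer> result = new ArrayList<>();
        for (int start : adjacency.keySet()) {
            if (reachableFrom(start, adjacency).size() == adjacency.size()) result.add(start);
        }
        Collections.sort(result);
        return result;
    }

    // Breadth-first search along outgoing channels.
    static Set<Integer> reachableFrom(int start, Map<Integer, List<Integer>> adjacency) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int next : adjacency.getOrDefault(node, List.of())) {
                if (seen.add(next)) queue.add(next);
            }
        }
        return seen;
    }
}
```

For a ring 1 -> 2 -> 3 -> 1 this returns all three nodes; for a one-way chain 1 -> 2 -> 3 only node 1 can reach everyone.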
But, given the trivial nature of the answer / solution, I suspect that is not what your assignment really means. Either you have left out some context, or the assignment is misstated. I suggest that you ask your instructor.
(Or ... maybe it is a "trick question".)
Related
I have some Java processes (socket programs) running on different servers, some on the same network and some on different networks. Together, these processes maintain a global counter. A client can connect to any of these processes and issue commands to increase, decrease, or get the counter value. The global counter should be eventually consistent (network partitions can occur, and we must be able to recover from them).
The solution I have thought of so far is for each node to maintain increment and decrement counts for every node. When an increment command is issued on a node, it increments its own local increment count and then broadcasts its increment and decrement counts. Each node that receives this broadcast takes the max of the received counts and its local copy of the sender's counts, and stores the result as the latest count. When a get command is issued on any node, it returns the sum of all increments minus the sum of all decrements. I assume this will handle broadcasts that are received out of order, as well as other kinds of unreliability. I don't want to use any persistence layer.
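The scheme described above is essentially a state-based PN-counter. A minimal single-node sketch in Java follows; the class and method names are hypothetical, and networking is left out. Because merge takes the element-wise max, applying the same broadcast twice or out of order leaves the result unchanged. Merging whole maps (not only the sender's own entries) also lets nodes relay each other's state, which is what a gossip protocol would do.

```java
import java.util.*;

// One replica of a PN-counter: per-node increment and decrement totals,
// merged by element-wise max so duplicates and reordering are harmless.
class PNCounterReplica {
    final String nodeId;
    final Map<String, Long> increments = new HashMap<>();
    final Map<String, Long> decrements = new HashMap<>();

    PNCounterReplica(String nodeId) { this.nodeId = nodeId; }

    // Local commands only ever touch this node's own entry.
    void increment() { increments.merge(nodeId, 1L, Long::sum); }
    void decrement() { decrements.merge(nodeId, 1L, Long::sum); }

    // get: sum of all increments minus sum of all decrements.
    long value() {
        long inc = increments.values().stream().mapToLong(Long::longValue).sum();
        long dec = decrements.values().stream().mapToLong(Long::longValue).sum();
        return inc - dec;
    }

    // Apply a received broadcast (idempotent and commutative).
    void merge(Map<String, Long> remoteInc, Map<String, Long> remoteDec) {
        remoteInc.forEach((node, n) -> increments.merge(node, n, Math::max));
        remoteDec.forEach((node, n) -> decrements.merge(node, n, Math::max));
    }
}
```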
Is there a better way to implement this?
What protocol should I use to broadcast the counts? Will gossip on UDP work? Any Java libraries that might help?
You may be aware of this design pattern, but it still may be inspiring: https://en.wikipedia.org/wiki/Observer_pattern
You could simply make all of the instances of the program observe all of the other instances, then they will all notify each other if any one changes (check out the diagram in that link).
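A rough in-process sketch of that idea, with hypothetical CounterObserver and CounterInstance types: every instance registers every other instance as an observer and notifies them of each local change. In a real deployment the notification would be a network message rather than a method call.

```java
import java.util.*;

// Observer-style sketch: each instance observes every other instance.
interface CounterObserver {
    void onRemoteDelta(String source, long delta);
}

class CounterInstance implements CounterObserver {
    final String name;
    private long value = 0;
    private final List<CounterObserver> observers = new ArrayList<>();

    CounterInstance(String name) { this.name = name; }

    void addObserver(CounterObserver o) { observers.add(o); }

    // A local client command (positive = increment, negative = decrement).
    void add(long delta) {
        value += delta;
        for (CounterObserver o : observers) o.onRemoteDelta(name, delta); // notify all peers
    }

    // Apply a peer's change without re-broadcasting it (avoids notification loops).
    @Override
    public void onRemoteDelta(String source, long delta) { value += delta; }

    long get() { return value; }
}
```

Note that notifying with deltas is not idempotent: a lost or duplicated notification would make the instances diverge, which is the problem the max-based merge in the question avoids.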
As far as Java libraries go, check these out and see if any of them make your life easier:
http://mina.apache.org/
http://commons.apache.org/proper/commons-net/
http://hc.apache.org/
It sounds like you need a PNCounter from Akka's Distributed Data library. It uses gossip to communicate the counter's state across the network. You also have fine-grained control over read and write consistency; for example, you can use ReadMajority, where "the value will be read and merged from a majority of replicas".
Incidentally, the PNCounter works as you describe, using two distributed counters to maintain increments and decrements.
At our company we have a server that is distributed across a few instances. The server handles user requests. Requests from different users can be processed in parallel, but requests from the same user must be executed strictly sequentially. However, they can arrive at different instances because of load balancing. Currently we use Redis-based distributed locks, but this is error-prone and requires more work on concurrency than on business logic.
What I want is something like this (more like a concept):
Distinct queue for each user
Queue is named after user id
Each request is identified by a request id
Imagine two requests from the same user arriving at two different instances concurrently:
Each instance puts its request id into this user's queue.
Additionally, they both store their request ids locally.
Then some broker takes request id from the top of "some_user_queue" and moves it into "some_user_queue_processing"
Both instances listen to "some_user_queue_processing". They peek into it and check whether it holds the request id they stored locally. If it does, they do the processing; if not, they ignore it and wait.
When the work is done, the server deletes this id from "some_user_queue_processing".
Then go back to step 3 (the broker moves the next request id into "some_user_queue_processing").
And all of this happens concurrently for many (thousands of) different users and their queues.
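To make the proposed steps concrete, here is an in-memory Java model of them, assuming a single JVM; in the real system the two structures would live in Redis or a broker. UserQueues and its methods are hypothetical names. The key property is that promote fills the processing slot only when it is empty, so at most one request per user is in progress at a time.

```java
import java.util.*;
import java.util.concurrent.*;

// In-memory model of the proposed per-user queue protocol (single JVM only).
class UserQueues {
    private final Map<String, Deque<String>> pending = new ConcurrentHashMap<>();  // "some_user_queue"
    private final Map<String, String> processing = new ConcurrentHashMap<>();     // "some_user_queue_processing"

    // Step 1: an instance appends its request id to the user's queue.
    void enqueue(String userId, String requestId) {
        pending.computeIfAbsent(userId, u -> new ConcurrentLinkedDeque<>()).addLast(requestId);
    }

    // Step 3: the "broker" moves the head of the queue into the processing slot,
    // but only if no request for this user is currently being processed.
    synchronized Optional<String> promote(String userId) {
        if (processing.containsKey(userId)) return Optional.empty();
        Deque<String> q = pending.get(userId);
        String next = (q == null) ? null : q.pollFirst();
        if (next != null) processing.put(userId, next);
        return Optional.ofNullable(next);
    }

    // Step 4: an instance processes only if the slot holds a request id it stored locally.
    boolean isMyTurn(String userId, String requestId) {
        return requestId.equals(processing.get(userId));
    }

    // Step 5: the owning instance frees the slot when its work is done.
    synchronized void complete(String userId, String requestId) {
        processing.remove(userId, requestId);
    }
}
```

In Redis itself, an atomic list move (RPOPLPUSH, or LMOVE in Redis 6.2 and later) is a common building block for a step like 3, but the "only if the slot is empty" check would still need to be atomic, for example inside a Lua script.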
Now, I know this sounds a lot like actors, but:
We need solution requiring as small changes as possible to make fast transition from locks. Akka will force us to rewrite almost everything from scratch.
We need a production-ready solution. Quasar sounds good, but it is not production-ready yet (more precisely, its Galaxy cluster isn't).
Management at my work is very conservative; they simply don't want another dependency that we'll need to support. But we already use Redis (for distributed locks), so I thought maybe it could help with this too.
Thanks
The best solution that matches the description of your problem is Redis Cluster.
Basically, the cluster solves your concurrency problem, in the following way:
Two (or more) requests from the same user will always go to the same instance, assuming that you use the user id as the key and the request as the value. The value should actually be a list of requests: when you receive a request, you append it to that list. In other words, that list is your queue of requests (one per user).
This mapping is possible because of how the cluster is designed: it is based on a range of hash slots spread over all the instances.
When a SET command is executed, the key is hashed to a hash slot, and each hash slot is owned by a specific instance. The command is routed to the instance that owns that slot (by a cluster-aware client, or via a MOVED redirection from the node that first received it), and the write is performed there.
Likewise, when a GET is performed, the same procedure applies: the key is hashed to its slot, and the value is read from the instance that owns that slot.
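The slot computation can be sketched in Java as follows. It follows the Redis Cluster specification: the slot is CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty {...} hash tag, only the tag is hashed. The RedisSlot class is a hypothetical name; real cluster clients compute this for you.

```java
import java.nio.charset.StandardCharsets;

// Redis Cluster key -> hash slot mapping: CRC16-XMODEM(key) mod 16384, honouring hash tags.
class RedisSlot {
    // Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) key = key.substring(open + 1, close); // hash only a non-empty tag
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

This is why using the user id as the key (or as a hash tag such as {user42}) puts all of a user's data in the same slot, and therefore on the same instance.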
The transition from locks is very easy to perform: you only need to have the instances ready (with the cluster-enabled directive set to "yes") and then run the create command of the redis-trib.rb script.
I worked with the cluster in a production environment last summer, and it behaved very well.
We currently have a distributed setup where we publish events to SQS, and an application running on multiple hosts drains messages from the queue, performs some transformation on them, and transmits them to interested parties. In one use case the receiving endpoint has scalability concerns with the message volume, so we would like to batch these messages periodically (say, every 15 minutes) in the application before sending them.
The incoming message rate is around 200 messages per second, and each message is no more than 10 KB. The system need not be real-time, although that would be nice to have, and order is not important (it's okay if a batch containing older messages gets sent first).
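A quick back-of-the-envelope estimate from the numbers above, assuming the full 200 messages per second at the 10 KB maximum size and counting all hosts together:

```latex
\begin{aligned}
N &= 200\ \tfrac{\text{messages}}{\text{s}} \times 900\ \text{s} = 180{,}000\ \text{messages per 15-minute window} \\
S &\le 180{,}000 \times 10\ \text{KB} = 1{,}800{,}000\ \text{KB} \approx 1.8\ \text{GB} \quad (1\ \text{GB} = 10^{6}\ \text{KB})
\end{aligned}
```

So each 15-minute batch is on the order of 1.8 GB at most across the whole system, divided among the hosts if the load is spread evenly. That is worth keeping in mind when choosing between memory, an embedded database, or a key-value store for the temporary storage.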
One approach I can think of is to maintain an embedded database within the application (on each host) that batches the events, plus another thread that runs periodically and clears the data.
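A minimal per-host sketch of the first approach in Java, with one swap: an in-memory buffer stands in for the embedded database, so a crash loses whatever has not been flushed yet. PeriodicBatcher and the sink callback (which would send the batch downstream) are hypothetical names.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Per-host batcher: messages are buffered and a scheduled thread flushes them periodically.
// Assumption: an in-memory list replaces the embedded database (no durability across crashes).
class PeriodicBatcher<T> implements AutoCloseable {
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> sink;   // hypothetical "send this batch downstream" callback
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    PeriodicBatcher(Consumer<List<T>> sink, long period, TimeUnit unit) {
        this.sink = sink;
        scheduler.scheduleAtFixedRate(this::flush, period, period, unit);
    }

    // Called by the thread(s) draining SQS.
    synchronized void add(T message) { buffer.add(message); }

    // Swap the buffer out under the lock, then send outside the lock so producers are not blocked.
    void flush() {
        List<T> batch;
        synchronized (this) {
            if (buffer.isEmpty()) return;
            batch = new ArrayList<>(buffer);
            buffer.clear();
        }
        sink.accept(batch);
    }

    @Override
    public void close() {
        scheduler.shutdown();
        flush(); // send whatever is left
    }
}
```

If the sink throws, ScheduledExecutorService suppresses all later runs of the task, so a real version should catch and handle errors inside flush.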
Another approach could be to create timestamped buckets in a distributed key-value store (S3, DynamoDB, etc.), where we write each message to the correct bucket based on its timestamp and periodically clear the buckets.
We could run into several issues here: since messages can arrive out of order, a bucket might already have been cleared (this can be solved with a default bucket), and we would need to decide accurately when to clear a bucket, etc.
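The bucketing idea, including the default bucket for late messages, could look something like this in Java. The 15-minute window, the bucket naming scheme, and the clearedUpToMillis bookkeeping are all assumptions for illustration, not an S3 or DynamoDB API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical bucketing scheme: a bucket is the 15-minute window containing the message timestamp.
// Messages whose window has already been cleared go to a default bucket instead.
class TimeBuckets {
    static final Duration WINDOW = Duration.ofMinutes(15);
    static final String DEFAULT_BUCKET = "default";

    // Start of the window that contains the given timestamp.
    static long windowStartMillis(long epochMillis) {
        long w = WINDOW.toMillis();
        return Math.floorDiv(epochMillis, w) * w;
    }

    // clearedUpToMillis: windows starting at or before this instant were already sent and cleared.
    static String bucketFor(long epochMillis, long clearedUpToMillis) {
        long start = windowStartMillis(epochMillis);
        return start <= clearedUpToMillis ? DEFAULT_BUCKET : "bucket-" + Instant.ofEpochMilli(start);
    }
}
```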
The way I see it, at least two components would be required: one that does the batching into temporary storage and another that clears it.
Any feedback on the above approaches would help. Also, this looks like a common problem; are there any existing solutions I can leverage?
Thanks
Does Akka 2 provide a way to determine the number of actors of a certain type active at a certain time in the system?
I was looking for something like
int actorCount = getContext().count(MyActor.class)
OR
Props props = Props.create(MyActor.class, "actorName")
...
int actorCount = getContext().count(props)
OR
getContext().actorSelection("/path/to/actor").count()
I've just started playing with the akka framework in Java, so please bear with me.
In one of my Akka applications (my first, and not one I'd hold up as a shining example of how to write such systems), I implement a low-water / high-water work generation strategy driven by a heartbeat. The low- and high-water marks are defined in terms of the number of active work actors, each of which does one thing and is created by a manager. That manager keeps track of started and as-yet uncompleted workers and can respond to requests from the work generator for the current activity count. The response to these inquiries tells the generator whether work in progress has fallen below the low-water mark and new work should be generated.
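The low-water / high-water logic described above can be sketched in plain Java (deliberately not Akka, to keep it self-contained); WorkManager and its methods are hypothetical names. Inside an actor the count could be a plain int, because an actor processes one message at a time; the AtomicInteger here only matters because this sketch is not an actor.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java sketch: the manager counts started-but-not-completed workers,
// and each heartbeat asks how much new work to generate.
class WorkManager {
    private final AtomicInteger active = new AtomicInteger();
    private final int lowWater;
    private final int highWater;

    WorkManager(int lowWater, int highWater) {
        this.lowWater = lowWater;
        this.highWater = highWater;
    }

    void workerStarted()   { active.incrementAndGet(); }
    void workerCompleted() { active.decrementAndGet(); }
    int activeCount()      { return active.get(); }

    // Below the low-water mark: top up to the high-water mark. Otherwise generate nothing.
    int workToGenerate() {
        int n = active.get();
        return n < lowWater ? highWater - n : 0;
    }
}
```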
It's kind of crude, and in a new system I'm working on now, the connections between work generation and work execution, as well as checkpoint logging, are handled in a more continuous, less "batch-oriented" manner. This system is about to be deployed, so at the moment I can't say for certain how it will perform, but I'm pretty sure it will behave better than the earlier system. It's also inherently more complicated than the earlier system in how it generates, performs, and records its work.
[I'm also posting this as an answer]
If the actor who needs the count is not the parent and/or cannot retrieve the ActorRefs of the target actors, then the following might be an alternative.
Counting the number of actors of a certain type can be done by sending a "head count" message that carries a list of ActorRefs and is passed from actor to actor. Each target actor adds its own ActorRef to that list and forwards the message. But this depends on the nature of the system, and it works only if you know beyond any doubt that no actors are spawned during the "head count".
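A plain-Java simulation of that relay, assuming a fixed chain of actors known in advance: strings stand in for ActorRefs, and a synchronous method call stands in for tell/forward. The message is immutable and each actor passes on an extended copy, in line with the usual Akka advice that messages should be immutable. HeadCount and CountedActor are hypothetical names.

```java
import java.util.*;

// Immutable "head count" message: each hop produces an extended copy.
final class HeadCount {
    final List<String> members;

    HeadCount(List<String> members) { this.members = List.copyOf(members); }

    HeadCount with(String ref) {
        List<String> extended = new ArrayList<>(members);
        extended.add(ref);
        return new HeadCount(extended);
    }
}

class CountedActor {
    final String ref;     // stands in for this actor's ActorRef
    CountedActor next;    // hypothetical: where to forward; null means "last in the chain"

    CountedActor(String ref) { this.ref = ref; }

    // Add ourselves, then forward; the last actor hands the result back to the requester.
    HeadCount receive(HeadCount msg) {
        HeadCount updated = msg.with(ref);
        return next == null ? updated : next.receive(updated);
    }
}
```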
I'm programming a mobile ad hoc network routing protocol in Java (using UDP). The routing protocol uses a ring topology (each node has one predecessor node and one successor node).
First, I combined one transmitter (one thread) and one receiver (one thread) to form one node. But I'm facing some problems:
I'd like a third node to be able to overhear a transmission from one node to another. For example,
node A sends a packet to node B, and if node C is in range of node A, it might overhear that transmission too.
I'd like to use one channel per ring to reduce interference, but I don't know which Java networking API mechanism I should use.
I'd appreciate your guidance.
Thank you in advance (sorry for my poor English)!
For example, node A sends a packet to node B, and if node C is in range of node A, it might overhear that transmission too.
This is expected behavior for a wireless ad hoc network. If C is not the destination (according to the MAC address), it can drop the received message.
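If you emulate the shared radio medium over UDP (for example by sending every frame to a broadcast address or a multicast group, so that every node receives every frame), the filtering has to happen in your own code. Here is a sketch with a made-up frame layout: a destination id, a source id, then the payload. RingFrame, the layout, and the BROADCAST id are assumptions, not part of any Java API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical frame layout: [destination id (int)][source id (int)][UTF-8 payload].
class RingFrame {
    static final int BROADCAST = -1; // assumption: reserved id meaning "all nodes"

    static byte[] encode(int dest, int src, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + body.length).putInt(dest).putInt(src).put(body).array();
    }

    static int destination(byte[] frame) { return ByteBuffer.wrap(frame).getInt(0); }

    static String payload(byte[] frame) {
        return new String(frame, 8, frame.length - 8, StandardCharsets.UTF_8);
    }

    // Keep a received frame only if it is addressed to this node or broadcast; drop overheard ones.
    static boolean accept(byte[] frame, int myId) {
        int dest = destination(frame);
        return dest == myId || dest == BROADCAST;
    }
}
```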
I'd like to use one channel per ring to reduce interference.
On the contrary, one channel per ring would increase interference, especially if you expect high load and many messages being routed around. But a single channel is much easier to manage.
You need to think more about your environment and requirements.
Are you using 802.11 at MAC level?
Do you want reliable guaranteed delivery?