I am building a system of clustered computers with several nodes. There is a master node that is supposed to schedule tasks to the other nodes in the cluster. The nodes are separate PCs connected to the master node via network cables. The whole system is expected to be implemented with Java, Akka, and the Play framework.
Is there a way to implement this with Akka remote clustering and the Play framework?
I am aware of the remote calculator tutorial, but it seems to be run with SBT, so I would love to know whether a similar tutorial exists for the Play framework.
Any link to help me with my project would also be appreciated.
Thank you.
An instance of a Play! framework application can connect to a remote Akka node (i.e. your master node) using a simple configuration.
There are two ways:
override the default actor system
define a new actor system
I suggest using the second one.
In this case, you have to add something like this to application.conf:
master {
  akka {
    actor {
      provider = "akka.remote.RemoteActorRefProvider"
    }
    remote {
      transport = "akka.remote.netty.NettyRemoteTransport"
      netty {
        hostname = "your-master-host-name"
        port = 0
      }
    }
  }
}
Then, in your Play! app, you can connect to the remote master node in this way:
ActorSystem system = ActorSystem.create("master", ConfigFactory.load().getConfig("master"));
ActorRef master = system.actorFor("akka://master@your-master-host-name:your-master-port/user/master");
If you prefer to override the default Play Akka actor system, here is the reference configuration: http://www.playframework.org/documentation/2.0.3/AkkaCore
For the master and computational cluster nodes, I suggest using the architecture and the code described here: http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
If your master and computational nodes do not require a web or REST interface, you can implement them as plain Java programs.
In the cited article the nodes are not exposed remotely. To do that, just add an application.conf to the master node app:
master {
  akka {
    actor {
      provider = "akka.remote.RemoteActorRefProvider"
    }
    remote {
      transport = "akka.remote.netty.NettyRemoteTransport"
      netty {
        hostname = "your-master-host-name"
        port = your-master-port
      }
    }
  }
}
And instantiate the master actor with the actorOf method:
ActorSystem system = ActorSystem.create("master", ConfigFactory.load().getConfig("master"));
ActorRef master = system.actorOf(new Props(Master.class), "master");
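For reference, here is a minimal sketch of what the Master actor referenced above might look like (an Akka 2.0-style UntypedActor; the message handling is only a placeholder for your own task-scheduling protocol):

import akka.actor.UntypedActor;

public class Master extends UntypedActor {
    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof String) {
            // placeholder: dispatch the task to a worker node or acknowledge the sender
            getSender().tell("ack: " + message, getSelf());
        } else {
            unhandled(message);
        }
    }
}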
The computational nodes must be configured in the same way as the Play! node.
Notice that only the master node has a TCP/IP port defined. Non-master nodes use port 0, which configures Akka to choose a random free port for them. This is correct because the only well-known host:port address you need is the master's, which every node has to point to at startup.
Related
I just implemented a distributed lock using Apache Curator and ZooKeeper in standalone mode.
I initialized the CuratorFramework as follows:
CuratorFramework client = CuratorFrameworkFactory.newClient("localhost:2182", retryPolicy);
Everything worked fine, so I tried to use ZooKeeper in cluster mode. I started three instances and initialized the CuratorFramework as follows:
CuratorFramework client = CuratorFrameworkFactory.newClient("localhost:2182,localhost:2183,localhost:2184", retryPolicy);
As you can see, I just added the addresses of the two new nodes.
So far, so good. But how do I initialize the client when I don't know the address of each node or the size of the cluster, because I want to scale it dynamically?
I could initialize it by specifying only the address of the first node, which will always be started. But if that node goes down, Curator loses the connection to the whole cluster (I just tried it).
CuratorFrameworkFactory has a builder that allows you to specify an EnsembleProvider instead of a connectionString and to include an EnsembleTracker. This will keep your connectionString up to date, but you will need to persist the data somehow to ensure your application can find the ensemble when it restarts. I recommend implementing a decorating EnsembleProvider that encapsulates a FixedEnsembleProvider and writes the config to a properties file.
Example:
EnsembleProvider ensemble = new MyDecoratingEnsembleProvider(new FixedEnsembleProvider("localhost:2182", true));
CuratorFramework client = CuratorFrameworkFactory.builder()
        .ensembleProvider(ensemble)
        .retryPolicy(retryPolicy)
        .ensembleTracker(true)
        .build();
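A rough sketch of what MyDecoratingEnsembleProvider could look like, assuming Curator's EnsembleProvider interface (start, getConnectionString, setConnectionString, updateServerListEnabled, close); persisting to a properties file is just one option:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.curator.ensemble.EnsembleProvider;

public class MyDecoratingEnsembleProvider implements EnsembleProvider {
    private final EnsembleProvider delegate;
    private final Path storage = Paths.get("ensemble.properties");

    public MyDecoratingEnsembleProvider(EnsembleProvider delegate) {
        this.delegate = delegate;
    }

    @Override
    public void start() throws Exception {
        delegate.start();
    }

    @Override
    public String getConnectionString() {
        // prefer a previously persisted connection string, if any
        try {
            if (Files.exists(storage)) {
                return new String(Files.readAllBytes(storage), StandardCharsets.UTF_8).trim();
            }
        } catch (IOException ignored) {
            // fall through to the delegate's value
        }
        return delegate.getConnectionString();
    }

    @Override
    public void setConnectionString(String connectionString) {
        // called when the tracked ensemble changes; persist it for the next restart
        try {
            Files.write(storage, connectionString.getBytes(StandardCharsets.UTF_8));
        } catch (IOException ignored) {
        }
        delegate.setConnectionString(connectionString);
    }

    @Override
    public boolean updateServerListEnabled() {
        return delegate.updateServerListEnabled();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}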
You should always know where your Zookeeper instances are. There's no way to connect to something when you don't know where it is - how could you?
If you can connect to any instance, you can get the configuration details and poll it regularly to keep your connection details up-to-date, perhaps?
Maybe take a look at https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html#ch_reconfig_rebalancing
In my current setup, I'm using the default multicast option of the Hazelcast cluster manager. When I link the instances of my containerized Vert.x modules (via Docker networking links), I can see that they successfully create a Hazelcast cluster. However, when I try publishing events on the event bus from one module, the other module doesn't react to them. I'm not sure how the network settings of the Hazelcast cluster relate to the network settings of the event bus.
At the moment, I have the following programmatic configuration for each of my Vert.x modules, each deployed inside a Docker container.
ClusterManager clusterManager = new HazelcastClusterManager();
VertxOptions vertxOptions = new VertxOptions()
        .setClustered(true)
        .setClusterManager(clusterManager);
vertxOptions.setEventBusOptions(new EventBusOptions()
        .setClustered(true)
        .setClusterPublicHost("application"));
The Vert.x Core manual states that I may have to configure clusterPublicHost and clusterPublicPort for the event bus, but I'm not sure how those relate to the general network topology.
One answer is here: https://groups.google.com/d/msg/vertx/_2MzDDowMBM/nFoI_k6GAgAJ
I see this question come up a lot, and what a lot of people miss in the documentation (myself included) is that the event bus does not use the cluster manager to send event bus messages. I.e. in your example with Hazelcast as the cluster manager, you have the Hazelcast cluster up and communicating properly (so your cluster manager is fine); however, the event bus is failing to communicate with your other Docker instances due to one or more of the following:
It is attempting to use an incorrect IP address for the other node (i.e. the IP of the private interface on the Docker instance, not the publicly mapped one).
It is attempting to communicate on a port Docker is not configured to forward (the event bus picks a dynamic port if you don't specify one).
What you need to do is:
Tell Vert.x the IP address that the other nodes should use to talk to each instance (using the -cluster-host [command line], setClusterPublicHost [VertxOptions] or "vertx.cluster.public.host" [system property] options).
Tell Vert.x explicitly the port to use for event bus communication and ensure Docker is forwarding traffic for those ports (using the "vertx.cluster.public.port" [system property], setClusterPublicPort [VertxOptions] or -cluster-port [command line] options). In the past, I have used 15701 because it is easy to remember (just a '1' in front of the Hazelcast ports).
The event bus only uses the cluster manager to manage the IP/port information of the other Vert.x instances and the registration of the consumers/producers. The communications are done independently of the cluster manager, which is why you can have the cluster manager configured properly and communicating but still have no event bus communications.
You may not need to do both of the steps above if both your containers are running on the same host, but you definitely will once you start running them on separate hosts.
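For instance, here is a minimal sketch applying that advice to the programmatic configuration from the question; the public host string and port 15701 are placeholders for whatever address and port Docker actually exposes for each container:

ClusterManager clusterManager = new HazelcastClusterManager();
VertxOptions vertxOptions = new VertxOptions()
        .setClustered(true)
        .setClusterManager(clusterManager);
vertxOptions.setEventBusOptions(new EventBusOptions()
        .setClustered(true)
        .setHost("0.0.0.0")                             // interface the event bus binds to inside the container
        .setPort(15701)                                 // fixed port so Docker can forward it
        .setClusterPublicHost("publicly-mapped-host")   // address other nodes should use to reach this instance
        .setClusterPublicPort(15701));                  // port other nodes should use to reach this instance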
Something that can also happen is that Vert.x uses the loopback interface when you don't specify the IP that Vert.x (not Hazelcast) should use to communicate over the event bus. The problem here is that you don't know which interface is chosen (loopback, an interface with an IP, or even one of several interfaces with IPs).
To overcome this problem, I once wrote a method: https://github.com/swisspush/vertx-cluster-watchdog/blob/master/src/main/java/org/swisspush/vertx/cluster/ClusterWatchdogRunner.java#L101
The cluster manager works fine. The cluster manager configuration has to be the same on each node (machine/Docker container) in your cluster, or you should make no configuration at all (use the default configuration of your cluster manager).
You have to make the event bus configuration consistent on each node: set the cluster host on each node to the IP address of that node itself, plus an arbitrary port number (but if you run more than one Vert.x instance on the same node, you have to choose a different port number for each Vert.x instance).
For example, if a node's IP address is 192.168.1.12, then you would do the following:
VertxOptions options = new VertxOptions()
        .setClustered(true)
        .setClusterHost("192.168.1.12") // node IP
        .setClusterPort(17001)          // any arbitrary port, but make sure no other Vert.x instance uses the same port on the same node
        .setClusterManager(clusterManager);
On another node, whose IP address is 192.168.1.56, you would do the following:
VertxOptions options = new VertxOptions()
        .setClustered(true)
        .setClusterHost("192.168.1.56") // other node's IP
        .setClusterPort(17001)          // this is fine because it is a different node
        .setClusterManager(clusterManager);
I found this solution, which worked perfectly for me; below is my code snippet (the important part is options.setClusterHost()):
public class Runner {
    public static void run(Class clazz) {
        VertxOptions options = new VertxOptions();
        try {
            // for docker binding: use this host's address as the cluster host
            String local = InetAddress.getLocalHost().getHostAddress();
            options.setClusterHost(local);
        } catch (UnknownHostException e) {
            // fall back to Vert.x's default cluster-host detection
        }
        options.setClustered(true);
        Vertx.clusteredVertx(options, res -> {
            if (res.succeeded()) {
                res.result().deployVerticle(clazz.getName());
            } else {
                res.cause().printStackTrace();
            }
        });
    }
}
public class Publisher extends AbstractVerticle {
    public static void main(String[] args) {
        Runner.run(Publisher.class);
    }
    ...
}
No need to define anything else.
I am running Elasticsearch 2.1.0 on localhost:9200. What I need to do is read from an index using the Java API for ES, without having to create a node, because I care about the speed of my application. The following is my code:
try (Client client = TransportClient.builder().build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getLocalHost(), 9200))
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getLocalHost(), 9200))) {
    QueryBuilder qb = matchQuery(
            ...
    );
    CountResponse response = client.prepareCount(indexName)
            .setTypes(spammerType).setQuery(qb)
            .execute()
            .actionGet();
}
However, I am getting the following error:
Exception in thread "main" NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{127.0.0.1}{localhost/127.0.0.1:9200}]]
But I'm trying to avoid creating a node because, as I mentioned before, I need my application to be as fast as possible when reading from the index. According to ES:
There are use-cases for both clients:
The transport client is ideal if you want to decouple your application from the cluster. For example, if your application quickly creates and destroys connections to the cluster, a transport client is much "lighter" than a node client, since it is not part of a cluster. Similarly, if you need to create thousands of connections, you don't want to have thousands of node clients join the cluster. The TC will be a better choice.
On the flipside, if you need only a few long-lived, persistent connection objects to the cluster, a node client can be a bit more efficient since it knows the cluster layout. But it ties your application into the cluster, so it may pose problems from a firewall perspective.
How can I fix the error? Thanks.
Apparently I should connect on port 9300, not 9200 (9300 is the transport-protocol port used by the Java TransportClient, while 9200 is the HTTP/REST port):
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getLocalHost(), 9300))
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getLocalHost(), 9300)))
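For clarity, here is the corrected client construction in full (a sketch based on the code in the question; the duplicate address is dropped since both entries pointed at the same local node):

try (Client client = TransportClient.builder().build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getLocalHost(), 9300))) {
    // build the query and run prepareCount exactly as before
}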
I want to implement the following flow from Java code:
Create a new AWS EMR instance (using AWS SDK)
Connect to the EMR cluster using the Hive JDBC driver (this requires an IP)
Run my "SQL" queries on the EMR
Destroy the AWS EMR (using AWS SDK)
My problem is that when I create an EMR cluster using the SDK, I can only retrieve its AWS id, something like j-XXXXXXXXXXX. But in order to connect via JDBC I need the master node's IP. How can I obtain the master node's IP from code?
I'm following this JDBC example page
==UPDATE==
I tried using AmazonElasticMapReduceClient.describeCluster but could only obtain the public DNS name, while I'm looking for the private IP.
AFAIK there is no direct way to get it, but it can be achieved with two API calls and a search over the results:
public String getMasterNodeIp(AmazonElasticMapReduceClient emr, String emrId) throws Exception {
    Cluster cluster = emr.describeCluster(new DescribeClusterRequest().withClusterId(emrId)).getCluster();
    ListInstancesResult instances = emr.listInstances(new ListInstancesRequest().withClusterId(emrId));
    String masterDnsName = cluster.getMasterPublicDnsName();
    for (Instance instance : instances.getInstances()) {
        if (instance.getPublicDnsName().equals(masterDnsName)) {
            return instance.getPrivateIpAddress();
        }
    }
    throw new Exception("Failed to find master node private ip.");
}
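Hypothetical usage of the helper above, building a Hive JDBC URL from the returned private IP; HiveServer2's default port 10000 and the j-XXXXXXXXXXX cluster id are placeholders:

String masterIp = getMasterNodeIp(emr, "j-XXXXXXXXXXX");
String jdbcUrl = "jdbc:hive2://" + masterIp + ":10000/default";
// open the connection with the Hive JDBC driver, as in the referenced example page
Connection connection = DriverManager.getConnection(jdbcUrl, "hive", "");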
I am trying to learn Cassandra and have set up a 4-node Cassandra cluster. I have written a client in Java using Hector, which currently connects to a hard-coded single node in the cluster. Ideally, I would like my client to connect to the "cluster" rather than a specific node, so that if any of the 4 nodes is down, the client will still connect to something. From the client application's perspective, how does this work exactly? I can't seem to find a good explanation.
This is my Hector connection currently; I need to specify a specific node here:
Cluster c = getOrCreateCluster("Test Cluster", "cassandraNode1:9160");
My Cassandra nodes are all configured with rpc_address: 0.0.0.0.
If you pass a CassandraHostConfigurator to getOrCreateCluster(), you can specify multiple nodes as a comma-separated string:
public CassandraHostConfigurator(String hosts) {
    this.hosts = hosts;
}
...
String[] hostVals = hosts.split(",");
CassandraHost[] cassandraHosts = new CassandraHost[hostVals.length];
...
You can also toggle CassandraHostConfigurator#setAutoDiscoverHosts and #setUseAutoDiscoverAtStartup to use your initial host(s) to automatically add all hosts found via the Thrift API method describe_keyspaces. This makes configuration a little easier in that you only need to reference a single host.
Keeping autoDiscover enabled (it is off by default) makes it a bit easier to scale out, as new nodes will be added as they are discovered. The ability to add nodes is also available via JMX, so nodes can be added manually at any time, though you would have to do it once per Hector instance.
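A short sketch of the approach described in this answer; the host names are placeholders, and HFactory.getOrCreateCluster is used with the configurator overload:

CassandraHostConfigurator configurator =
        new CassandraHostConfigurator("cassandraNode1:9160,cassandraNode2:9160,cassandraNode3:9160");
configurator.setAutoDiscoverHosts(true);          // discover the rest of the ring from the seed hosts
configurator.setUseAutoDiscoverAtStartup(true);   // run discovery as soon as the cluster object is created
Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", configurator);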