I am experimenting with Cassandra and OpsCenter. In the opscenterd log file, I found this line:
2015-07-29 16:10:16+0000 [] ERROR: Problem while calling CreateClusterConfController (SingleNodeProvisioningError): Due to a limitation with one-node clusters, OpsCenter will not be able to communicate with the Datastax Agent unless listen_address/broadcast_address in cassandra.yaml are set to 172.17.42.1. Please ensure these match before continuing.
Because I deployed Cassandra and OpsCenter in different Docker containers, I must set listen_address to the container's internal IP (Cassandra, sitting inside a container, knows nothing about its host) and broadcast_address to the host's bridge IP. This is the normal setup when deploying Cassandra on machines behind separate gateways (like AWS EC2, where each instance has a private and a public IP).
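For reference, a minimal sketch of the relevant cassandra.yaml settings in this kind of setup; the container-internal IP below is a hypothetical example, while 172.17.42.1 is the bridge IP from the error message:

# cassandra.yaml inside the Cassandra container
listen_address: 172.17.0.5      # container-internal IP (hypothetical value)
broadcast_address: 172.17.42.1  # the Docker host's bridge IP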
Question 1: What exactly is the limitation with a one-node cluster?
Question 2: How should I workaround the problem in this case?
Thanks
Question 1: What exactly is the limitation with a one-node cluster?
OpsCenter (via the underlying Python driver) reads cluster information from Cassandra's system tables (namely system.peers and system.local), with most of the information coming from system.peers, including the broadcast interfaces of each of the nodes.
However, that table does not contain information about the node itself, only about its peers. When there are no peers, there is no way to get the broadcast address from Cassandra itself, and that is what OpsCenter uses to tie actual Cassandra instances to its internal representation. In this case OpsCenter uses whatever address you specified as a seed (172.17.42.1 here), and when the agents report with a different IP (they get Cassandra's broadcast address via JMX), OpsCenter discards those messages.
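You can see this for yourself by querying the system tables; a minimal sketch, assuming the DataStax Java driver 3.x. On a one-node cluster the first query returns no rows, so the broadcast address is only available from system.local:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class PeersCheck {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("172.17.42.1").build();
             Session session = cluster.connect()) {
            // system.peers lists every node EXCEPT the one you are connected to
            for (Row row : session.execute("SELECT peer FROM system.peers")) {
                System.out.println("peer: " + row.getInet("peer"));
            }
            // the connected node's own addresses live in system.local
            Row local = session.execute(
                    "SELECT listen_address, broadcast_address FROM system.local").one();
            System.out.println("broadcast_address: " + local.getInet("broadcast_address"));
        }
    }
}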
Question 2: How should I workaround the problem in this case?
Try setting local_address in address.yaml to 172.17.42.1; this should do the trick.
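That is, in the agent's address.yaml:

# address.yaml of the DataStax Agent
local_address: 172.17.42.1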
I have searched for solutions to my use case but did not find the right one, so I am hoping for some ideas to explore further.
I have two GemFire (version 8.2) clusters (private and public), each storing 110+ GB of data without persisting to a disk store. The private cluster gets its data from the DB and transmits entries to the public cluster through a WAN gateway while both clusters are online. In my use case I restart only the public cluster, but it loses its data after that, and to repopulate it I have to restart the private cluster and reload data from the DB into the private cluster, which in turn transmits the data through the WAN.
I can't populate the public cluster from the DB, as that puts load on my master DB and would affect other applications.
I have tried multiple solutions.
First: exporting the dataset from the private cluster and then importing it into the public one; but this disconnects the private cluster's GemFire nodes, since each region stores a large volume of data, and I also have limited disk space for downloading such large volumes.
Second: I could expose a JMX bean from the public cluster and then run a client program that invokes a GemFire function in the private cluster, which iterates through the entries and drops them into the public cluster through JMX; but my organization's infrastructure doesn't let me expose JMX beans on GemFire nodes.
Third: as in the second option, a GemFire function can transmit data to the public cluster through a queue. This seems to work but has its own limitations: the queue can only transfer text messages of up to 1 MB, so I have to handle large objects specially, and the transfer involves unnecessary serialization and deserialization (JSON text messages).
Is there any way I can ask the private cluster to re-transmit all its data through the WAN gateway, or is there any other solution someone can propose for me to explore?
You can try "gemtouch" in the open source project gemfire-toolkit.
It sounds very similar to idea 2, but it doesn't require exposing a JMX bean. It does use JMX the same way gfsh does. If that's a problem, you could easily remove the use of JMX, as it is only used for retrieving the list of regions.
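For illustration, a minimal sketch of that JMX usage; the MBean name and the listAllRegionPaths operation are assumptions based on the GemFire management API that gfsh uses, so verify them against your GemFire 8.2 javadocs:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListRegions {
    public static void main(String[] args) throws Exception {
        // the JMX manager typically runs on a locator; host and port are hypothetical
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://locator-host:1099/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName system = new ObjectName("GemFire:service=System,type=Distributed");
            // assumed operation name; gfsh's "list regions" reads the same data
            String[] regionPaths = (String[]) mbs.invoke(
                    system, "listAllRegionPaths", new Object[0], new String[0]);
            for (String path : regionPaths) {
                System.out.println(path);
            }
        } finally {
            connector.close();
        }
    }
}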
I have the same problem, but working with 3 Geode clusters (each in a different location).
When something weird happens in one of the clusters, we need to recover it using one of the 2 remaining clusters:
If we "touch" one of the clusters, all that info will replicate to the cluster that needs recovery, but also to the other cluster, which is actually fine. That probably causes no damage, but I would appreciate any opinion.
If we keep running traffic on the 2 remaining clusters while GemTouch is running on one of them, I guess some consistency problems between clusters could pop up, but I am not sure.
The last topic is the LICENSE of gemfire-toolkit: there is actually no LICENSE file, so I am not 100% sure whether the tool can be used.
Does the DataStax Cassandra Java driver provide a fallback mechanism that moves the connection to the next available host?
For example, if the cluster has 4 nodes and the client application is connected to Host-1: when Host-1 goes down, is it possible for the application to try connecting to Host-2, then Host-3, and so on?
Additionally,
1) The DataStax driver provides the facility to write a custom retry policy, e.g. withRetryPolicy(MyCustomRetryPolicy.RETRY_POLICY_INSTANCE), but this is only invoked on errors (ReadTimeout, WriteTimeout, RequestError, etc.); it does not get invoked when a node leaves the cluster.
2) Another way is to add a SpeculativeExecutionPolicy, e.g. withSpeculativeExecutionPolicy(new ConstantSpeculativeExecutionPolicy(10000, 2)), but I am not sure whether this solves the problem.
Is there any other proper mechanism provided by the Cassandra Java driver, or is SpeculativeExecutionPolicy the only option?
The Cassandra Java driver uses the first node only to discover the other nodes in the cluster. It then uses the configured load balancing policy to connect to nodes; by default this is a token-aware, datacenter-aware policy. The first part means that the driver knows which node is responsible for handling data with a given partition key, and the second part means it is aware of where the nodes are located. You can of course customize the policy, but the default settings should be fine. More information is in the official docs.
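As a minimal sketch (assuming the Java driver 3.x; host names are hypothetical), failover needs no application code at all:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class FailoverExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                // contact points are only used for the initial discovery;
                // afterwards the driver knows the whole topology
                .addContactPoints("host-1", "host-2")
                // this is the default policy, spelled out for clarity
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder().build()))
                .build();
             Session session = cluster.connect()) {
            // if host-1 is down, the query plan simply moves on to the
            // next live node; no manual retry loop is required
            session.execute("SELECT release_version FROM system.local");
        }
    }
}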
If in your case the fallback doesn't happen, please share more details.
I have an application incorporating a stretched Hazelcast cluster deployed on 2 data centres simultaneously. The 2 data centres are usually both fully functional, but, at times, one of them is taken completely out of the network for SDN upgrades.
What I intend to achieve is to configure the cluster in such a way that each primary partition in a DC has at least 2 backups: one in the other data centre and one in the local one.
For this purpose, the documentation pointed me toward partition groups (http://docs.hazelcast.org/docs/2.3/manual/html/ch12s03.html). Enterprise WAN Replication seemed exactly like what we wanted, but, unfortunately, this feature is not available in the free version of Hazelcast.
My configuration is as follows:
NetworkConfig network = config.getNetworkConfig();
network.setPort(hzClusterConfigs.getPort());
JoinConfig join = network.getJoin();
join.getMulticastConfig().setEnabled(hzClusterConfigs.isMulticastEnabled());
join.getTcpIpConfig()
.setMembers(hzClusterConfigs.getClusterMembers())
.setEnabled(hzClusterConfigs.isTcpIpEnabled());
config.setNetworkConfig(network);
PartitionGroupConfig partitionGroupConfig = config.getPartitionGroupConfig()
.setEnabled(true).setGroupType(PartitionGroupConfig.MemberGroupType.CUSTOM)
.addMemberGroupConfig(new MemberGroupConfig().addInterface(hzClusterConfigs.getClusterDc1Interface()))
.addMemberGroupConfig(new MemberGroupConfig().addInterface(hzClusterConfigs.getClusterDc2Interface()));
config.setPartitionGroupConfig(partitionGroupConfig);
The configs used initially were:
clusterMembers=host1,host2,host3,host4
clusterDc1Interface=10.10.1.*
clusterDc2Interface=10.10.2.*
However, with this set of configs, at any event triggered by a change in the cluster's membership, a random node in the cluster started logging "No member group is available to assign partition ownership" every other second (as here: https://github.com/hazelcast/hazelcast/issues/5666). What is more, checking the state exposed by the PartitionService over JMX revealed that no partitions were actually being populated, despite the apparently successful cluster state.
As such, I replaced the hostnames with the corresponding IPs and the configuration worked. The partitions were created successfully and no nodes were acting out.
The problem here is that the boxes are created as part of an A/B deployment process and get their IPs automatically from a range of 244 IPs. Adding all 244 IPs seems like a bit much, even if it were done programmatically from Chef rather than manually, because of all the network noise it would entail. Checking at every deployment, with a telnet-based client, which machines are listening on the Hazelcast port also seems problematic, since the IPs differ from one deployment to the next, and we would risk a situation in which part of the nodes in the cluster have one member list while another part has a different member list at the same time.
Using hostnames would be the best solution, in my opinion, because we would rely on DNS resolution and wouldn't need to wrap our heads around IP resolution at provisioning time.
Does anyone know of a workaround for the group config issue? Or, perhaps, an alternative to achieve the same behavior?
This is not possible at the moment. Backup groups cannot be designed to be backups of themselves. As a workaround you might be able to design 4 groups, but in this case there is no guarantee that one backup will end up in each datacenter, at least not without using 3 backups (see the sketch below).
Anyhow, in general we do not recommend spreading a Hazelcast cluster over multiple datacenters, except for the very specific situation where the DCs are interconnected in a way that is similar to a LAN network and redundancy is set up.
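A minimal sketch of that 4-group workaround, using the same Hazelcast 3.x API as the configuration in the question; the interface ranges are hypothetical, and with 3 backups every group holds one copy of each partition, so each DC is guaranteed at least one:

import com.hazelcast.config.Config;
import com.hazelcast.config.MemberGroupConfig;
import com.hazelcast.config.PartitionGroupConfig;

public class FourGroupConfig {
    public static Config build() {
        Config config = new Config();
        config.getPartitionGroupConfig()
                .setEnabled(true)
                .setGroupType(PartitionGroupConfig.MemberGroupType.CUSTOM)
                // two groups per data centre (hypothetical interface ranges)
                .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.1.1-100"))
                .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.1.101-200"))
                .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.2.1-100"))
                .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.2.101-200"));
        // 1 primary + 3 backups = 4 copies, one per member group
        config.getMapConfig("default").setBackupCount(3);
        return config;
    }
}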
I have created a pseudo-distributed cluster on a VM and I am trying to get cluster information such as the number of nodes, live nodes, dead nodes, etc. using Hadoop's API from a Java program.
Unfortunately, I am not able to use any of the methods of the FSNamesystem class. It's as if I have to do cluster discovery using a Java client.
I want the same information as we get via HTTP port 50070.
If the following statement works, I mean, if I can create the object (it should not return null), then I can get all the details regarding my cluster:
FSNamesystem f = FSNamesystem.getFSNamesystem();
Also, how can I inject the dependency for the NameNode?
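A sketch of the client-side route I could use instead (assuming Hadoop 2.x; FSNamesystem is NameNode-internal, so getFSNamesystem() returns null outside the NameNode process, whereas DistributedFileSystem exposes the same live/dead-node info as the port-50070 page), though I would still like an answer on the NameNode injection:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class ClusterReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS of the pseudo cluster; adjust host/port to your VM
        DistributedFileSystem dfs = (DistributedFileSystem)
                FileSystem.get(URI.create("hdfs://localhost:8020"), conf);
        DatanodeInfo[] live = dfs.getDataNodeStats(DatanodeReportType.LIVE);
        DatanodeInfo[] dead = dfs.getDataNodeStats(DatanodeReportType.DEAD);
        System.out.println("live nodes: " + live.length + ", dead nodes: " + dead.length);
        dfs.close();
    }
}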
I am running a small system that relies on Hazelcast for clustering, distributed computing and messaging in multicast mode (the standard config as available in the download). I have a number of server modules that run as "Core" Hazelcast instances and a Java Swing application that is implemented as a Hazelcast "Native Client". This all works well, and I would now like to commission the system in production, which means I need to run two separate clusters (dev + prod), and that is where I run into problems.
According to the documentation, all you need to do is use separate group names + passwords for the two clusters, and I get the impression that the two clusters should then sort themselves out automatically. This appears to work for the server modules, but when I try to connect a "Client" instance to the prod environment, I can see from the logs of one of the server modules in prod that the client appears to connect successfully:
INFO: [prod] received auth from Connection [/192.168.0.2:55863 -> null] live=true,
client=true, type=JAVA_CLIENT, this group name:prod, auth group name:prod,
successfully authenticated
But the client never shows up as a member of prod. Instead, I find that the client has become a member of the dev environment, even though the authentication took place against prod!
Involuntary mixing of the two clusters is obviously a giant problem for me and a showstopper. Does anyone know whether there is anything I am doing wrong, or whether there are any configuration changes I can make to resolve the problem?
When a client connects to the cluster, it never becomes a member of the cluster.
So I suspect that your client did connect to prod, but somewhere in your code you have something like Hazelcast.getMap(), which results in starting a member in that JVM; and since the default configuration this member uses is the same as dev's, this new member joins your dev cluster.
So in fact you have one client that is connected to prod, and another member that is connected to the dev cluster.
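A minimal sketch of the difference, assuming the Hazelcast 2.x-era API implied by Hazelcast.getMap() (group name, password, and address are hypothetical):

import com.hazelcast.client.ClientConfig;
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ClientVsMember {
    public static void main(String[] args) {
        // a native client: connects to prod, never becomes a cluster member
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getGroupConfig().setName("prod").setPassword("prod-pass");
        clientConfig.addAddress("192.168.0.1:5701");
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        IMap<String, String> viaClient = client.getMap("test");

        // the trap: the static helper silently starts an embedded member with
        // the default (dev) configuration, which then joins the dev cluster
        IMap<String, String> viaMember = Hazelcast.getMap("test");
    }
}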
Try putting something through the client and see in which cluster the entries appear.
Am I making sense?