Ensure replication between data centres with Hazelcast - java

I have an application incorporating a stretched Hazelcast cluster deployed on 2 data centres simultaneously. The 2 data centres are usually both fully functional, but, at times, one of them is taken completely out of the network for SDN upgrades.
What I intend to achieve is to configure the cluster in such a way that each main partition from a DC has at least 2 backups - one in the other data centre and one in the current one.
For this purpose, checking the documentation pointed me in the direction of partition groups (http://docs.hazelcast.org/docs/2.3/manual/html/ch12s03.html). Enterprise WAN Replication seemed like exactly the thing we wanted, but, unfortunately, this feature is not available in the free version of Hazelcast.
My configuration is as follows:
// hzClusterConfigs is our own wrapper around the externalised settings shown below.
Config config = new Config();

NetworkConfig network = config.getNetworkConfig();
network.setPort(hzClusterConfigs.getPort());

// Cluster discovery: multicast on/off plus an explicit TCP/IP member list.
JoinConfig join = network.getJoin();
join.getMulticastConfig().setEnabled(hzClusterConfigs.isMulticastEnabled());
join.getTcpIpConfig()
    .setMembers(hzClusterConfigs.getClusterMembers())
    .setEnabled(hzClusterConfigs.isTcpIpEnabled());
config.setNetworkConfig(network);

// One custom member group per data centre, matched by interface pattern.
PartitionGroupConfig partitionGroupConfig = config.getPartitionGroupConfig()
    .setEnabled(true)
    .setGroupType(PartitionGroupConfig.MemberGroupType.CUSTOM)
    .addMemberGroupConfig(new MemberGroupConfig().addInterface(hzClusterConfigs.getClusterDc1Interface()))
    .addMemberGroupConfig(new MemberGroupConfig().addInterface(hzClusterConfigs.getClusterDc2Interface()));
config.setPartitionGroupConfig(partitionGroupConfig);
The configs used initially were:
clusterMembers=host1,host2,host3,host4
clusterDc1Interface=10.10.1.*
clusterDc2Interface=10.10.1.*
However, with this set of configs, whenever the cluster membership changed, a random node started logging "No member group is available to assign partition ownership" every other second (as here: https://github.com/hazelcast/hazelcast/issues/5666). What is more, checking the state exposed by the PartitionService over JMX revealed that no partitions were actually being assigned, despite the apparently healthy cluster state.
As such, I replaced the hostnames with the corresponding IPs and the configuration worked: the partitions were created successfully and no node was acting up.
The problem here is that the boxes are created as part of an A/B deployment process and get their IPs automatically from a range of 244 addresses. Adding all 244 IPs seems like a bit much, even if it were done programmatically from Chef rather than manually, because of all the network noise it would entail. Checking at every deployment, with a telnet-based client, which machines are listening on the Hazelcast port also seems problematic, since the IPs differ from one deployment to the next, and we could end up with part of the cluster holding one member list while another part holds a different one at the same time.
Using hostnames would be the best solution, in my opinion, because we would rely on DNS resolution and wouldn't need to wrap our heads around IP resolution at provisioning time.
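A rough sketch of what this could look like (assuming DNS resolves from every box; getDc1Hostnames() and getDc2Hostnames() are hypothetical accessors on our config wrapper, not existing code):
// Sketch only - needs java.net.InetAddress / java.net.UnknownHostException imports.
MemberGroupConfig dc1Group = new MemberGroupConfig();
for (String host : hzClusterConfigs.getDc1Hostnames()) { // hypothetical accessor
    try {
        dc1Group.addInterface(InetAddress.getByName(host).getHostAddress());
    } catch (UnknownHostException e) {
        throw new IllegalStateException("Cannot resolve DC1 member " + host, e);
    }
}
MemberGroupConfig dc2Group = new MemberGroupConfig();
for (String host : hzClusterConfigs.getDc2Hostnames()) { // hypothetical accessor
    try {
        dc2Group.addInterface(InetAddress.getByName(host).getHostAddress());
    } catch (UnknownHostException e) {
        throw new IllegalStateException("Cannot resolve DC2 member " + host, e);
    }
}
config.setPartitionGroupConfig(new PartitionGroupConfig()
    .setEnabled(true)
    .setGroupType(PartitionGroupConfig.MemberGroupType.CUSTOM)
    .addMemberGroupConfig(dc1Group)
    .addMemberGroupConfig(dc2Group));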
Does anyone know of a workaround for the group config issue? Or, perhaps, an alternative to achieve the same behavior?

This is not possible at the moment. Backup groups cannot be designed in such a way that they hold a backup of themselves. As a workaround you might be able to define 4 groups, but in that case there is no guarantee that one backup ends up in each data centre, at least not without using 3 backups.
In general, though, we do not recommend spreading a Hazelcast cluster over multiple data centres, except for the very specific situation where the DCs are interconnected in a way that is similar to a LAN and redundancy is set up.
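A rough sketch of that 4-group workaround, continuing the configuration from the question (the interface ranges and the use of the default map config are placeholders, not values from the question):
// Four custom groups - two per data centre.
PartitionGroupConfig pg = config.getPartitionGroupConfig()
    .setEnabled(true)
    .setGroupType(PartitionGroupConfig.MemberGroupType.CUSTOM)
    .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.1.1-127"))    // DC1, group A
    .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.1.128-255"))  // DC1, group B
    .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.2.1-127"))    // DC2, group A
    .addMemberGroupConfig(new MemberGroupConfig().addInterface("10.10.2.128-255")); // DC2, group B

// With 4 groups, only a backup count of 3 guarantees that every group - and
// therefore every data centre - holds a copy of each partition.
config.getMapConfig("default").setBackupCount(3);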

Related

Transferring data through clusters using gemfire

I have searched for solutions to my use case but did not find the right one, so I am hoping for some ideas to explore further.
I have two GemFire (version 8.2) clusters (private and public), each storing 110+ GB of data without persisting to a disk store. The private cluster gets data from the DB and transmits entries to the public one through a WAN gateway as long as both clusters are online. I have a use case where I restart only the public cluster, but it loses its data after that; to populate the data back I have to restart the private cluster, loading data from the DB into the private cluster, which in turn transmits it over the WAN.
I can't populate the public cluster from the DB, as that puts load on my master DB and would affect other applications.
There are multiple solutions I have tried.
First: exporting the data set from the private cluster and then importing it into the public one; but this disconnects the private cluster's GemFire nodes, since each region stores a large volume of data, and I also have limited disk space for downloading such large exports.
Second: exposing a JMX bean from the public cluster. I could then run a client program that invokes a GemFire function in the private cluster, which iterates through the entries and drops them into the public cluster through JMX; but my organisational infrastructure doesn't let me expose JMX beans on GemFire nodes.
Third: like the second option, a GemFire function can transmit data to the public cluster through a queue, which seems to work but has its own limitations. The queue can only transfer text messages of 1 MB, so I need to handle large objects specially, and the transfer also involves unnecessary serialization and deserialization (JSON text messages).
Is there any way I can ask the private cluster to re-transmit all of its data through the WAN gateway, or is there any other solution someone can propose for me to explore?
You can try "gemtouch" from the open source project gemfire-toolkit.
It sounds very similar to idea 2, but it doesn't require exposing a JMX bean. It does use JMX the same way gfsh does; if that's a problem, you could easily remove the use of JMX, as it is only used for retrieving the list of regions.
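For reference, the core of the "touch" approach is just re-putting each entry on the private cluster so the gateway sender replays it over the WAN. Below is a minimal sketch of such a server-side function, assuming partitioned regions and the GemFire 8.x function API (class and method names should be verified against your version; nothing here is taken from gemfire-toolkit itself):
// Deployed on the private cluster and invoked with
// FunctionService.onRegion(region).execute(new TouchRegionFunction()).
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.execute.FunctionAdapter;
import com.gemstone.gemfire.cache.execute.FunctionContext;
import com.gemstone.gemfire.cache.execute.RegionFunctionContext;
import com.gemstone.gemfire.cache.partition.PartitionRegionHelper;

public class TouchRegionFunction extends FunctionAdapter {
    @Override
    public void execute(FunctionContext context) {
        RegionFunctionContext rfc = (RegionFunctionContext) context;
        // Only the primary copies hosted on this member, so each entry is touched exactly once.
        Region<Object, Object> localData = PartitionRegionHelper.getLocalDataForContext(rfc);
        for (Object key : localData.keySet()) {
            localData.put(key, localData.get(key)); // re-put triggers a new WAN gateway event
        }
        context.getResultSender().lastResult(localData.size());
    }

    @Override
    public String getId() {
        return "TouchRegionFunction";
    }
}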
I have the same problem but working with 3 Geode clusters (each in a different location).
When something weird happens in one of the clusters, we would need to recover it using one of the 2 remaining clusters:
If we "touch" one of the healthy clusters, all of that data will replicate to the cluster that needs recovery, but also to the other cluster that is actually fine. That probably does no damage, but I would appreciate any opinion.
If we keep running traffic on the remaining 2 clusters while GemTouch is running on one of them, I guess some consistency problems between clusters could pop up, but I am not sure.
The last topic is the LICENSE of gemfire-toolkit: there is no LICENSE file, so I am not 100% sure whether the tool can be used.

Pivotal gemfire cluster configuration

I am trying to set up a Pivotal GemFire cluster with two nodes/hosts, i.e. two different Unix servers. The idea is to create 1 locator and 1 cache server on each host, where the locators take care of load balancing among the cache servers. A replicated region will be created on both cache servers. When a client creates/updates a region on one cache server using gfsh or the Java API, it should be replicated to the other one.
Using gfsh, I am able to start a locator (locator 1) and a cache server (server 1) on host_A, and likewise on host_B. I have created a region (RegionA) on both servers.
Is that all I have to do? Pivotal tutorials talk about having a locator and multiple cache servers on the same machine. I could not find an appropriate resource that covers a multi-server/multi-host configuration.
I am starting the servers on each of the hosts like this:
start server --name=server1 --locators=host_A[10334],host_B[10334] --group=group1 --server-port=40406
start server --name=server2 --locators=host_A[10334],host_B[10334] --group=group1 --server-port=40406
When I do "list members" in gfsh, host_B shows (locator 2, server 1 [from host_A], server 2), but host_A shows locator 1 only. Ideally I am expecting 2 locators and 2 servers as members on both machines. Is that not right?
The steps look just fine; are you having any issues, or is something not working while using the started cluster? You can go through Pivotal GemFire in 15 Minutes or Less to get to know how to start locators and servers and how to interact with them. The only extra item I can think of (not mentioned within the previous link, as all members there are started locally within the same gfsh session) is that you need to correctly configure the --locators parameter when starting your members; more information about how this works can be found in How Member Discovery Works and Configuring Peer-to-Peer Discovery.
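For example, the locators themselves should also be started with the full locator list so that they register with each other; a hedged example reusing the host names from the question (the member names and port are just illustrative):
start locator --name=locator1 --port=10334 --locators=host_A[10334],host_B[10334]
start locator --name=locator2 --port=10334 --locators=host_A[10334],host_B[10334]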
Just for your reference, you can have as many members as you want per host; there's no implicit limit on this other than the actual physical resources on the host itself (memory, disk, ports, network throughput, etc.). Keep in mind, however, that it is always better to have only one member per host to achieve the highest reliability and availability for both your data and locator services.
Hope this helps, cheers.

Emulating multiple nodes on a single node in Java?

I am evaluating methods to set up and simulate multiple nodes of a virtual cluster on a single computer for cluster emulation in Java.
I might be able to assign virtual hosts/IPs and spawn multiple JVM child processes, but what is the best way to set up and test that kind of cluster behaviour?
Every idea is appreciated.
Can I use the same ports on different local IP aliases, all mapping to a single localhost?
[update]
To give you an idea:
Laptop Dual Core, 8GB, 120GB SSD.
Virtual IPs: 127.0.0.2, ... 127.0.0.11
Now I would love to be able to start Java child processes like:
java -jar node.jar 127.0.0.2, so that the node uses the 127.0.0.2 IP only.
A fallback would be to use different base ports for communication, but this introduces an additional layer and would feel like testing independent node services running on the same node, since it would not involve the usual cluster detection and forming.
If a simulation environment is an option for you, how about trying CloudSim? With a quick look at example 2 and the rest, you might find what you are looking for, and it's Java, so you can import your code.
It turned out that JGroups has everything I need built in. Just create multiple channels with the same cluster name, live with the logical identity (address), and use means other than the IP to identify the actual nodes.
I started and simulated about 1000 nodes, each joining three or four channels. It takes less than 10 ms to join a channel. I even managed to use my four cores to parallelize it, so it takes two to three seconds to start the test. By also forming an alternative channel and disconnecting at once from the main channel, I was even able to check split-brain behaviour.
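A minimal sketch of that approach, assuming the default JGroups protocol stack (the cluster and node names are made up for the example):
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class LocalClusterSim {
    public static void main(String[] args) throws Exception {
        int nodes = 10;                                    // scale up as the hardware allows
        JChannel[] channels = new JChannel[nodes];
        for (int i = 0; i < nodes; i++) {
            channels[i] = new JChannel();                  // default protocol stack
            channels[i].setName("node-" + i);              // logical identity instead of an IP
            channels[i].setReceiver(new ReceiverAdapter() {
                @Override
                public void viewAccepted(View view) {
                    System.out.println("view: " + view);   // watch the cluster form
                }
            });
            channels[i].connect("simulated-cluster");      // same name => same cluster
        }
        Thread.sleep(5000);                                // let the view settle, run assertions here
        for (JChannel channel : channels) {
            channel.close();
        }
    }
}
Splitting the connect loop across threads (one per core) is what makes the thousand-node startup times mentioned above feasible.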

Who will take care of data in Queues in case of a crash

We are using a set of ActiveMQ servers (three) behind a load balancer.
The configured queues persist their data to disk (to help in case of a crash).
My question is: will a developer or an MQ admin take care of these things?
Thanks
If the messages are REALLY important, you might think about replicating them. Once persisted to disk, replicate them to some other machine as well - that is the minimum you should do: not keeping the messages only on the same machine. You should be looking at distributed queues:
Distributed Queue
Whose responsibility is it? Well, your company's - the people who design and build the solution. It's everyone's. If you can do it (and I am sure you can at least try), then go ahead.
IMHO, in your case the ActiveMQ part needs to be done by a developer, and the replication on the server side by an admin - not necessarily an MQ admin, but an admin. Maybe set up a cron job to replicate the needed data?
Cheers, Eugene.
Your setup is only as safe as its weakest element. You can lose messages when one server (or its disks) crashes, and you will not be able to recover them, so you should take care of safety in the application.
ActiveMQ can be made safer (but slower); see Replicated Message Stores.
Look here http://activemq.apache.org/clustering.html
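On the client side, the usual way to ride over a broker crash is the failover transport; a small sketch (the broker host names and queue name are placeholders, not taken from the question):
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClient {
    public static void main(String[] args) throws Exception {
        // The failover: URL makes the client reconnect to whichever broker is still alive.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                "failover:(tcp://broker1:61616,tcp://broker2:61616,tcp://broker3:61616)");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("ORDERS"));
        // PERSISTENT delivery so the message survives a broker restart,
        // given a shared or replicated message store behind the brokers.
        producer.send(session.createTextMessage("hello"), DeliveryMode.PERSISTENT, 4, 0);
        connection.close();
    }
}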

EC2 ELB performance issues

Two questions about EC2 ELB:
First, how to properly run JMeter tests. I've found the following: http://osdir.com/ml/jmeter-user.jakarta.apache.org/2010-04/msg00203.html, which basically says to set -Dsun.net.inetaddr.ttl=0 when starting JMeter (which is easy); the second point it makes is that the routing is per IP, not per request. So aside from starting a farm of JMeter instances, I don't see how to get around that. Any ideas are welcome, or possibly I'm misreading the explanation(?)
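(For reference, the JVM-side equivalent of that flag, assuming a Java-based test driver, is the security property below; it has to be set before any name lookups happen.)
// Disable positive DNS caching so each new connection can see a fresh ELB A-record.
java.security.Security.setProperty("networkaddress.cache.ttl", "0");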
Also, I have a web service that makes a server-side call to another web service in Java (both are behind ELBs), so I'm using HttpClient and its MultiThreadedHttpConnectionManager, where I set a large-ish connections-per-host value on the connection manager. I'm wondering whether that will break the ELB's load-balancing behaviour, because the connections are cached (and also because the requests all originate from the same machine). I can switch to a new HttpClient each time (kind of lame), but that doesn't get around the fact that all requests originate from a small number of hosts.
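To make that concrete, the setup I have in mind is roughly the sketch below (Commons HttpClient 3.x as above; the pool sizes and the 30-second idle timeout are made up). Periodically closing idle connections is one way to let fresh connections - and, with DNS caching disabled, fresh lookups - spread across the instances behind the ELB.
// Needs org.apache.commons.httpclient.* and java.util.concurrent.* imports.
final MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
connectionManager.getParams().setDefaultMaxConnectionsPerHost(20);
connectionManager.getParams().setMaxTotalConnections(100);
HttpClient client = new HttpClient(connectionManager);

// Reap idle connections so new ones get opened instead of everything staying pinned.
ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();
reaper.scheduleAtFixedRate(new Runnable() {
    public void run() {
        connectionManager.closeIdleConnections(30000L); // close connections idle for > 30s
    }
}, 30, 30, TimeUnit.SECONDS);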
Backstory: I'm in the process of perf testing a service using ELB on EC2 and the traffic is not distributing evenly (most traffic to 1-2 nodes, almost no traffic to 1 node, no traffic at all to a 4th node). And so the issues above are the possible culprits I've identified.
I have had very similar problems. One thing is that the ELB does not scale well under burst load, so when you are trying to test it, it does not scale up immediately; it takes a long time to ramp up. Another drawback is the fact that it uses a CNAME for the DNS lookup, which alone is going to slow you down. There are more performance issues you can research.
My recommendation is to use haproxy. You have much more control, and you will like the performance; I have been very happy with it. I use heartbeat to set up a redundant server and I am good to go.
Also, if you plan on doing SSL on the ELB, you will suffer more, because I found the performance to be below par.
I hope that helps some. When it comes down to it, AWS has told me personally that load testing the ELB does not really work, and if you are planning on launching with a large amount of load, you need to tell them so they can scale you up ahead of time.
You don't say how many JMeter instances you're running, but in my experience it should be around 2x the number of AZs you're scaling across. Even then, you will probably see unbalanced loads - it is very unusual to see the load spread exactly evenly across your back-end fleet.
You can help (a bit) by running your JMeter instances in different regions.
Another factor is the duration of your test. ELBs do take some time to scale up - you can generally tell how many instances are running by doing an nslookup against the ELB name. Understand your scaling patterns and build tests around them. (So if it takes 20 minutes to add another instance to the ELB pool, include a 25-30 minute warm-up in your test.) You can also get AWS to "pre-warm" the ELB pool if necessary.
If your ELB pool size is sufficient for your test, and you can verify that the pool does not change during a test run, you can always try running your tests directly against the ELB IPs - i.e. manually balancing the traffic.
I'm not sure what you expect to happen with the 2nd tier of calls - if you're opening a connection, and re-using it, there's obviously no way to have that scaled across instances without closing & re-opening the connection. Are these calls running on the same set of servers, or a different set? You can create an internal ELB, and use that endpoint to connect to, but I'm not sure that would help in the scenario you've described.
