I have set up a cluster on Google Kubernetes Engine and tried the Guestbook Redis example (Java). I was able to write a key to the Redis master, but reading the value from the slave fails. Reading from the master itself returns the key and its value, so the likely cause is that replication is not happening.
I followed the tutorial at https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook (using Java).
I assume redis-slave-controller.yaml contains the configuration needed to set up replication, but it still does not work. Could someone point out what might be missing here?
I was using the latest redis4 image (launcher.gcr.io/google/redis4:latest) for both master and slave, and that seemed to be causing the replication issue. I could not find a matching slave image for the latest version, so I replaced the images with the ones below, and it is working correctly now.
Redis Master image: gcr.io/google_containers/redis:latest
Redis Slave image: gcr.io/google_containers/redis-slave:v2
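To confirm whether replication is actually established before and after swapping images, one option is to look at the output of the Redis `INFO replication` command on both pods (for example via `kubectl exec <pod> -- redis-cli info replication`, where `<pod>` is a placeholder). The following is only a minimal sketch that parses that text output; the sample strings are illustrative, not captured from a real cluster, and the helper names are my own.

```python
def parse_info_replication(text):
    """Parse the key:value lines of `INFO replication` output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# Replication" section header.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replication_ok(master_info, slave_info):
    """Return True if the master sees a slave and the slave's link is up."""
    master = parse_info_replication(master_info)
    slave = parse_info_replication(slave_info)
    return (
        master.get("role") == "master"
        and int(master.get("connected_slaves", "0")) > 0
        and slave.get("role") == "slave"
        and slave.get("master_link_status") == "up"
    )


# Illustrative samples (assumed values, not real cluster output).
SAMPLE_MASTER_NO_SLAVES = "# Replication\nrole:master\nconnected_slaves:0\n"
SAMPLE_MASTER_ONE_SLAVE = "# Replication\nrole:master\nconnected_slaves:1\n"
SAMPLE_SLAVE_DOWN = (
    "# Replication\nrole:slave\nmaster_host:redis-master\n"
    "master_port:6379\nmaster_link_status:down\n"
)
SAMPLE_SLAVE_UP = SAMPLE_SLAVE_DOWN.replace(
    "master_link_status:down", "master_link_status:up"
)
```

If `master_link_status` stays `down` on the slave, or `connected_slaves` stays at 0 on the master, the key written to the master will never show up on the slave, which matches the symptom described in the question.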
Issue
Create an Ignite node (with client mode set to false) and put some data (10k entries) into it with a very short expiration time (~20s) and TTL enabled.
Each time the thread that removes expired entries runs, it should remove all of them, but after a few runs it no longer does: some expired entries stay in memory and are not removed by that run.
That means some expired data remains in memory, which is something we want to avoid.
Can you confirm whether this is a real issue or just a misuse/misconfiguration in my setup?
Thanks for your feedback.
Test
I've tried three different setups: full local mode (embedded server) on macOS, a remote server with one node in Docker, and a remote cluster with 3 nodes in Kubernetes.
To reproduce
Git repo: https://github.com/panes/ignite-sample
Run MyIgniteLoadRunnerTest.run() to reproduce the issue described above.
(Global setup: writing 10k entries of 64 octets each with a TTL of 10s)
This seems to be a known issue. You can track it here: https://issues.apache.org/jira/browse/IGNITE-11438. The fix is to be included in the Ignite 2.8 release. As far as I know, it has already been released as part of GridGain Community Edition.
We are trying to use application-level clustering with Akka Cluster for our distributed application, which runs in Docker containers across multiple nodes. We plan to run the containers with "host" mode networking.
When the dockerized application comes up for the first time, Akka clustering does not seem to work, and we do not see any gossip messages being exchanged between the cluster nodes. This gets resolved only when we remove the file "/var/lib/docker/network/files/local-kv.db" and restart the Docker service. This is not an acceptable solution for a production deployment, so we are trying to do a root cause analysis and provide a proper fix.
Any help here would be really appreciated.
Removing the file "/var/lib/docker/network/files/local-kv.db" and restarting the Docker service worked, but this workaround is unacceptable for a production deployment.
Using bridge network mode for the container also helps, but our current requirements call for running the container in "host" mode.
application.conf has the following settings for the host and port currently.
hostname = ""
port = 2551
bind-hostname = "0.0.0.0"
bind-port = 2551
No gossip messages are exchanged between the Akka cluster nodes, whereas we do see those messages after applying the workaround mentioned above.
For my master's thesis I'm trying to set up a Flink standalone cluster on 4 nodes. I followed the documentation, which explains the setup pretty neatly. But when I start the cluster there is a warning, and when I try to run a job, there is an error with the same message:
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@MYHOSTNAME:6123/user/jobmanager#-818199108]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage"
Increasing the timeout didn't work. When I open the TaskManagers in the web UI, all of them have a path of the following form:
akka.tcp://flink@MYHOSTNAME:33779/user/taskmanager
Does anyone have an idea how to solve this to get the cluster working? Thanks in advance!
One last thing: there is no user "flink" on the cluster, and one won't be created. So any advice that doesn't involve creating that user would be much appreciated! Thanks!
Not sure if it is still relevant, but this is the way I did it (using Flink 1.5.3):
I set up an HA standalone cluster with 3 masters (JobManagers) and 20 slaves (TaskManagers) in the following way.
Define your conf/masters file (one hostname:8081 entry per line)
Define your conf/slaves file (one TaskManager hostname per line)
In flink-conf.yaml on each master machine, set jobmanager.rpc.address to that machine's own hostname
In flink-conf.yaml on each slave machine, set jobmanager.rpc.address to localhost
Once everything is set, run bin/start-cluster.sh on any of the master hosts.
If you need HA, you also need to set up a ZooKeeper quorum and modify the corresponding HA properties (high-availability, high-availability.storageDir, high-availability.zookeeper.quorum)
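With 20+ machines, writing conf/masters and conf/slaves by hand gets error-prone, so it can help to generate them. Below is a rough sketch under the file formats described above (hostname:8081 per line for masters, one hostname per line for slaves). The hostnames master1..master3 and worker1..worker20 are hypothetical placeholders, and the function names are my own, not part of Flink.

```python
from pathlib import Path


def render_masters(hosts, webui_port=8081):
    """One 'hostname:port' entry per line, as described for conf/masters."""
    return "".join(f"{host}:{webui_port}\n" for host in hosts)


def render_slaves(hosts):
    """One TaskManager hostname per line, as described for conf/slaves."""
    return "".join(f"{host}\n" for host in hosts)


def write_cluster_files(conf_dir, masters, slaves):
    """Write 'masters' and 'slaves' files under conf_dir; return both paths."""
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)
    masters_path = conf / "masters"
    slaves_path = conf / "slaves"
    masters_path.write_text(render_masters(masters))
    slaves_path.write_text(render_slaves(slaves))
    return masters_path, slaves_path


# Hypothetical hostnames matching the 3 masters / 20 slaves layout above.
MASTERS = [f"master{i}" for i in range(1, 4)]
SLAVES = [f"worker{i}" for i in range(1, 21)]
```

Calling `write_cluster_files("conf", MASTERS, SLAVES)` from the Flink directory would overwrite the two files in conf/, so try it against a scratch directory first. The flink-conf.yaml settings still have to be edited per machine as described above.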
I'm trying to write a simple Spark application, and when I run it locally it works with the master set as
.master("local[2]")
But after configuring a Spark cluster on AWS (EMR) I can't connect to the master URL:
.master("spark://<master url>:7077")
Is this the right way to do it? Am I missing something here?
The cluster is up and running, and when I added my application as a step JAR so that it runs directly on the cluster, it worked. But I want to be able to run it from a remote machine.
I would appreciate some help here.
Thanks
To run from a remote machine, you will need to open the appropriate ports in the Security Group assigned to your EMR master node. You will need to add at least 7077.
If by "remote" you mean a machine that isn't in your AWS environment, you will also need to set up a way to route traffic into that environment from the outside.
Cassandra noob here. I've done the online training which didn't need more than a localhost connection. Now I've pulled out some old computers and set them up as a cluster, however I can't connect to them via DevCenter or using the Java Driver.
I used OpsCenter to set up the cluster hoping that I would not have to do any manual configuration, but it seems that some manual configuration will be required.
I used OpsCenter 4.0.3 to create a Community 2.0.3 cluster with four nodes. All four nodes are joined to the cluster. OpsCenter sees them all and shows them as Active. All four nodes are running Ubuntu Desktop 13.10. I have successfully added a keyspace using the OpsCenter Schema tab.
Nmap shows that none of the nodes has port 9042 open, so it seems to me that it's a problem with the client side agents not listening on the port.
At the suggestion of someone from DataStax I edited the cassandra.yaml file on one of the nodes (the seed node, as it happens) and set the rpc_address to the node's IP address (e.g. 192.168.0.123). I restarted the node from OpsCenter, but there was no effect.
I then edited cassandra.yaml and changed the listen_address to be the node address, and restarted the node from OpsCenter, again to no avail.
Clearly I have missed a step somewhere along the line. Does anyone who has successfully started a Cassandra cluster know what I'm overlooking?
Edit cassandra.yaml, find the line that has rpc_address, uncomment it, and set it to:
rpc_address: 0.0.0.0
If you used the DataStax packages to install Cassandra, you can find cassandra.yaml in /etc/cassandra.
Check that the following settings are present on (at least one of) your C* nodes:
start_native_transport: true
native_transport_port: 9042
rpc_address: IP -- where the IP is something you can ping from the machine running DevCenter.
Once you've restarted the node, make sure you can actually connect to it: telnet IP 9042. If you cannot, then most probably you haven't edited the right cassandra.yaml.
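If telnet isn't installed on the client machine, the same check can be done with a few lines of Python using only the standard library. This is just a sketch of a plain TCP connect test; the function name `port_is_open` and the 3-second default timeout are my own choices, and a successful connect only shows that something is listening on the port, not that it speaks the native protocol.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure and timeout.
        return False
```

For example, `port_is_open("192.168.0.123", 9042)` run from the DevCenter machine should return True once the native transport is listening on an address that machine can reach; if it returns False for every node, that agrees with the Nmap result in the question.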