How to configure a Java client connecting to an AWS EMR Spark cluster - java

I'm trying to write a simple Spark application. When I run it locally, it works with the master set as
.master("local[2]")
But after configuring a Spark cluster on AWS (EMR), I can't connect to the master URL:
.master("spark://<master url>:7077")
Is this the way to do it? Am I missing something here?
The cluster is up and running, and when I tried adding my application as a step JAR so it would run directly on the cluster, it worked. But I want to be able to run it from a remote machine.
I would appreciate some help here,
Thanks

To run from a remote machine, you will need to open the appropriate ports in the security group assigned to your EMR master node. At a minimum, add port 7077.
If by "remote" you mean a machine that isn't in your AWS environment, you will also need to set up a way to route traffic to it from the outside.
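As a sketch of the security-group step, assuming the AWS CLI is available; the group ID and client CIDR below are placeholders that you would replace with your own values:

```shell
# Open the Spark standalone master port (7077) to the remote client.
# sg-0123456789abcdef0 and 203.0.113.0/24 are placeholders.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 7077 \
  --cidr 203.0.113.0/24
```

Note that a Spark standalone master also needs to reach back to the driver, so submitting from a remote driver typically requires additional ports (and a routable driver address) to be opened as well.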

Related

Kafka + Spring: locally the broker may not be available (Windows 10)

I'm having trouble configuring Kafka and Spring on a Windows 10 machine.
I followed the guide I found on YouTube: https://www.youtube.com/watch?v=IncG0_XSSBg&t=538s.
I can't connect locally in any way.
The Spring application is very simple, and its only task is to connect to the running broker.
I have already spent a lot of time looking for a solution, and nothing helps.
I have tried a lot: in server.properties I changed the listener entry to listeners=PLAINTEXT://127.0.0.1:9092.
I changed the Java version to JRE 8u241.
The spring application cannot connect to the broker.
Please help.
UPDATE
After typing the following to start the Kafka server:
bin/kafka-server-start.sh config/server.properties
I got the following error:
After you run ZooKeeper, open another terminal, change directory again to where you ran ZooKeeper, and then run the command bin/kafka-server-start.sh config/server.properties. This will start the Kafka server, and you will be able to reach port 9092.
For details, see the quick start doc.
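For reference, the listeners change mentioned in the question would look like this in config/server.properties; a minimal sketch for a single local broker (note there must be no spaces around the colons; advertised.listeners is an addition here that often helps clients resolve the broker):

```properties
# config/server.properties, minimal single-broker local setup (sketch)
listeners=PLAINTEXT://127.0.0.1:9092
advertised.listeners=PLAINTEXT://127.0.0.1:9092
```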

How to connect multiple Java applications to same Ignite cluster?

I have three Java applications that will connect to the same Ignite node (running on a particular VM) to access the same cache store.
Is there a step-by-step procedure for running a node outside a Java application (from the command prompt, maybe) and connecting my Java apps to it?
Your Java applications should act as client nodes in your cluster. More information about client/server mode can be found in the documentation. Server nodes can be started from the command line, as described there; information about running with a custom configuration can be found there as well. You need to set up discovery in order to make the entire thing work, and it should be done on every node (including client nodes). I'd recommend using the static IP finder in the configuration.
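A static IP finder can be sketched in the node's Spring XML configuration like this (10.0.0.5 is a placeholder for the VM running the server node; 47500..47509 is Ignite's default discovery port range):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <!-- placeholder: the VM hosting the server node -->
                            <value>10.0.0.5:47500..47509</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```

Each Java application would then join in client mode, for example by calling Ignition.setClientMode(true) before Ignition.start(...) with the same discovery configuration.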

Akka Clustering not working with Docker container in host node

We are trying to use application-level clustering, via Akka Cluster, for our distributed application, which runs in Docker containers across multiple nodes. We plan to run the containers with "host"-mode networking.
When the dockerized application comes up for the first time, Akka clustering does not seem to work: we do not see any gossip messages being exchanged between the cluster nodes. This gets resolved only when we remove the file "/var/lib/docker/network/files/local-kv.db" and restart the Docker service. This is not an acceptable solution for a production deployment, so we are trying to do a root-cause analysis and find a proper solution.
Any help here would be really appreciated.
Removing the file "/var/lib/docker/network/files/local-kv.db" and restarting the Docker service worked, but this workaround is unacceptable in a production deployment.
Using bridge network mode for the dockerized container also helps, but our current requirement is to run the container in "host" mode.
application.conf currently has the following settings for the host and port:
hostname = ""
port = 2551
bind-hostname = "0.0.0.0"
bind-port = 2551
No gossip messages are exchanged between the Akka cluster nodes, whereas we do see those messages after applying the workaround mentioned above.
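For comparison, a host-mode setup usually needs a non-empty hostname so that other nodes can address this one. A minimal sketch, assuming classic remoting (the akka.remote.netty.tcp section) and a placeholder host IP of 10.0.0.12:

```hocon
akka {
  remote {
    netty.tcp {
      hostname = "10.0.0.12"    # placeholder: the host's routable IP, not ""
      port = 2551
      bind-hostname = "0.0.0.0" # bind to all interfaces inside the container
      bind-port = 2551
    }
  }
}
```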

Apache Flink: Standalone Cluster tries to connect with username "flink"

For my master's thesis I'm trying to set up a Flink standalone cluster on 4 nodes. I've worked through the documentation, which explains pretty neatly how to set it up. But when I start the cluster there is a warning, and when I try to run a job, there is an error with the same message:
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@MYHOSTNAME:6123/user/jobmanager#-818199108]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.messages.JobManagerMessages$LeaderSessionMessage"
Increasing the timeout didn't help. When I open the TaskManagers in the web UI, all of them follow this pattern:
akka.tcp://flink@MYHOSTNAME:33779/user/taskmanager
Does anyone have an idea how to solve this to get the cluster working? Thanks in advance!
One last thing: there is no user "flink" on the cluster and none will be created, so any advice that doesn't involve creating that user would be very much appreciated! Thanks!
Not sure if it is still relevant, but this is the way I did it (using Flink 1.5.3):
I set up an HA standalone cluster with 3 masters (JobManagers) and 20 slaves (TaskManagers) in the following way.
Define your conf/masters file (one hostname:8081 per line)
Define your conf/slaves file (one TaskManager hostname per line)
In flink-conf.yaml on each master machine, set jobmanager.rpc.address to that machine's own hostname
In flink-conf.yaml on each slave machine, set jobmanager.rpc.address to localhost
Once everything is set, execute bin/start-cluster.sh on any of the master hosts.
If you need HA, you also need to set up a ZooKeeper quorum and modify the corresponding HA properties (high-availability, high-availability.storageDir, high-availability.zookeeper.quorum).
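Putting those steps together: conf/masters would list master-1:8081, master-2:8081, master-3:8081, and conf/slaves would list the twenty TaskManager hostnames, one per line. The flink-conf.yaml on a master might then look like this (all hostnames and the storage path are placeholders):

```yaml
# flink-conf.yaml on master-1 (each master uses its own hostname here)
jobmanager.rpc.address: master-1

# HA settings, only if you also run a ZooKeeper quorum:
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
```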

Kubernetes service in Java does not resolve restarted service/replicationcontroller

I have a kubernetes cluster where one service (java application) connects to another service to write data (elasticsearch).
When Elasticsearch (service & replication controller) is restarted/redeployed, the Java application loses its connection, which can only be recovered by restarting the Java application (RC). This is not the desired behaviour and should be solved.
Using curl from the application's Kubernetes pod to query Elasticsearch works fine after the restart, so it is probably something Java is doing.
It does work when only the replication controller for Elasticsearch is touched, leaving the service as it is. But why does curl work in that case? Either way, this should not be the solution.
Using the same configuration in a local Docker setup without Kubernetes also causes no problems.
Promising solutions that did not work:
Setting networkaddress.cache.ttl or networkaddress.cache.negative.ttl to zero (or other small positive values)
Hacking /etc/nsswitch.conf as described in https://stackoverflow.com/a/32550032/363281
I'm using Kubernetes 1.1.3 and OpenJDK 8u66; the service's Dockerfile is derived from java:8.
Try java.security.Security.setProperty("networkaddress.cache.ttl", "60");
This sets the DNS cache lifetime to sixty seconds; adapt it to your needs.
One solution is not to restart your Service: a Service resolves the Pods by IPs and watches the Pods by selectors, so you don't need to restart the Service when you restart your Pods.
Now, what is likely happening is that your application resolves the Service at startup and then caches the IP. When you restart the Service, it likely gets a new IP, which breaks your application's behavior. You need to check how you can reset this cache or trigger some sort of restart of the app when the pods/services change.
If you don't restart the Service, the IP won't change, but it will still proxy to the Pods that are restarted.
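The TTL suggestion above has to run before the first lookup is cached; a minimal sketch (the class name is illustrative):

```java
import java.security.Security;

public class DnsCacheConfig {
    public static void main(String[] args) {
        // Cache successful DNS lookups for 60 seconds instead of
        // the JVM default (which can be "forever" in some setups).
        Security.setProperty("networkaddress.cache.ttl", "60");
        // Do not cache failed lookups at all.
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Because these are JVM-wide security properties rather than system properties, they must be set in code (or in java.security) before any hostnames are resolved, ideally as the first thing in main.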