I want to implement the following flow from Java code:
Create a new AWS EMR cluster (using the AWS SDK)
Connect to the EMR cluster using the Hive JDBC driver (an IP is required)
Run my "SQL" queries on the EMR cluster
Destroy the AWS EMR cluster (using the AWS SDK)
My problem is that when I create an EMR cluster using the SDK I can only retrieve its AWS id, something like j-XXXXXXXXXXX. But in order to connect via JDBC I need the master node's IP. How can I obtain the master node IP from the code?
I'm following this JDBC example page
==UPDATE==
I tried using AmazonElasticMapReduceClient.describeCluster, but I could only obtain the public DNS name, while I'm looking for the private IP.
AFAIK there is no direct way to get it, but it can be achieved with two API calls and matching their results:
public String getMasterNodeIp(AmazonElasticMapReduceClient emr, String emrId) throws Exception {
    // Describe the cluster to get the master's public DNS name...
    Cluster cluster = emr.describeCluster(new DescribeClusterRequest().withClusterId(emrId)).getCluster();
    // ...then list the cluster's instances and find the one whose public DNS name matches.
    ListInstancesResult instances = emr.listInstances(new ListInstancesRequest().withClusterId(emrId));
    String masterDnsName = cluster.getMasterPublicDnsName();
    for (Instance instance : instances.getInstances()) {
        if (instance.getPublicDnsName().equals(masterDnsName)) {
            return instance.getPrivateIpAddress();
        }
    }
    throw new Exception("Failed to find master node private ip.");
}
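Once you have the private IP, the Hive JDBC part is the usual HiveServer2 connection. A minimal sketch (not from the original answer), assuming HiveServer2 is listening on its default port 10000 on the master node and the Hive JDBC driver from the example page is on the classpath; the hadoop user, placeholder IP, and SHOW TABLES query are just illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveOnEmrExample {
    public static void main(String[] args) throws Exception {
        String masterIp = "10.0.0.1"; // value returned by getMasterNodeIp(...)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Default HiveServer2 port is 10000; "hadoop" is assumed as the EMR user.
        String url = "jdbc:hive2://" + masterIp + ":10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hadoop", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}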
I tried connecting to AWS Neptune with this Java code and got a NoHostAvailableException.
approach 1:
public static void main(String[] args) throws Exception {
    Cluster.Builder builder = Cluster.build();
    builder.addContactPoint("endpoint");
    builder.port(8182);
    builder.enableSsl(true);
    builder.keyStore("pem-file");

    Cluster cluster = builder.create();
    GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
    System.out.println(g.V().limit(10).toList());
    cluster.close();
}
approach 2:
Cluster cluster = Cluster.build("endpoint")
        .enableSsl(true)
        .keyStore("pem")
        .handshakeInterceptor(r -> {
            NeptuneNettyHttpSigV4Signer sigV4Signer = null;
            try {
                sigV4Signer = new NeptuneNettyHttpSigV4Signer("us-east-2",
                        new DefaultAWSCredentialsProviderChain());
            } catch (NeptuneSigV4SignerException e) {
                e.printStackTrace();
            }
            try {
                sigV4Signer.signRequest(r);
            } catch (NeptuneSigV4SignerException e) {
                e.printStackTrace();
            }
            return r;
        }).create();

Client client = Cluster.open("src\\conf\\remote-objects.yaml").connect();
client.submit("g.V().limit(10).toList()").all().get();
Whatever I do, I get this error:
Sep 02, 2021 3:18:34 PM io.netty.channel.ChannelInitializer exceptionCaught
WARNING: Failed to initialize a channel. Closing:
java.lang.RuntimeException: java.lang.NullPointerException
org.apache.tinkerpop.gremlin.driver.Channelizer$AbstractChannelizer.initChannel(Channelizer.java:117)
Caused by: org.apache.tinkerpop.gremlin.driver.exception.NoHostAvailableException: All hosts are considered unavailable due to previous exceptions. Check the error log to find the actual reason.
I need code or documentation for connecting my Gremlin code in a .java file to AWS Neptune. I am struggling and have tried a number of approaches:
1. I created an EC2 instance and installed Maven and Apache; I still got the error, and the code runs on the server (EC2), but I want the code to live in IntelliJ.
It would be very helpful to get exact working code. What should be added to remote-objects.yaml?
If a PEM file is required to access Amazon Neptune, please help with creating it.
Assuming SSL is enabled but IAM is not, in terms of Java code, this is all you need to create the connection.
Cluster.Builder builder = Cluster.build();
builder.addContactPoint("localhost");
builder.port(8182);
builder.enableSsl(true);
builder.serializer(Serializers.GRAPHBINARY_V1D0);

Cluster cluster = builder.create();
DriverRemoteConnection drc = DriverRemoteConnection.using(cluster);
GraphTraversalSource g = traversal().withRemote(drc);
You may need to add an entry to your /etc/hosts file to get the SSL certs to resolve correctly such as:
127.0.0.1 localhost my-neptune-cluster.us-east-1.neptune.amazonaws.com
If you find that using localhost with SSL enabled does not work then use the actual Neptune cluster DNS name and make the edit to your /etc/hosts file.
The last thing you will need to do is create access to the Neptune VPC from your local machine. One way is using an SSH tunnel as explained in this post
I am running a Spring boot application that uses Spring data to connect to a remote instance of Couchbase on AWS.
My Couchbase configuration class looks like this:
@Value("${couchbase-url}")
private String couchBaseUrl;

@Value("${couchbase-bucket}")
private String couchbaseBucketName;

@Value("${couchbase-password}")
private String couchbasePassword;

@Override
protected List<String> getBootstrapHosts() {
    return Arrays.asList(couchBaseUrl);
}

@Override
protected String getBucketName() {
    return couchbaseBucketName;
}

@Override
protected String getBucketPassword() {
    return couchbasePassword;
}
where my param values look something like this:
couchbase-url=34.168.163.36:8091
couchbase-bucket=conversion-data
couchbase-password=secretpassword
When running this against a local instance, everything works as expected. When I run against the remote instance I get the following errors:
com.couchbase.client.deps.io.netty.channel.ConnectTimeoutException: connection timed out: /10.0.10.140:8093
Where 10.0.10.140 is the private IP address. So the initial connection seems to be fine, but after that my service is redirected to the private IP address.
Can anyone explain how I can get Couchbase to respond with the public IP address?
When adding a server to a Couchbase cluster, you are asked for the IP address or hostname of the server; enter the public IP of the server.
The IP address you used to configure the server in the cluster is what Couchbase returns to the client for any operation (like adding a key-value pair).
References
https://developer.couchbase.com/documentation/server/5.0/install/init-setup.html
Couchbase Connection - External ip instead of internal
As a general practice, though, communicating via public IP is not recommended.
If you are using AWS, you can deploy the Couchbase cluster in a dedicated VPC and deploy your Spring application in another VPC. Then you can use VPC peering to establish an internal network connection between the two VPCs.
Once that setup is up and running, you can use the private IPs to communicate with the servers in the Couchbase cluster.
I am using the MariaDB JDBC driver to create a connection to an Amazon Aurora DB.
I wanted to create a secured connection, so I read this:
To connect to a DB cluster with SSL using the MySQL utility
Download the public key for the Amazon RDS signing certificate from
https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem.
Note that this will download a file named rds-combined-ca-bundle.pem.
First Question: How exactly is this secure, given that anyone can download this PEM file from Amazon AWS?
I did some research on how I should connect to Aurora DB with the public key,
and I found these two links: First, Second.
So my Code is quite Simple:
Class.forName("org.mariadb.jdbc.Driver");

Properties prop = new Properties();
prop.setProperty("javax.net.ssl.trustStore", "C:\\temp\\rds-combined-ca-bundle.pem");
prop.setProperty("user", jdbcDetails.username);
prop.setProperty("password", jdbcDetails.getSensitiveData());

java.sql.Connection conne = DriverManager.getConnection(jdbcDetails.connectionString, prop);
try (Statement stmt1 = conne.createStatement()) {
    // Execute all but the rest
    ResultSet rs = stmt1.executeQuery("Select 98765 from dual limit 2");
    while (rs.next()) {
        rs.getLong(1);
    }
}
conne.close();
Second Question: How does having the public key file relate to encryption?
The above doesn't square with the Oracle Java documentation, which says:
If the client wants to authenticate the server, then the client's trust store must contain the server's certificate
Third Question: From what I understand, if the client trusts the server, it isn't required to use this file.
Fourth Question: I was checking the connection creation with Wireshark.
In both cases, with and without this public key file, I was able to create a connection, and in both cases the traffic appeared encrypted in Wireshark.
Something that looks like this:
Encrypted Application Data:
eb:62:45:fb:10:50:f7:8c............:b9:0a:52:e7:97:1d:34
Based on this answer, here is my understanding of the public key usage:
First,
It appears that the Amazon AWS documentation is a bit misleading: it is only relevant when connecting with the specific tool called the MySQL utility.
An answer to the First, Second, and Third Questions:
"Java can definitely establish an SSL connection without a client
validating the certificate chain of the server."
The certificate validation is there to ensure that the server being connected to is indeed the one that was expected (i.e. not a rogue server).
This means that the same SSL connection is still made, but with verifyServerCertificate=false it does not verify that it is the intended server.
Answer to the Fourth Question:
Correct. The code is in Java, and passing the SSL parameters makes the connection encrypted.
So using these parameters gives what is required:
?trustServerCertificate=true&useSSL=true&requireSSL=true&verifyServerCertificate=false
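For completeness, a minimal sketch (not the original poster's code) of passing those same parameters directly in the JDBC URL; the endpoint, database, and credentials below are placeholders, and note, as stated above, that with verifyServerCertificate=false the connection is encrypted but the server's identity is not verified:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AuroraSslSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and database; use your Aurora cluster endpoint.
        String url = "jdbc:mariadb://my-aurora-endpoint:3306/mydb"
                + "?useSSL=true&requireSSL=true"
                + "&trustServerCertificate=true&verifyServerCertificate=false"; // encrypted, server identity NOT verified
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("Select 98765 from dual limit 2")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}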
In my current setup, I'm using the default multicast option of the Hazelcast cluster manager. When I link the instances of my containerized Vert.x modules (via Docker networking links), I can see that they successfully create a Hazelcast cluster. However, when I try publishing events on the event bus from one module, the other module doesn't react to them. I'm not sure how the network settings of the Hazelcast cluster relate to the network settings of the event bus.
At the moment, I have the following programmatic configuration for each of my Vert.x modules, each deployed inside a Docker container.
ClusterManager clusterManager = new HazelcastClusterManager();

VertxOptions vertxOptions = new VertxOptions()
        .setClustered(true)
        .setClusterManager(clusterManager);

vertxOptions.setEventBusOptions(new EventBusOptions()
        .setClustered(true)
        .setClusterPublicHost("application"));
The Vert.x Core manual states that I may have to configure clusterPublicHost, and clusterPublicPort for the event bus, but I'm not sure how those relate to the general network topology.
One answer is here https://groups.google.com/d/msg/vertx/_2MzDDowMBM/nFoI_k6GAgAJ
I see this question come up a lot, and what a lot of people miss in the documentation (myself included) is that the Event Bus does not use the cluster manager to send event bus messages. I.e. in your example with Hazelcast as the cluster manager, you have the Hazelcast cluster up and communicating properly (so your cluster manager is fine); however, the Event Bus is failing to communicate with your other Docker instances due to one or more of the following:
It is attempting to use an incorrect IP address for the other node (i.e. the IP of the private interface on the Docker instance, not the publicly mapped one).
It is attempting to communicate on a port Docker is not configured to forward (the event bus picks a dynamic port if you don't specify one).
What you need to do is:
Tell Vert.x the IP address that the other nodes should use to talk to each instance (using the -cluster-host [command line], setClusterPublicHost [VertxOptions] or "vertx.cluster.public.host" [System Property] options).
Tell Vert.x explicitly the port to use for event bus communication and ensure Docker is forwarding traffic for those ports (using the "vertx.cluster.public.port" [System Property], setClusterPublicPort [VertxOptions] or -cluster-port [command line] options). In the past, I have used 15701 because it is easy to remember (just a '1' in front of the Hazelcast ports).
The Event Bus only uses the cluster manager to manage the IP/port information of the other Vert.x instances and the registration of the consumers/producers. The communications are done independently of the cluster manager, which is why you can have the cluster manager configured properly and communicating, but still have no Event Bus communications.
You may not need to do both of the steps above if both your containers are running on the same host, but you definitely will once you start running them on separate hosts.
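A sketch of those two settings in code, based on the EventBusOptions already used in the question (assuming Vert.x 3.x; the public host address and port 15701 are placeholders, and Docker must publish that port, e.g. -p 15701:15701):

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.core.eventbus.EventBusOptions;
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager;

public class ClusteredNode {
    public static void main(String[] args) {
        VertxOptions options = new VertxOptions()
                .setClusterManager(new HazelcastClusterManager())
                .setEventBusOptions(new EventBusOptions()
                        .setClustered(true)
                        .setPort(15701)                       // bind the event bus to a fixed port instead of a dynamic one
                        .setClusterPublicHost("203.0.113.10") // address the OTHER nodes should use to reach this container (placeholder)
                        .setClusterPublicPort(15701));        // port the other nodes should use (must be published by Docker)
        Vertx.clusteredVertx(options, res -> {
            if (res.succeeded()) {
                System.out.println("Clustered Vert.x started");
            } else {
                res.cause().printStackTrace();
            }
        });
    }
}

With this, each container advertises an address and port the other nodes can actually reach, which is exactly the failure mode described in the quoted answer.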
Something else that can happen is that Vert.x uses the loopback interface when you don't specify the IP which Vert.x (not Hazelcast) should use to communicate over the event bus. The problem here is that you don't know which interface is picked for communication (loopback, an interface with an IP; you could even have multiple interfaces with IPs).
To overcome this problem, I once wrote a method for it: https://github.com/swisspush/vertx-cluster-watchdog/blob/master/src/main/java/org/swisspush/vertx/cluster/ClusterWatchdogRunner.java#L101
The cluster manager works fine; its configuration has to be the same on each node (machine/Docker container) in your cluster, or you make no configuration at all (use your cluster manager's default configuration).
You have to make the event bus configuration consistent on each node: set the cluster host on each node to that node's own IP address, plus any arbitrary port number (unless you run more than one Vert.x instance on the same node, in which case you have to choose a different port number for each instance).
For example if a node's IP address is 192.168.1.12 then you would do the following:
VertxOptions options = new VertxOptions()
        .setClustered(true)
        .setClusterHost("192.168.1.12") // node ip
        .setClusterPort(17001) // any arbitrary port but make sure no other Vert.x instances using same port on the same node
        .setClusterManager(clusterManager);
On another node whose IP address is 192.168.1.56, you would do the following:
VertxOptions options = new VertxOptions()
        .setClustered(true)
        .setClusterHost("192.168.1.56") // other node ip
        .setClusterPort(17001) // it is ok because this is a different node
        .setClusterManager(clusterManager);
I found this solution, which worked perfectly for me; below is my code snippet (the important part is options.setClusterHost()):
public class Runner {
    public static void run(Class clazz) {
        VertxOptions options = new VertxOptions();
        try {
            // for docker binding
            String local = InetAddress.getLocalHost().getHostAddress();
            options.setClusterHost(local);
        } catch (UnknownHostException e) { }
        options.setClustered(true);
        Vertx.clusteredVertx(options, res -> {
            if (res.succeeded()) {
                res.result().deployVerticle(clazz.getName());
            } else {
                res.cause().printStackTrace();
            }
        });
    }
}

public class Publisher extends AbstractVerticle {
    public static void main(String[] args) {
        Runner.run(Publisher.class);
    }
    ...
}
no need to define anything else...
I am building a system of clustered computers with several nodes. There is a master node that is supposed to schedule tasks to the several nodes in the cluster. The nodes are separate PCs connected to the master node via network cables. The whole system is expected to be implemented with Java, Akka, and the Play Framework.
Is there a way to implement this with Akka remote clustering and the Play Framework?
I am aware of the remote calculator tutorial, but it seems to be run with SBT,
and I would love to know whether a similar tutorial exists for the Play Framework.
Or any link to help me with my project.
thank you
An instance of a Play! Framework application can connect to a remote Akka node (i.e. your master node) using a simple configuration.
There are two ways:
override the default actor system
define a new actor system
I suggest you use the second one.
In this case you have to add something like this to application.conf:
master {
  akka {
    actor {
      provider = "akka.remote.RemoteActorRefProvider"
    }
    remote {
      transport = "akka.remote.netty.NettyRemoteTransport"
      netty {
        hostname = "your-master-host-name"
        port = 0
      }
    }
  }
}
Then in your Play! app you can connect to the remote master node in this way:
ActorSystem system = ActorSystem.create("master", ConfigFactory.load().getConfig("master"));
ActorRef master = system.actorFor("akka://master@your-master-host-name:your-master-port/user/master");
If you prefer to override the default Play Akka actor system, here is the reference configuration: http://www.playframework.org/documentation/2.0.3/AkkaCore
For the master and computational cluster nodes I suggest you to use the architecture and the code described here: http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
If your master and computational nodes do not require a web or REST interface, you can implement them as simple Java programs.
In the cited article the nodes are not exposed remotely. To do that, just add an application.conf to the master node app:
master {
  akka {
    actor {
      provider = "akka.remote.RemoteActorRefProvider"
    }
    remote {
      transport = "akka.remote.netty.NettyRemoteTransport"
      netty {
        hostname = "your-master-host-name"
        port = your-master-port
      }
    }
  }
}
And instantiate it with the actorOf method:
ActorSystem system = ActorSystem.create("master", ConfigFactory.load().getConfig("master"));
ActorRef master = system.actorOf(new Props(Master.class), "master");
The computational nodes must be configured in the same way as the Play! node.
Notice that only the master node has a fixed TCP/IP port defined. Non-master nodes use port 0, which configures Akka to choose a random free port for them. This is correct because the only well-known host:port address you need is the master's, which every node has to point to at startup.