I am trying to connect to an Elasticsearch cluster, following these examples:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/transport-client.html
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-index.html
This is my code:
def saveToES(message: String) {
println("start saving")
// val client = TransportClient.builder().settings(settings).build().addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("xxx.com"), 9200))
// prepare Adresses
val c = InetAddress.getByName("xxx.com")
println("inetadress done")
val d = new InetSocketTransportAddress(c, 9200)
println("socketadress done")
val settings = Settings.settingsBuilder().put("cluster.name", "CLUSTER").build();
println("settings created")
val a = TransportClient.builder()
println("Builder created")
val aa = a.settings(settings)
println("settings set")
val b = a.build
println("client built")
val client = b.addTransportAddress(d)
println("adress added to client")
val response = client.prepareIndex("performance_logs", "performance").setSource(message).get()
//
println(response.toString)
//
// // on shutdown
//
// client.close();
}
The program gets stuck on building the client (val b = a.build), so my last print is "settings set". There is no error; it just hangs. For firewall reasons I have to deploy the whole thing as a jar and execute it on a server, so I can't really debug with the IDE.
Any idea what the problem is?
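For reference, this is a minimal sketch of the flow the linked 2.x documentation shows (the cluster name, host, and index/type names are just my placeholders); note that the docs connect to the transport port 9300 rather than the HTTP port 9200:
import java.net.InetAddress

import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.settings.Settings
import org.elasticsearch.common.transport.InetSocketTransportAddress

def saveToEsSketch(message: String): Unit = {
  val settings = Settings.settingsBuilder()
    .put("cluster.name", "CLUSTER")
    .build()

  // The transport protocol listens on 9300 by default; 9200 is the HTTP port.
  val client = TransportClient.builder()
    .settings(settings)
    .build()
    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("xxx.com"), 9300))

  try {
    val response = client.prepareIndex("performance_logs", "performance").setSource(message).get()
    println(response.toString)
  } finally {
    client.close() // release the client and its threads
  }
}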
Please advise: I have a task to retrieve data from smart folders in the Teamcenter 11.6 environment.
I found a SOA for this, ProjectSmartFolder with get_project_data(), but I don't know how to use it correctly, what to pass as input, or how to correctly pass the login data to it.
Here is the part that handles login and getting the project names:
connection = new Connection(ConnectSeting.Connection_http_adress.getValue().toString(),
credentialManager, ConnectSeting.Connection_REST.getValue().toString(),
ConnectSeting.Connection_HTTP_Protokol.getValue().toString());
connection.setExceptionHandler(new AppXExceptionHandler());
displayStringService = new TcDisplayStringRepositoryImpl(connection);
System.out.println("SmartFoldersTest try ");
SessionService sessionService = SessionService.getService(connection);
String[] credentials = credentialManager.getCredentials(new InvalidUserException());
LoginResponse resp = sessionService.login(credentials[CredentialManagerImpl.CRED_USER],
credentials[CredentialManagerImpl.CRED_PASSWORD], credentials[CredentialManagerImpl.CRED_GROUP],
credentials[CredentialManagerImpl.CRED_ROLE], "ru_RU",
credentials[CredentialManagerImpl.CRED_DISCRIMINATOR]);
resp.serviceData.sizeOfPartialErrors();
SavedQueryService sqs = SavedQueryService.getService(connection);
System.out.println("Connecting sqs -> " + sqs);
ProjectLevelSecurityService plss = ProjectLevelSecurityService.getService(connection);
UserProjectsInfoInput[] upiis = new UserProjectsInfoInput[1];
UserProjectsInfoInput upii = new UserProjectsInfoInput();
upii.activeProjectsOnly = false;
upii.privilegedProjectsOnly = false;
upii.programsOnly = false;
upii.user = resp.user;
upiis[0] = upii;
UserProjectsInfoResponse upir = plss.getUserProjects(upiis);
System.out.println("Projects: ");
System.out.println(upir.userProjectInfos[0].projectsInfo[0].project.get_object_name());
// ProjectSmartFolder projSmartFolder = new ProjectSmartFolder(null, null); ?????????
I'm creating clusters on AWS EMR (with the Console and the SDK), but these clusters always remain in the "Starting" state and never actually start. Why can this happen and how can I solve it? Thanks.
bootstrap-actions log:
INFO i-062fab1a95f485684: new instance started
ERROR i-062fab1a95f485684: failed to start. bootstrap action 1 failed with non-zero exit code.
My code:
val emr = AmazonElasticMapReduceClientBuilder.standard()
.withCredentials(new AWSStaticCredentialsProvider(awsCred))
.withRegion(Regions.EU_WEST_1)
.build()
val stepFactory = new StepFactory();
val enabledebugging = new StepConfig()
.withName("Enable debugging")
.withActionOnFailure("TERMINATE_JOB_FLOW")
.withHadoopJarStep(stepFactory.newEnableDebuggingStep())
val spark = new Application().withName("Spark")
val hive = new Application().withName("Hive")
val ganglia = new Application().withName("Ganglia")
val zeppelin = new Application().withName("Zeppelin")
val request = new RunJobFlowRequest()
.withName("Spark Cluster")
.withReleaseLabel("emr-5.20.0")
.withSteps(enabledebugging)
.withApplications(spark)
.withLogUri("s3://my-logs")
.withServiceRole("EMR_DefaultRole")
.withJobFlowRole("EMR_EC2_DefaultRole")
.withInstances(new JobFlowInstancesConfig()
.withEc2SubnetId("subnet-xxxxx")
.withEc2KeyName("ec2test")
.withInstanceCount(3)
.withKeepJobFlowAliveWhenNoSteps(true)
.withMasterInstanceType("m5.xlarge")
.withSlaveInstanceType("m5.xlarge")
);
val result = emr.runJobFlow(request);
System.out.println("The cluster ID is " + result.toString());
Context
I run Spark applications on an Amazon EMR cluster.
These applications are orchestrated by YARN.
From the AWS Console, I am able to get the YARN application status using the Application History tab of the cluster's detail page. (cf. View Application History)
Expectation / Question
I would like to get the same information (application status), but from a Java or Scala program.
So, is it possible to get the YARN application status from the AWS EMR Java SDK?
In my application, I already manage some EMR object instances such as:
AmazonElasticMapReduceClient
Cluster
Thanks in advance.
I came upon this because I was looking for a way to get the job status via EMR's "steps" API, but if you're looking to get it via YARN directly, here is some sample code:
// Assumed imports for this snippet (Typesafe Config, json4s, scalaj-http, Spark launcher,
// and an SLF4J Logger for the otherwise unused logger parameter); adjust to your build.
import com.typesafe.config.ConfigFactory
import org.apache.spark.launcher.SparkLauncher
import org.json4s._
import org.json4s.jackson.JsonMethods.parse // or org.json4s.native.JsonMethods.parse
import org.slf4j.Logger
import scalaj.http.Http

object DataLoad {
private def getJsonField(json: JValue, key: String): Option[String] = {
json \ key match {
case JNothing => None // key not present in the JSON
case jval: JValue => Some(jval.values.toString)
}
}
def load(logger: Logger, hiveDatabase: String, hiveTable: String, dw_table_name: String): Unit = {
val conf = ConfigFactory.load
val yarnResourceManager = conf.getString("app.yarnResourceManager")
val sparkExecutors = conf.getString("app.sparkExecutors")
val sparkHome = conf.getString("app.sparkHome")
val sparkAppJar = conf.getString("app.sparkAppJar")
val sparkMainClass = conf.getString("app.sparkMainClass")
val sparkMaster = conf.getString("app.sparkMaster")
val sparkDriverMemory = conf.getString("app.sparkDriverMemory")
val sparkExecutorMemory = conf.getString("app.sparkExecutorMemory")
val destination = if (dw_table_name.contains("s3a://")) "s3" else "sql"
val spark = new SparkLauncher()
.setSparkHome(sparkHome)
.setAppResource(sparkAppJar)
.setMainClass(sparkMainClass)
.setMaster(sparkMaster)
.addAppArgs(hiveDatabase)
.addAppArgs(hiveTable)
.addAppArgs(destination)
.setVerbose(false)
.setConf("spark.driver.memory", sparkDriverMemory)
.setConf("spark.executor.memory", sparkExecutorMemory)
.setConf("spark.executor.cores", sparkExecutors)
.setConf("spark.executor.instances", sparkExecutors)
.setConf("spark.driver.maxResultSize", "5g")
.setConf("spark.sql.broadcastTimeout", "144000")
.setConf("spark.network.timeout", "144000")
.startApplication()
var unknownCounter = 0
while(!spark.getState.isFinal) {
println(spark.getState.toString)
Thread.sleep(10000)
if(unknownCounter > 3000){
throw new IllegalStateException("Spark Job Failed, timeout expired 8 hours")
}
unknownCounter += 1
}
println(spark.getState.toString)
val appId: String = spark.getAppId
println(s"appId: $appId")
var finalState = ""
var i = 0
while(i < 5){
val response = Http(s"http://$yarnResourceManager/ws/v1/cluster/apps/$appId/").asString
if(response.code.toString.startsWith("2"))
{
val json = parse(response.body)
finalState = getJsonField(json \ "app","finalStatus").getOrElse("")
i = 55
}
else {
i = i+1
}
}
if(finalState.equalsIgnoreCase("SUCCEEDED")){
println("SPARK JOB SUCCEEDED")
}
else {
throw new IllegalStateException("Spark Job Failed")
}
}
}
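For completeness, if the EMR "steps" route mentioned at the top is enough for your case, the step states can also be read straight from the EMR SDK. A minimal sketch, assuming the AWS SDK for Java v1 and a placeholder cluster id:
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder
import com.amazonaws.services.elasticmapreduce.model.ListStepsRequest

import scala.collection.JavaConverters._

// Print each step of the cluster together with its state
// (PENDING, RUNNING, COMPLETED, FAILED, CANCELLED, ...).
val emr = AmazonElasticMapReduceClientBuilder.defaultClient()
val steps = emr.listSteps(new ListStepsRequest().withClusterId("j-XXXXXXXXXXX")).getSteps.asScala
steps.foreach(step => println(s"${step.getName}: ${step.getStatus.getState}"))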
My application requires multiple threads running to fetch data from various HDFS nodes. For that I am using a thread executor pool and forking threads.
Forking at:
val pathSuffixList = fileStatuses.getOrElse("FileStatus", List[Any]()).asInstanceOf[List[Map[String, Any]]]
pathSuffixList.foreach(block => {
ConsumptionExecutor.execute(new Consumption(webHdfsUri,block))
})
My class Consumption :
class Consumption(webHdfsUri: String, block:Map[String,Any]) extends Runnable {
override def run(): Unit = {
val uriSplit = webHdfsUri.split("\\?")
val fileOpenUri = uriSplit(0) + "/" + block.getOrElse("pathSuffix", "").toString + "?op=OPEN"
val inputStream = new URL(fileOpenUri).openStream()
val datumReader = new GenericDatumReader[Void]()
val dataStreamReader = new DataFileStream(inputStream, datumReader)
// val schema = dataStreamReader.getSchema()
val dataIterator = dataStreamReader.iterator()
while (dataIterator.hasNext) {
println(" data : " + dataStreamReader.next())
}
}
}
ConsumptionExecutor :
object ConsumptionExecutor{
val counter: AtomicLong = new AtomicLong()
val executionContext: ExecutorService = Executors.newCachedThreadPool(new ThreadFactory {
def newThread(r: Runnable): Thread = {
val thread: Thread = new Thread(r)
thread.setName("ConsumptionExecutor-" + counter.incrementAndGet())
thread
}
})
executionContext.asInstanceOf[ThreadPoolExecutor].setMaximumPoolSize(200)
def execute(trigger: Runnable) {
executionContext.execute(trigger)
}
}
However, I want to use Akka Streams / Akka actors, where I don't need to give a fixed thread pool size and Akka takes care of everything.
I am pretty new to Akka and the concepts of streams and actors. Can someone give me any leads, in the form of sample code, to fit my use case?
Thanks in advance!
An idea would be to create a (subclass) instance of ActorPublisher for each HDFS node that you are reading from, and then merge them as multiple Sources in a FlowGraph.
Something like this pseudo-code, where the details of the ActorPublisher sources are left out:
val g = PartialFlowGraph { implicit b =>
import FlowGraphImplicits._
val in1 = actorSource1
val in2 = actorSource2
// etc.
val out = UndefinedSink[T]
val merge = Merge[T]
in1 ~> merge ~> out
in2 ~> merge
// etc.
}
This can be improved for a collection of actor sources by just iterating over them and adding an edge to the merge for each one, but this gives the idea.
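If you are on a current Akka Streams release (2.x), where the experimental PartialFlowGraph / UndefinedSink API above no longer exists, the same idea can be sketched with Source.combine and a Merge strategy. The blockSource helper and its line-based WebHDFS reading below are only illustrative assumptions, not a drop-in replacement for the Avro consumer in the question:
import java.io.{BufferedReader, InputStreamReader}
import java.net.URL

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Merge, Sink, Source}

import scala.concurrent.Future

object MergedHdfsSources {
  // Akka 2.6+: the ActorSystem itself provides the stream materializer.
  implicit val system: ActorSystem = ActorSystem("hdfs-consumer")

  // One streaming Source per HDFS block, backed by the WebHDFS OPEN call.
  def blockSource(webHdfsUri: String, block: Map[String, Any]): Source[String, NotUsed] =
    Source.unfoldResource[String, BufferedReader](
      () => new BufferedReader(new InputStreamReader(
        new URL(s"$webHdfsUri/${block.getOrElse("pathSuffix", "")}?op=OPEN").openStream())),
      reader => Option(reader.readLine()), // None (end of stream) completes this source
      reader => reader.close())

  // Merge an arbitrary collection of block sources into one stream and print every element.
  def consumeAll(webHdfsUri: String, blocks: List[Map[String, Any]]): Future[akka.Done] = {
    val merged = blocks.map(blockSource(webHdfsUri, _)) match {
      case s1 :: s2 :: rest => Source.combine(s1, s2, rest: _*)(Merge(_))
      case single :: Nil    => single
      case Nil              => Source.empty[String]
    }
    merged.runWith(Sink.foreach(line => println(" data : " + line)))
  }
}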
I've been trying to create an application which needs to scan open ports on a network (mostly LAN) as fast as possible.
I searched around and one great method that I found uses the following code:
(1 to 65536).par.map { case port ⇒
try {
val socket = new java.net.Socket("127.0.0.1", port)
socket.close()
println(port)
port
} catch {
case _: Throwable ⇒ -1
}
}.toSet
However, the problem with the code is that if I enter anything other than 127.0.0.1 or localhost as the location (say 192.168.1.2), the application freezes.
Any idea why this happens and how I can fix it?
P.S. I also tried setting the socket timeout with socket.setSoTimeout(1500), but there was no change.
Something like
import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.util.Try
import scala.concurrent._
import java.util.concurrent.Executors
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(100))
def openPorts(address:String ="127.0.0.1",duration:Duration = 10 seconds, fromPort:Int = 1, toPort:Int = 65536) = {
val socketTimeout = 200
val result = Future.traverse(fromPort to toPort ) { port =>
Future{ Try {
val socket = new java.net.Socket()
socket.connect(new java.net.InetSocketAddress(address, port),socketTimeout)
socket.close()
port
} toOption }
}
Try {Await.result(result, duration)}.toOption.getOrElse(Nil).flatten
}
scala> val localPorts = openPorts(fromPort = 10, toPort = 1000)
localPorts: scala.collection.immutable.IndexedSeq[Int] = Vector(22, 631)
scala> val remotePorts = openPorts(fromPort = 10, toPort = 1000, address="192.168.1.20")
remotePorts: scala.collection.immutable.Seq[Int] = List() //we ate the timeout
scala> val remotePorts = openPorts(fromPort = 12000, toPort = 13000, address="91.190.218.61", duration=30 seconds)
remotePorts: scala.collection.immutable.Seq[Int] = Vector(12345, 12350)
Although Ashalynd's answer works well, while I was playing with this application again I found a perhaps simpler solution using Futures.
Here is the code:
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object TestOne extends App {
  println("Program is now running")
  val dataset: Future[Set[Int]] = Future {
    (1 to 6335).map { port =>
      try {
        val socket = new java.net.Socket("127.0.0.1", port)
        socket.close()
        println("The port is open at " + port)
        port
      } catch {
        case _: Throwable => -1
      }
    }.toSet
  }
  // Block until the scan finishes; otherwise the App may exit before the Future
  // (running on the daemon threads of the global execution context) completes.
  Await.ready(dataset, 10.minutes)
}