I tried to configure my job to be submitted to YARN, but instead it runs locally:
config.set("yarn.resourcemanager.address", "ADDRESS:8032");
config.set("mapreduce.framework.name", "yarn");
config.set("fs.default.name", "hdfs://ADDRESS:8020");
If I set mapred.job.tracker it fails with:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): Unknown rpc kind in rpc headerRPC_WRITABLE
because the cluster is not running MR1.
So why is the app not submitted to YARN?
Solved it by doing this:
config.set("yarn.resourcemanager.address", "ADDRESS:8032");
config.set("yarn.resourcemanager.scheduler.address", "ADDRESS:8030");
config.set("yarn.resourcemanager.resource-tracker.address", "ADDRESS:8031");
config.set("yarn.resourcemanager.admin.address", "ADDRESS:8033");
instead of:
config.set("yarn.resourcemanager.address", "ADDRESS:8032");
I am running a topology that reads from 100 Kafka topics using 100 Storm spouts, with 50 bolts emitting the data to my own data-processing class; each bolt has 10 instances/threads. I am getting the following error now and then (not always), and only on some of the supervisor hosts (not always the same ones). I clearly have com.yammer.metrics.Metrics on my classpath.
However, the error keeps bugging me:
java.lang.NoClassDefFoundError: Could not initialize class com.yammer.metrics.Metrics
at kafka.metrics.KafkaMetricsGroup$class.newTimer(KafkaMetricsGroup.scala:52)
at kafka.consumer.FetchRequestAndResponseMetrics.newTimer(FetchRequestAndResponseStats.scala:25)
at kafka.consumer.FetchRequestAndResponseMetrics.<init>(FetchRequestAndResponseStats.scala:26)
at kafka.consumer.FetchRequestAndResponseStats.<init>(FetchRequestAndResponseStats.scala:37)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$$anonfun$2.apply(FetchRequestAndResponseStats.scala:50)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$$anonfun$2.apply(FetchRequestAndResponseStats.scala:50)
at kafka.utils.Pool.getAndMaybePut(Pool.scala:61)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.getFetchRequestAndResponseStats(FetchRequestAndResponseStats.scala:54)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34)
at storm.kafka.DynamicPartitionConnections.register(DynamicPartitionConnections.java:43)
at storm.kafka.PartitionManager.<init>(PartitionManager.java:58)
at storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:78)
at storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:45)
at storm.kafka.KafkaSpoutWrapper.nextTuple(KafkaSpoutWrapper.java:74)
at backtype.storm.daemon.executor$fn__8840$fn__8855$fn__8886.invoke(executor.clj:577)
at backtype.storm.util$async_loop$fn__551.invoke(util.clj:491)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.lang.Thread.run(Thread.java:833)
Any ideas about this would be greatly appreciated! Thanks!
The Storm version I am using: 0.10.2
The Kafka version I am using: kafka_2.10-0.8.2.1
The error is weird because it does not always happen, and when it does it is not always on the same supervisor host, nor on all of the supervisor hosts. I don't know where to begin.
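One hedged way to narrow this down: "Could not initialize class" usually means the class was found but its static initializer already failed once on that worker (often due to a conflicting metrics-core version in the worker's classpath), so a quick diagnostic is to log, from each worker, which jar com.yammer.metrics.Metrics is actually loaded from. The placement in open()/prepare() and the use of System.err below are assumptions for illustration only, not part of the original topology.
// Hedged diagnostic: log which jar com.yammer.metrics.Metrics is loaded from on this worker.
// Call this once from the spout's open() or the bolt's prepare() method.
public static void logMetricsClassOrigin() {
    try {
        Class<?> metrics = Class.forName("com.yammer.metrics.Metrics");
        java.security.CodeSource source = metrics.getProtectionDomain().getCodeSource();
        System.err.println("com.yammer.metrics.Metrics loaded from: "
                + (source != null ? source.getLocation() : "bootstrap/unknown"));
    } catch (Throwable t) {
        // An ExceptionInInitializerError here reveals the real root cause behind the NoClassDefFoundError.
        System.err.println("Failed to initialize com.yammer.metrics.Metrics: " + t);
        t.printStackTrace();
    }
}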
I am using Temporal for running workflows. I have created a jar with my app and run it from the terminal with: java -jar build/libs/app-0.0.1-SNAPSHOT.jar
I get the error below when running that command:
Exception in thread "main" io.grpc.StatusRuntimeException: UNKNOWN
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.getSystemInfo(WorkflowServiceGrpc.java:4139)
at io.temporal.serviceclient.SystemInfoInterceptor.getServerCapabilitiesOrThrow(SystemInfoInterceptor.java:95)
at io.temporal.serviceclient.ChannelManager.lambda$getServerCapabilities$3(ChannelManager.java:330)
at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:60)
at io.temporal.serviceclient.ChannelManager.connect(ChannelManager.java:297)
at io.temporal.serviceclient.WorkflowServiceStubsImpl.connect(WorkflowServiceStubsImpl.java:161)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:577)
at io.temporal.internal.WorkflowThreadMarker.lambda$protectFromWorkflowThread$1(WorkflowThreadMarker.java:83)
at jdk.proxy1/jdk.proxy1.$Proxy0.connect(Unknown Source)
at io.temporal.worker.WorkerFactory.start(WorkerFactory.java:210)
at com.hok.furlenco.workflow.refundStatusSync.RefundStatusSyncSaga.createWorkFlow(RefundStatusSyncSaga.java:41)
at com.hok.furlenco.workflow.refundStatusSync.RefundStatusSyncSaga.main(RefundStatusSyncSaga.java:17)
Caused by: java.nio.channels.UnsupportedAddressTypeException
at java.base/sun.nio.ch.Net.checkAddress(Net.java:146)
at java.base/sun.nio.ch.Net.checkAddress(Net.java:157)
at java.base/sun.nio.ch.SocketChannelImpl.checkRemote(SocketChannelImpl.java:816)
at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:839)
at io.grpc.netty.shaded.io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:91)
at io.grpc.netty.shaded.io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:88)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
at io.grpc.netty.shaded.io.netty.util.internal.SocketUtils.connect(SocketUtils.java:88)
at io.grpc.netty.shaded.io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:322)
at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:248)
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1342)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:548)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:533)
at io.grpc.netty.shaded.io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:54)
at io.grpc.netty.shaded.io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:157)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:548)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:61)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:538)
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
The app works fine when I run it from the IDE.
The Temporal server is running locally as a Docker container.
RefundStatusSyncSaga.java:
// gRPC stubs wrapper that talks to the local Docker instance of the Temporal service.
WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
// client that can be used to start and signal workflows
WorkflowClient client = WorkflowClient.newInstance(service);
// worker factory that can be used to create workers for specific task queues
WorkerFactory factory = WorkerFactory.newInstance(client);
// Worker that listens on a task queue and hosts both workflow and activity implementations.
Worker worker = factory.newWorker(TASK_QUEUE);
// Workflows are stateful. So you need a type to create instances.
worker.registerWorkflowImplementationTypes(RefundSyncWorkflowImpl.class);
// Activities are stateless and thread safe. So a shared instance is used.
RefundStatusActivities tripBookingActivities = new RefundStatusActivitiesImpl();
worker.registerActivitiesImplementations(tripBookingActivities);
// Start all workers created by this factory.
factory.start();
System.out.println("Worker started for task queue: " + TASK_QUEUE);
// now we can start running instances of our saga - its state will be persisted
WorkflowOptions options = WorkflowOptions.newBuilder().setTaskQueue(TASK_QUEUE)
.setWorkflowId("1")
.setWorkflowIdReusePolicy( WorkflowIdReusePolicy.WORKFLOW_ID_REUSE_POLICY_REJECT_DUPLICATE)
.setCronSchedule("* * * * *")
.build();
RefundSyncWorkflow refundSyncWorkflow = client.newWorkflowStub(RefundSyncWorkflow.class, options);
refundSyncWorkflow.syncRefundStatus();
The complete code can be seen here -> https://github.com/iftekharkhan09/temporal-sample
I also came across this and dug into the jar to debug it. I found that in the check public static InetSocketAddress checkAddress(SocketAddress sa), the SocketAddress becomes /xxx:443 (my original address is xxx:443), and then the validation check fails... I still don't know how to solve it.
Update: one solution can be found here: https://community.temporal.io/t/unable-to-run-temporal-workflow-from-jar/6607
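Since the failure only appears when running the fat jar (the IDE run works), one hypothesis worth checking is whether the jar packaging dropped or failed to merge gRPC's service-provider files under META-INF/services; without a NameResolver provider, the channel can end up connecting to an unresolved address, which would match the UnsupportedAddressTypeException above. The snippet below is only a hedged diagnostic sketch for that hypothesis: it prints which io.grpc.NameResolverProvider service files the running classloader can actually see. If the fat jar lost them, merging service files in the shade/shadow build configuration is the usual remedy for this class of problem.
import java.net.URL;
import java.util.Enumeration;

// Hedged diagnostic: list the gRPC NameResolverProvider service descriptors visible at runtime.
public class GrpcServiceFileCheck {
    public static void main(String[] args) throws Exception {
        Enumeration<URL> urls = GrpcServiceFileCheck.class.getClassLoader()
                .getResources("META-INF/services/io.grpc.NameResolverProvider");
        boolean found = false;
        while (urls.hasMoreElements()) {
            found = true;
            System.out.println("Found: " + urls.nextElement());
        }
        if (!found) {
            System.out.println("No io.grpc.NameResolverProvider service file on the classpath"
                    + " - the fat jar probably dropped or failed to merge it.");
        }
    }
}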
Flink 1.5.3: when I submit a Flink job to the Flink cluster (on YARN), it always throws an AskTimeoutException. In the Flink configuration file I have configured the parameter "akka.ask.timeout=1000s", but the exception still looks like the one below.
That is, I have already increased the timeout parameter "akka.ask.timeout=1000s", but it does not help.
org.apache.flink.runtime.rest.handler.RestHandlerException: Job submission failed.
at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$handleRequest$2(JobSubmitHandler.java:116)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1851759541]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
... 21 more
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1851759541]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
... 9 more
So is there any solution to avoid this issue?
The timeouts of the communication between the REST handlers and the Flink cluster are controlled by web.timeout. The timeout is specified in milliseconds, so you would need to set web.timeout: 1000000 in your flink-conf.yaml if you want to wait 1000s.
Moreover, it would be good to check the cluster entrypoint logs to see why the job submission takes so long. Usually it should not take longer than 10 seconds.
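Putting the two settings mentioned in this thread side by side, the relevant flink-conf.yaml entries would look like the excerpt below (values taken from the question and the answer above; adjust them to the wait time you actually need):
web.timeout: 1000000
akka.ask.timeout: 1000s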
My application has been running for months and working very well. Then suddenly I got the following error:
com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
at com.hazelcast.spi.impl.ProxyServiceImpl$ProxyRegistry.<init>(ProxyServiceImpl.java:220)
at com.hazelcast.spi.impl.ProxyServiceImpl$ProxyRegistry.<init>(ProxyServiceImpl.java:207)
at com.hazelcast.spi.impl.ProxyServiceImpl$1.createNew(ProxyServiceImpl.java:69)
at com.hazelcast.spi.impl.ProxyServiceImpl$1.createNew(ProxyServiceImpl.java:67)
at com.hazelcast.util.ConcurrencyUtil.getOrPutIfAbsent(ConcurrencyUtil.java:47)
at com.hazelcast.spi.impl.ProxyServiceImpl.getDistributedObject(ProxyServiceImpl.java:101)
at com.hazelcast.instance.HazelcastInstanceImpl.getDistributedObject(HazelcastInstanceImpl.java:285)
at com.hazelcast.instance.HazelcastInstanceImpl.getLock(HazelcastInstanceImpl.java:183)
at com.hazelcast.instance.HazelcastInstanceProxy.getLock(HazelcastInstanceProxy.java:77)
at br.com.xyz.lock.hazelcast.HazelcastLockManager.lock(HazelcastLockManager.java:37)
at br.com.xyz.lock.hazelcast.LockManagerFacade.lock(LockManagerFacade.java:24)
at br.com.xyz.recebe.negocio.NProcessadorMensagemRecebida.processamentoLock(NProcessadorMensagemRecebida.java:85)
at br.com.xyz.recebe.negocio.NProcessadorMensagemRecebida.processaArquivo(NProcessadorMensagemRecebida.java:74)
at br.com.xyz.recebe.processador.ProcessadorBase.processaArquivo(ProcessadorBase.java:75)
at br.com.xyz.recebe.processador.ProcessadorXml.processaArquivo(ProcessadorXml.java:16)
at br.com.xyz.recebe.processador.ProcessadorFacade.processaArquivo(ProcessadorFacade.java:34)
at br.com.xyz.recebe.mail.pdes.ProcessadorPDESMeRecebida.processar(ProcessadorPDESMeRecebida.java:77)
at gov.sefaz.util.pdes.ProcessadorDiretorioEntradaSaidaDaemon.processar(ProcessadorDiretorioEntradaSaidaDaemon.java:575)
at gov.sefaz.util.pdes.ProcessadorDiretorioEntradaSaidaDaemon.varrerDiretorioUsingStrategy(ProcessadorDiretorioEntradaSaidaDaemon.java:526)
at gov.sefaz.util.pdes.ProcessadorDiretorioEntradaSaidaDaemon.run(ProcessadorDiretorioEntradaSaidaDaemon.java:458)
at java.lang.Thread.run(Unknown Source)
I found some issues on Google saying it shuts down because of other errors, but in my case there aren't any.
It shut down for no apparent reason.
Has anyone seen this before?
The problem occurs if a Hazelcast member doesn't shut down normally (it is terminated). When you stop the HazelcastInstance that a proxy (such as a queue or a lock) is bound to, any operation on that proxy after the instance has stopped throws HazelcastInstanceNotActiveException.
You just need to wait for a few minutes and rerun your job.
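If the member can come back (or be recreated), the calling code can also defend itself by catching HazelcastInstanceNotActiveException and retrying with a fresh instance. The sketch below is a hedged example against the Hazelcast 3.x API that appears in the stack trace (getLock); the instance creation, retry count and back-off are assumptions for illustration, not the original application's code.
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceNotActiveException;
import com.hazelcast.core.ILock;

// Hedged sketch: retry a lock-protected task when the local Hazelcast member has shut down.
public class LockWithRetry {

    private volatile HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance();

    public void runWithLock(String lockName, Runnable work) throws InterruptedException {
        for (int attempt = 1; attempt <= 3; attempt++) {
            boolean workDone = false;
            try {
                ILock lock = hazelcast.getLock(lockName); // Hazelcast 3.x API, as in the stack trace
                lock.lock();
                try {
                    work.run();
                    workDone = true;
                } finally {
                    lock.unlock();
                }
                return;
            } catch (HazelcastInstanceNotActiveException e) {
                if (workDone) {
                    return; // the work finished; only the unlock saw the member go down
                }
                // The local member shut down; wait a little and recreate it before retrying.
                Thread.sleep(5_000L * attempt);
                hazelcast = Hazelcast.newHazelcastInstance();
            }
        }
        throw new IllegalStateException("Could not run work under lock " + lockName + " after retries");
    }
}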
I have configured SpringBatch-Admin-Console for launching and viewing the jobs we use in our application.
When I launch the job from our application's console, the job completes and I am able to view its status in the console as expected:
ID: 449
Job Name: analyzeJob
Job Instance: 449
Job Parameters: time=03-02-2013 17\:58\:13.54
Start Date: 2013-02-03
Start Time: 17:58:16
Duration: 00:00:09
Status: COMPLETED
Exit Code: COMPLETED
Step Executions Count: 3
Step Executions: [processHeaderStep,inDBScanStep,inMemoryScanStep]
However, when I launch the job from the SpringBatch-Admin-Console, the job still completes as expected, but when I try to view its status in the console I get the following error message:
"HTTP Status 500 - Request processing failed; nested exception is com.thoughtworks.xstream.converters.ConversionException: id : id : id : id ---- Debugging information ---- message : id : id cause-exception : com.thoughtworks.xstream.mapper.CannotResolveClassException cause-message : id : id class : java.util.HashMap required-type : java.util.HashMap path : /map/entry[3]/masterdata.analyzer.metadata.Metadata/hubCodeTables/masterdata.analyzer.metadata.MHubCodeTable/codeValueMap/entry[7]/id line number : -1 -------------------------------
"
I thought the exception may have occurred because my application is able to use the class java.util.HashMap from JAVA_HOME, whereas the SpringBatch-Admin war inside Tomcat is not able to use it.
So I even created a jar file with all the Java class files (including java.util.HashMap) and placed it inside the lib folder of the SpringBatch-Admin-Console, but I still got the same error.
I also made sure all the required libraries from my application are present inside SpringBatch-Admin's lib folder as well. Still the same error.
The weird behaviour is that I am able to launch the job and view it when I launch it from outside, but I get this error only when I launch it from the SpringBatch-Admin-Console.
Can anyone please tell me why this error occurs?
Delete the batch tables before restarting the job.
Since Spring Batch stores every execution of a job with the same parameters in the same tables, if you don't delete those entries you might see a previous error carried over, even if you have already resolved it.
So make sure you delete the batch tables before executing (restarting) the job again.
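If you want to script that cleanup, below is a hedged sketch using Spring's JdbcTemplate. It assumes the default BATCH_ table prefix and a javax.sql.DataSource pointing at the job repository database; the deletes are ordered children-first so the foreign-key constraints are satisfied. Note that the parameters table is named BATCH_JOB_EXECUTION_PARAMS in Spring Batch 3.x+ but BATCH_JOB_PARAMS in the 2.x schema that Spring Batch Admin typically ships with, so adjust the list to your schema.
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

// Hedged sketch: wipe the Spring Batch metadata tables (default BATCH_ prefix assumed).
public class BatchMetadataCleaner {

    // Children first, so foreign-key constraints are not violated.
    private static final String[] TABLES = {
            "BATCH_STEP_EXECUTION_CONTEXT",
            "BATCH_STEP_EXECUTION",
            "BATCH_JOB_EXECUTION_CONTEXT",
            "BATCH_JOB_EXECUTION_PARAMS", // BATCH_JOB_PARAMS on Spring Batch 2.x schemas
            "BATCH_JOB_EXECUTION",
            "BATCH_JOB_INSTANCE"
    };

    public static void clean(DataSource dataSource) {
        JdbcTemplate jdbc = new JdbcTemplate(dataSource);
        for (String table : TABLES) {
            jdbc.update("DELETE FROM " + table);
        }
    }
}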