In my application, the Java Spark context is created with an unavailable master URL (you may assume the master is down for maintenance). Creating the Java Spark context then stops the JVM that runs the Spark driver with exit code 50.
When I checked the logs I found SparkUncaughtExceptionHandler calling System.exit. My program should run forever. How can I overcome this issue?
I tried this scenario with Spark versions 1.4.1 and 1.6.0.
My code is given below:
package test.mains;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckJavaSparkContext {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        conf.setMaster("spark://sunshine:7077");

        try {
            new JavaSparkContext(conf);
        } catch (Throwable e) {
            System.out.println("Caught an exception : " + e.getMessage());
            //e.printStackTrace();
        }

        System.out.println("Waiting to complete...");
        while (true) {
        }
    }
}
Part of the output log
16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
16/03/04 18:02:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582)
at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1731)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1730)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/03/04 18:02:24 INFO DiskBlockManager: Shutdown hook called
16/03/04 18:02:24 INFO ShutdownHookManager: Shutdown hook called
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52/userFiles-db548748-a55c-4406-adcb-c09e63b118bd
Java Result: 50
If the master is down, the application will by itself try to connect to the master three times with a 20-second timeout. These parameters appear to be hardcoded and not configurable. If the application fails to connect, there is nothing more you can do than resubmit it once the master is up again.
That is why you should configure your cluster in high-availability mode. Spark Standalone supports two different modes:
Single-Node Recovery with Local File System
Standby Masters with ZooKeeper
where the second option is the one applicable in production and the one useful in the described scenario.
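With the ZooKeeper-based setup, the driver can also be given the full list of masters, so it registers with whichever one is currently the leader and survives a failover. Below is a minimal sketch, assuming the masters themselves have been started with spark.deploy.recoveryMode=ZOOKEEPER; the standby host name is made up:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckJavaSparkContextHA {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("test")
                // Comma-separated list of all masters; "sunshine-standby" is a
                // hypothetical second master registered in the same ZooKeeper ensemble.
                .setMaster("spark://sunshine:7077,sunshine-standby:7077");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code ...
        sc.stop();
    }
}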
Related
I am running an event in an Akka actor system, where we run multiple actors to query MongoDB and retrieve data. Each actor queries for 1000 documents (each document is about 9 KB).
When running an event that needs to fire 14 actors to query MongoDB and retrieve 13000 documents, I experienced the exception below and I am not sure why. Has anyone seen this before?
2020-04-14 19:17:28,818 [erp-writer-actor-system-akka.actor.default-dispatcher-378] ERROR c.a.s.c.m.GlobalContextMongoClientService- 76cd7a80-83ef-4389-885a-be9caed77449 - Exception occured while reading data from cursor
java.lang.IllegalStateException: state should be: open
at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:84)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:203)
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:103)
at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
at com.xyz.smartconnect.commons.mongoclient.GlobalContextMongoClientService.findWorkers(GlobalContextMongoClientService.java:145)
at com.xyz.smartconnect.actors.QueryWorkersActor.lambda$createReceive$0(QueryWorkersActor.java:40)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at akka.actor.Actor$class.aroundReceive(Actor.scala:513)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:132)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:519)
at akka.actor.ActorCell.invoke(ActorCell.scala:488)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Suppressed: java.lang.IllegalStateException: state should be: open
at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:84)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:261)
at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:147)
at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
at com.xyz.smartconnect.commons.mongoclient.GlobalContextMongoClientService.findWorkers(GlobalContextMongoClientService.java:149)
After running multiple tests and analyzing the logs carefully, I found the root cause. Below are the details.
While the application was using a cursor to query data from MongoDB, the connection was released/closed. 'state should be: open' is complaining about a released connection.
In my case, the application hit an OutOfMemoryError, which caused the beans to be disposed and the connections to be released. Here is the timeline of log events for this issue.
Since this is a memory issue in my case, fixing the memory issue fixes the exception below.
2020-04-19 12:57:32,981 [xyz-actor-system-akka.actor.default-dispatcher-72] ERROR a.a.ActorSystemImpl- - 413f9298-ca92-4744-913b-59934e4ce831 - exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:269)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
at java.lang.Thread.run(Thread.java:748)
2020-04-19 12:57:43,649 [Thread-19] INFO o.s.c.s.DefaultLifecycleProcessor- - - Stopping beans in phase 2147483647
2020-04-19 12:58:13,483 [Thread-19] INFO o.s.j.e.a.AnnotationMBeanExporter- - - Unregistering JMX-exposed beans on shutdown
2020-04-19 12:58:45,186 [localhost-startStop-2] INFO c.a.s.ApplicationContextListener- - - >>>>>>>>> Disposing beans
2020-04-19 12:59:00,182 [localhost-startStop-2] INFO c.a.s.c.SpringBeanDisposer- - - Mongo connections are released.
2020-04-19 12:59:09,591 [xyz-actor-system-akka.actor.default-dispatcher-73] ERROR c.a.s.c.m.GlobalContextMongoClientService- - 413f9298-ca92-4744-913b-59934e4ce831 - Exception occured while reading data from cursor
java.lang.IllegalStateException: state should be: open
at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
at com.mongodb.connection.DefaultServer.getDescription(DefaultServer.java:114)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getServerDescription(ClusterBinding.java:81)
at com.mongodb.operation.QueryBatchCursor.initFromCommandResult(QueryBatchCursor.java:251)
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:207)
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:103)
at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
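For illustration only, here is a minimal sketch of the failure mode described above (not the actual application code; the host, database, and collection names are made up). Closing the MongoClient while a cursor is still being drained makes the next batch fetch fail with exactly this exception:

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class CursorAfterCloseDemo {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        MongoCollection<Document> workers =
                client.getDatabase("erp").getCollection("workers");

        // The cursor fetches documents in batches of 1000, like the actors in the question.
        MongoCursor<Document> cursor = workers.find().batchSize(1000).iterator();

        // Simulates what the OutOfMemoryError-triggered shutdown did:
        // the client (and its connections) is released while the cursor is still open.
        client.close();

        // Iterating works through the already-fetched batch, but as soon as the
        // cursor needs another batch (getMore), it fails with
        // java.lang.IllegalStateException: state should be: open
        while (cursor.hasNext()) {
            System.out.println(cursor.next().toJson());
        }
    }
}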
We have a general file-mover service which is scheduled and moves files from one location to another.
We are migrating from Wildfly 10 to Wildfly 16 and are facing this issue in Wildfly 16.
Wildfly 16 shows strange behavior: when the timer gets stuck and we disable or undeploy the deployment, the server group gets stuck as well and we can only kill and restart it.
When the timer is stuck, the following warnings come continuously:
2019-12-13 14:00:00,000 WARN [org.jboss.as.ejb3.timer] (EJB default - 10) WFLYEJB0043:
A previous execution of timer [id=51e7977a-722a-4b20-9db1-f3534b2e3cff
timedObjectId=filemover-1.5-SNAPSHOT-wildfly10.filemover-1.5-SNAPSHOT-wildfly10.FileMover
auto-timer?:true
persistent?:false
timerService=org.jboss.as.ejb3.timerservice.TimerServiceImpl@2ac49ed5
initialExpiration=null
intervalDuration(in milli sec)=0
nextExpiration=Fri Dec 13 14:00:00 CET 2019
timerState=IN_TIMEOUT
info=null]
is still in progress,
skipping this overlapping scheduled execution at: Fri Dec 13 14:00:00 CET 2019.
Then I click on disable or undeploy in the Wildfly UI and the process gets stuck for an indefinite time.
Error: There is one or more management operations running longer than expected, it may negatively impact the performance of the server. Check the Management Operations view to display the active operations.
After the undeploy or disable, the logs show the following messages:
2019-12-13 14:05:13,225 INFO [org.jboss.modcluster] (ServerService Thread Pool -- 15) MODCLUSTER000021: All pending requests drained from default-host:/filemover-1.5-SNAPSHOT-wildfly10 in 0.0 seconds
2019-12-13 14:05:13,227 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 15) WFLYUT0022: Unregistered web context: '/filemover-1.5-SNAPSHOT-wildfly10' from server 'default-server'
2019-12-13 14:06:00,003 ERROR [org.jboss.as.ejb3.timer] (EJB default - 4) WFLYEJB0020: Error invoking timeout for timer: [id=4416f1bb-1d5a-4992-bfa5-b7d635136f4e timedObjectId=filemover-1.5-SNAPSHOT-wildfly10.filemover-1.5-SNAPSHOT-wildfly10.FileMover auto-timer?:true persistent?:false timerService=org.jboss.as.ejb3.timerservice.TimerServiceImpl@2ac49ed5 initialExpiration=null intervalDuration(in milli sec)=0 nextExpiration=Fri Dec 13 14:08:00 CET 2019 timerState=IN_TIMEOUT info=null]: org.jboss.as.ejb3.component.EJBComponentUnavailableException: WFLYEJB0421: Invocation cannot proceed as component is shutting down
at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:59)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:60)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:438)
at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:618)
at org.jboss.invocation.AccessCheckingInterceptor.processInvocation(AccessCheckingInterceptor.java:57)
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53)
at org.jboss.as.ejb3.timerservice.TimedObjectInvokerImpl.callTimeout(TimedObjectInvokerImpl.java:99)
at org.jboss.as.ejb3.timerservice.CalendarTimerTask.invokeBeanMethod(CalendarTimerTask.java:64)
at org.jboss.as.ejb3.timerservice.CalendarTimerTask.callTimeout(CalendarTimerTask.java:53)
at org.jboss.as.ejb3.timerservice.TimerTask.run(TimerTask.java:181)
at org.jboss.as.ejb3.timerservice.TimerServiceImpl$Task$1.run(TimerServiceImpl.java:1302)
at org.wildfly.extension.requestcontroller.RequestController$QueuedTask$1.run(RequestController.java:494)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at org.jboss.threads.JBossThread.run(JBossThread.java:485)
After a few seconds to a few minutes, the following error appears:
2019-12-13 14:10:13,218 ERROR [org.jboss.as.controller.management-operation] (ServerService Thread Pool -- 90) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'undeploy' at address '[("deployment" => "filemover-1.5-SNAPSHOT-wildfly10.war")]'
2019-12-13 14:10:18,218 INFO [org.jboss.as.protocol] (ServerService Thread Pool -- 93) WFLYPRT0057: cancelled task by interrupting thread Thread[ServerService Thread Pool -- 90,5,ServerService ThreadGroup]
2019-12-13 14:10:23,219 ERROR [org.jboss.as.controller.management-operation] (ServerService Thread Pool -- 90) WFLYCTL0190: Step handler org.jboss.as.server.deployment.DeploymentHandlerUtil$5@13139499 for operation undeploy at address [("deployment" => "filemover-1.5-SNAPSHOT-wildfly10.war")] failed handling operation rollback -- java.lang.IllegalStateException: WFLYCTL0345: Timeout after 5 seconds waiting for existing service service jboss.deployment.unit."filemover-1.5-SNAPSHOT-wildfly10.war".contents to be removed so a new instance can be installed.: java.lang.IllegalStateException: WFLYCTL0345: Timeout after 5 seconds waiting for existing service service jboss.deployment.unit."filemover-1.5-SNAPSHOT-wildfly10.war".contents to be removed so a new instance can be installed.
at org.jboss.as.controller.OperationContextImpl.installService(OperationContextImpl.java:2033)
at org.jboss.as.controller.OperationContextImpl.access$600(OperationContextImpl.java:133)
at org.jboss.as.controller.OperationContextImpl$2$1.installService(OperationContextImpl.java:762)
at org.jboss.as.controller.OperationContextImpl$ContextServiceBuilder.install(OperationContextImpl.java:2171)
at org.jboss.msc.service.DelegatingServiceBuilder.install(DelegatingServiceBuilder.java:104)
at org.jboss.as.server.deployment.ContentServitor.addService(ContentServitor.java:48)
at org.jboss.as.server.deployment.DeploymentHandlerUtil.doDeploy(DeploymentHandlerUtil.java:196)
at org.jboss.as.server.deployment.DeploymentHandlerUtil$5$1.handleResult(DeploymentHandlerUtil.java:388)
at org.jboss.as.controller.AbstractOperationContext$Step.invokeResultHandler(AbstractOperationContext.java:1533)
at org.jboss.as.controller.AbstractOperationContext$Step.handleResult(AbstractOperationContext.java:1515)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeInternal(AbstractOperationContext.java:1472)
at org.jboss.as.controller.AbstractOperationContext$Step.finalizeStep(AbstractOperationContext.java:1445)
at org.jboss.as.controller.AbstractOperationContext$Step.access$400(AbstractOperationContext.java:1319)
at org.jboss.as.controller.AbstractOperationContext.executeResultHandlerPhase(AbstractOperationContext.java:876)
at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:726)
at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:467)
at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1412)
at org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:423)
at org.jboss.as.controller.ModelControllerImpl.lambda$execute$1(ModelControllerImpl.java:243)
at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:289)
at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:255)
at org.jboss.as.controller.ModelControllerImpl.execute(ModelControllerImpl.java:243)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler.internalExecute(TransactionalProtocolOperationHandler.java:269)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler.doExecute(TransactionalProtocolOperationHandler.java:201)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$1.run(TransactionalProtocolOperationHandler.java:148)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$1.run(TransactionalProtocolOperationHandler.java:144)
at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:289)
at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:255)
at org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:198)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$2$1.get(TransactionalProtocolOperationHandler.java:172)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$2$1.get(TransactionalProtocolOperationHandler.java:163)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$Execution$1.execute(TransactionalProtocolOperationHandler.java:677)
at org.jboss.as.controller.remote.TransactionalProtocolOperationHandler$ExecuteRequestHandler$2.execute(TransactionalProtocolOperationHandler.java:177)
at org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$1.doExecute(ManagementRequestContextImpl.java:70)
at org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$AsyncTaskRunner.run(ManagementRequestContextImpl.java:160)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
at java.lang.Thread.run(Thread.java:745)
at org.jboss.threads.JBossThread.run(JBossThread.java:485)
2019-12-13 14:10:23,220 ERROR [org.jboss.as.controller.management-operation] (ServerService Thread Pool -- 90) WFLYCTL0027: Operation was interrupted before service container stability could be reached. Process should be restarted. Step that first updated the service container was 'undeploy' at address '[("deployment" => "filemover-1.5-SNAPSHOT-wildfly10.war")]'
After that the timer is stuck and the warning at the top keeps coming for ages.
Code (it already follows the suggestions found in searches):
@Stateless
public class FileMover {

    @Schedule(hour = "*", minute = "*/15", persistent = false)
    public void startJob() {
        // file-moving logic goes here
    }
}
Can anyone suggest how to fix this or give any direction for fixing this issue?
The issue was not present in Wildfly 10, i.e. when the timer got stuck there was no issue with undeploy or disable.
I have removed the timer in the data/timerservice folder at runtime, but that did not fix it.
I have removed the timer while the deployable was not there and restarted the system, but the issue still comes back.
This is a problem with various other projects too.
In this project, what we found is that an exception occurs (mail not sent) and the program ends fine; the next time the exception occurs again, but this time it gets stuck. The issue also exists in another project, where it occurs without any exception.
Try this code and see if it does the job better:
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.annotation.Resource;
import javax.ejb.*;

@Singleton
@Startup
public class MyScheduler {

    @Resource
    private TimerService timerService;

    private Timer timer;

    @PostConstruct
    private void init() {
        TimerConfig timerConfig = new TimerConfig(null, false);
        ScheduleExpression se = new ScheduleExpression().hour("*").minute("*/15");
        timer = timerService.createCalendarTimer(se, timerConfig);
    }

    // A programmatic timer needs a @Timeout callback; put the file-moving logic here.
    @Timeout
    private void onTimeout(Timer t) {
    }

    @PreDestroy
    private void shutdown() {
        timer.cancel();
    }
}
I talked to the Wildfly configuration guys in our company, and this was purely a migration-related issue in the mailbox settings of the Wildfly server: recipients were not added, so Wildfly seemed to be busy looking for recipients.
After adding the recipients, the schedulers are working fine and the application can be undeployed/disabled without any issue.
I am new to Hadoop.
I am trying to set up Giraph to run on hadoop-2.6.5 with YARN.
When I submit the Giraph job, it gets submitted successfully but then fails, and I get the log below in the container syslog:
2018-01-30 12:09:01,190 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1517293264136_0002_000002
2018-01-30 12:09:01,437 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-01-30 12:09:01,471 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2018-01-30 12:09:01,471 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 2 cluster_timestamp: 1517293264136 } attemptId: 2 } keyId: -1485907628)
2018-01-30 12:09:01,583 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2018-01-30 12:09:02,154 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2018-01-30 12:09:02,207 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: io/netty/buffer/ByteBufAllocator
    at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:62)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:470)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:452)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1541)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:452)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:371)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1499)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: java.lang.ClassNotFoundException: io.netty.buffer.ByteBufAllocator
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 13 more
2018-01-30 12:09:02,209 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
The diagnostics in the logs show the following:
Application application_1517293264136_0002 failed 2 times due to AM Container for appattempt_1517293264136_0002_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://172.16.0.218:8088/proxy/application_1517293264136_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1517293264136_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)
    at org.apache.hadoop.util.Shell.run(Shell.java:478)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
The class it is failing on is io/netty/buffer/ByteBufAllocator, which is in the netty-all jar: https://mvnrepository.com/artifact/io.netty/netty-all
Following other questions, I have tried adding the jar to HADOOP_CLASSPATH.
Yogin-Patel:hadoop yoginpatel$ echo $HADOOP_CLASSPATH
/Users/yoginpatel/Downloads/gradle-4.3/caches/modules-2/files-2.1/io.netty/netty-all/4.0.43.Final/9781746a179070e886e1fb4b1971a6bbf02061a4/netty-all-4.0.43.Final.jar
Yogin-Patel:hadoop yoginpatel$
It shows up in hadoop classpath as well.
Yogin-Patel:hadoop yoginpatel$ hadoop classpath
/Users/yoginpatel/hadoop/etc/hadoop:/Users/yoginpatel/hadoop/share/hadoop/common/lib/*:/Users/yoginpatel/hadoop/share/hadoop/common/*:/Users/yoginpatel/hadoop/share/hadoop/hdfs:/Users/yoginpatel/hadoop/share/hadoop/hdfs/lib/*:/Users/yoginpatel/hadoop/share/hadoop/hdfs/*:/Users/yoginpatel/hadoop/share/hadoop/yarn/lib/*:/Users/yoginpatel/hadoop/share/hadoop/yarn/*:/Users/yoginpatel/hadoop/share/hadoop/mapreduce/lib/*:/Users/yoginpatel/hadoop/share/hadoop/mapreduce/*:/Users/yoginpatel/Downloads/gradle-4.3/caches/modules-2/files-2.1/io.netty/netty-all/4.0.43.Final/9781746a179070e886e1fb4b1971a6bbf02061a4/netty-all-4.0.43.Final.jar:/contrib/capacity-scheduler/*.jar
Yogin-Patel:hadoop yoginpatel$
I am trying to set this up in a development environment; it is a single-node setup.
I have even tried
job.addFileToClassPath(new Path("/Users/yoginpatel/Downloads/gradle-4.3/caches/modules-2/files-2.1/io.netty/netty-all/4.0.43.Final/9781746a179070e886e1fb4b1971a6bbf02061a4/netty-all-4.0.43.Final.jar"));
None of the approaches helped. How do I make the Hadoop node see the necessary jar?
This is the GiraphJob submit code, which submits a MapReduce job to the cluster:
@Test
public void testPageRank() throws IOException, ClassNotFoundException, InterruptedException {
    GiraphConfiguration giraphConf = new GiraphConfiguration(getConf());
    giraphConf.setWorkerConfiguration(1, 1, 100);
    GiraphConstants.SPLIT_MASTER_WORKER.set(giraphConf, false);
    giraphConf.setVertexInputFormatClass(JsonLongDoubleFloatDoubleVertexInputFormat.class);
    GiraphFileInputFormat.setVertexInputPath(giraphConf,
            new Path("/input/tiny-graph.txt"));
    giraphConf.setVertexOutputFormatClass(IdWithValueTextOutputFormat.class);
    giraphConf.setComputationClass(PageRankComputation.class);
    GiraphJob giraphJob = new GiraphJob(giraphConf, "page-rank");
    giraphJob.getInternalJob().addFileToClassPath(new Path("/Users/yoginpatel/Downloads/gradle-4.3/caches/modules-2/files-2.1/io.netty/netty-all/4.0.43.Final/9781746a179070e886e1fb4b1971a6bbf02061a4/netty-all-4.0.43.Final.jar"));
    FileOutputFormat.setOutputPath(giraphJob.getInternalJob(),
            new Path("/output/page-rank2"));
    giraphJob.run(true);
}

private Configuration getConf() {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:9000");
    conf.set("yarn.resourcemanager.address", "localhost:8032");
    // framework is now "yarn", should be defined like this in mapred-site.xml
    conf.set("mapreduce.framework.name", "yarn");
    return conf;
}
I got it working by putting Giraph's jar with dependencies into the Hadoop lib path:
cp giraph-1.3.0-SNAPSHOT-for-hadoop-2.6.5-jar-with-dependencies.jar ~/hadoop/share/hadoop/mapreduce/lib/
I am trying to connect to a Spark cluster running within a virtual machine with IP 10.20.30.50 on port 7077 from a Java application and run the word count example:
SparkConf conf = new SparkConf().setMaster("spark://10.20.30.50:7077").setAppName("wordCount");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> textFile = sc.textFile("hdfs://localhost:8020/README.md");
String result = Long.toString(textFile.count());
JavaRDD<String> words = textFile.flatMap((FlatMapFunction<String, String>) s -> Arrays.asList(s.split(" ")).iterator());
JavaPairRDD<String, Integer> pairs = words.mapToPair((PairFunction<String, String, Integer>) s -> new Tuple2<>(s, 1));
JavaPairRDD<String, Integer> counts = pairs.reduceByKey((Function2<Integer, Integer, Integer>) (a, b) -> a + b);
counts.saveAsTextFile("hdfs://localhost:8020/tmp/output");
sc.stop();
return result;
The Java application shows the following stack trace:
Running Spark version 2.0.1
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Changing view acls to: lii5ka
Changing modify acls to: lii5ka
Changing view acls groups to:
Changing modify acls groups to:
SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lii5ka); groups with view permissions: Set(); users with modify permissions: Set(lii5ka); groups with modify permissions: Set()
Successfully started service 'sparkDriver' on port 61267.
Registering MapOutputTracker
Registering BlockManagerMaster
Created local directory at /private/var/folders/4k/h0sl02993_99bzt0dzv759000000gn/T/blockmgr-51de868d-3ba7-40be-8c53-f881f97ced63
MemoryStore started with capacity 2004.6 MB
Registering OutputCommitCoordinator
Logging initialized @48403ms
jetty-9.2.z-SNAPSHOT
Started o.s.j.s.ServletContextHandler@1316e7ec{/jobs,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@782de006{/jobs/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2d0353{/jobs/job,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@381e24a0{/jobs/job/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@1c138dc8{/stages,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@b29739c{/stages/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@63f6de31{/stages/stage,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2a04ddcb{/stages/stage/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2af9688e{/stages/pool,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@6a0c5bde{/stages/pool/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@3f5e17f8{/storage,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@33b86f5d{/storage/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5264dcbc{/storage/rdd,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5a3ebf85{/storage/rdd/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@159082ed{/environment,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@6522c585{/environment/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@115774a1{/executors,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@3e3a3399{/executors/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@2f2c5959{/executors/threadDump,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5c51afd4{/executors/threadDump/json,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@76893a83{/static,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@19c07930{/,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@54eb0dc0{/api,null,AVAILABLE}
Started o.s.j.s.ServletContextHandler@5953786{/stages/stage/kill,null,AVAILABLE}
Started ServerConnector@2eeb8bd6{HTTP/1.1}{0.0.0.0:4040}
Started @48698ms
Successfully started service 'SparkUI' on port 4040.
Bound SparkUI to 0.0.0.0, and started at http://192.168.0.104:4040
Connecting to master spark://10.20.30.50:7077...
Successfully created connection to /10.20.30.50:7077 after 25 ms (0 ms spent in bootstraps)
Connecting to master spark://10.20.30.50:7077...
Still have 2 requests outstanding when connection from /10.20.30.50:7077 is closed
Failed to connect to master 10.20.30.50:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) ~[scala-library-2.11.8.jar:na]
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106) ~[spark-core_2.11-2.0.1.jar:2.0.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_102]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
Caused by: java.io.IOException: Connection from /10.20.30.50:7077 closed
at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:128) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:109) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:257) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182) ~[spark-network-common_2.11-2.0.1.jar:2.0.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:208) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:194) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:828) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:621) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
... 1 common frames omitted
In the Spark Master log on 10.20.30.50, I get the following error message:
16/11/05 14:47:20 ERROR OneForOneStrategy: Error while decoding incoming Akka PDU of length: 1298
akka.remote.transport.AkkaProtocolException: Error while decoding incoming Akka PDU of length: 1298
Caused by: akka.remote.transport.PduCodecException: Decoding PDU failed.
at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:167)
at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:580)
at akka.remote.transport.ProtocolStateActor$$anonfun$4.applyOrElse(AkkaProtocolTransport.scala:375)
at akka.remote.transport.ProtocolStateActor$$anonfun$4.applyOrElse(AkkaProtocolTransport.scala:343)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at akka.actor.FSM$class.processEvent(FSM.scala:604)
at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:269)
at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:598)
at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:592)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.remote.transport.ProtocolStateActor.aroundReceive(AkkaProtocolTransport.scala:269)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
at akka.remote.WireFormats$AkkaProtocolMessage.<init>(WireFormats.java:6643)
at akka.remote.WireFormats$AkkaProtocolMessage.<init>(WireFormats.java:6607)
at akka.remote.WireFormats$AkkaProtocolMessage$1.parsePartialFrom(WireFormats.java:6703)
at akka.remote.WireFormats$AkkaProtocolMessage$1.parsePartialFrom(WireFormats.java:6698)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at akka.remote.WireFormats$AkkaProtocolMessage.parseFrom(WireFormats.java:6821)
at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
... 19 more
Additional Information
The example works fine when I use new SparkConf().setMaster("local") instead
I can connect to the Spark Master with spark-shell --master spark://10.20.30.50:7077 on the very same machine
This looks like a network error at first (but actually it is NOT), in the disguise of a Spark version mismatch. You should point to the correct version of the Spark jars, mostly the assembly jars.
This issue may happen due to a version mismatch in a Hadoop RPC call using Protocol Buffers: it occurs when a protocol message being parsed is invalid in some way, e.g. it contains a malformed varint or a negative byte length.
In my experience with protobuf, an InvalidProtocolBufferException can happen only when the message could not be parsed (programmatically, if you are parsing a protobuf message, maybe the message length is zero or the message is corrupted...).
Spark uses Akka actors for message passing between the master/driver and the workers, and internally Akka uses Google's protobuf to communicate. See the method below from AkkaPduCodec.scala:
override def decodePdu(raw: ByteString): AkkaPdu = {
  try {
    val pdu = AkkaProtocolMessage.parseFrom(raw.toArray)
    if (pdu.hasPayload) Payload(ByteString(pdu.getPayload.asReadOnlyByteBuffer()))
    else if (pdu.hasInstruction) decodeControlPdu(pdu.getInstruction)
    else throw new PduCodecException("Error decoding Akka PDU: Neither message nor control message were contained", null)
  } catch {
    case e: InvalidProtocolBufferException ⇒ throw new PduCodecException("Decoding PDU failed.", e)
  }
}
But in your case, since it is a version mismatch, a message from the new protobuf version cannot be parsed by the old version of the parser, or something along those lines.
If you are using Maven or other dependency management, please review your dependencies.
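Before touching dependencies, a quick sanity check is to print the Spark version that is actually on the driver classpath and compare it with the version the standalone master's web UI reports (port 8080 by default). This is only a small sketch of such a check, not part of any fix:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkVersionCheck {
    public static void main(String[] args) {
        // Local master, so this runs even when the real cluster is unreachable.
        JavaSparkContext jsc = new JavaSparkContext(
                new SparkConf().setMaster("local").setAppName("version-check"));
        System.out.println("Driver-side Spark version: " + jsc.sc().version());
        jsc.stop();
    }
}

If the printed version differs from the one the cluster runs, decode errors like the one above are to be expected.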
It turned out that I had Spark version 1.5.2 running in the virtual machine and used version 2.0.1 of the Spark library in Java. I fixed the issue by using the appropriate Spark library version in my pom.xml, which is:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.2</version>
</dependency>
Another problem (which occurred later) was that I also had to pin the Scala version with which the library was built. This is the _2.10 suffix in the artifactId.
Basically, @RamPrassad's answer pointed me in the right direction but didn't give clear advice on what I needed to do to fix my problem.
By the way: I couldn't update Spark in the virtual machine, since it was brought to me by the Hortonworks distribution...
This error also happens when you have strange characters that you want to insert into a table, for example:
INSERT INTO mytable
(select 'GARDENﬠy para el uso para el control de rat쟣omrata' as text)
So the solution is to change the text encoding or to remove these special characters, like this:
INSERT INTO mytable
(select regexp_replace('GARDENﬠy para el uso para el control de rat쟣omrata',
'[\u0100-\uffff]', '') as text)
I got this same error while trying to read data from my SQL table. My particular issue was that I did not give the SQL user sufficient permissions to read the data.
My SQL user had permission to run a SELECT statement, but when I looked at the log via select * from stv_recents (on Redshift) I saw that it also performs an UNLOAD command to S3. Here's a resource on GRANT permissions on Redshift.
Not sure if this is still relevant for you seeing that this was almost 6 years ago, but I see this in your error:
SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(lii5ka); groups with view permissions: Set(); users with modify permissions: Set(lii5ka); groups with modify permissions: Set()
which suggests that you also didn't have sufficient permissions set on your SQL user.
I'm running a Spring Boot (1.3.5) console application with an embedded ActiveMQ server (5.10.0), which works just fine for receiving messages. However, I'm having trouble shutting down the application without exceptions.
This exception is thrown once for each queue, after hitting Ctrl-C:
2016-09-21 15:46:36.561 ERROR 18275 --- [update]] o.apache.activemq.broker.BrokerService : Failed to start Apache ActiveMQ ([my-mq-server, null], {})
java.lang.IllegalStateException: Shutdown in progress
at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
at java.lang.Runtime.addShutdownHook(Runtime.java:211)
at org.apache.activemq.broker.BrokerService.addShutdownHook(BrokerService.java:2446)
at org.apache.activemq.broker.BrokerService.doStartBroker(BrokerService.java:693)
at org.apache.activemq.broker.BrokerService.startBroker(BrokerService.java:684)
at org.apache.activemq.broker.BrokerService.start(BrokerService.java:605)
at org.apache.activemq.transport.vm.VMTransportFactory.doCompositeConnect(VMTransportFactory.java:127)
at org.apache.activemq.transport.vm.VMTransportFactory.doConnect(VMTransportFactory.java:56)
at org.apache.activemq.transport.TransportFactory.connect(TransportFactory.java:65)
at org.apache.activemq.ActiveMQConnectionFactory.createTransport(ActiveMQConnectionFactory.java:314)
at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:329)
at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:302)
at org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:242)
at org.apache.activemq.jms.pool.PooledConnectionFactory.createConnection(PooledConnectionFactory.java:283)
at org.apache.activemq.jms.pool.PooledConnectionFactory$1.makeObject(PooledConnectionFactory.java:96)
at org.apache.activemq.jms.pool.PooledConnectionFactory$1.makeObject(PooledConnectionFactory.java:93)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1041)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:357)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:279)
at org.apache.activemq.jms.pool.PooledConnectionFactory.createConnection(PooledConnectionFactory.java:243)
at org.apache.activemq.jms.pool.PooledConnectionFactory.createConnection(PooledConnectionFactory.java:212)
at org.springframework.jms.support.JmsAccessor.createConnection(JmsAccessor.java:180)
at org.springframework.jms.listener.AbstractJmsListeningContainer.createSharedConnection(AbstractJmsListeningContainer.java:413)
at org.springframework.jms.listener.AbstractJmsListeningContainer.refreshSharedConnection(AbstractJmsListeningContainer.java:398)
at org.springframework.jms.listener.DefaultMessageListenerContainer.refreshConnectionUntilSuccessful(DefaultMessageListenerContainer.java:925)
at org.springframework.jms.listener.DefaultMessageListenerContainer.recoverAfterListenerSetupFailure(DefaultMessageListenerContainer.java:899)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1075)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-09-21 15:46:36.564 INFO 18275 --- [update]] o.apache.activemq.broker.BrokerService : Apache ActiveMQ 5.12.3 (my-mq-server, null) is shutting down
It seems as if the DefaultMessageListenerContainer tries to start an ActiveMQ server, which doesn't make sense to me. I've set the phase of the BrokerService to Integer.MAX_VALUE - 1 and the phase of the DefaultJmsListeningContainerFactory to Integer.MAX_VALUE so that the listener container goes away before the ActiveMQ server is stopped.
I have this in my main():
public static void main(String[] args) {
    final ConfigurableApplicationContext context = SpringApplication.run(SiteServer.class, args);
    context.registerShutdownHook();
}
I've tried setting daemon to true as suggested here: Properly Shutting Down ActiveMQ and Spring DefaultMessageListenerContainer.
Any ideas? Thanks! =)
Found it. This problem occurs when the Camel context is shut down after the BrokerService. Adding proper life-cycle management so that Camel is shut down before the broker resolved the issue. Now everything shuts down cleanly without errors.
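For anyone hitting the same thing, here is one way that ordering can be expressed in plain Spring Java config; this is only a sketch, not the original project's setup (the broker name is taken from the log above, everything else is illustrative). Spring destroys dependent beans first, so declaring the Camel context as depending on the broker guarantees the routes stop before the broker does:

import org.apache.activemq.broker.BrokerService;
import org.apache.camel.CamelContext;
import org.apache.camel.impl.DefaultCamelContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.DependsOn;

@Configuration
public class ShutdownOrderConfig {

    // Embedded broker; Spring calls start() on init and stop() on context shutdown.
    @Bean(initMethod = "start", destroyMethod = "stop")
    public BrokerService brokerService() throws Exception {
        BrokerService broker = new BrokerService();
        broker.setBrokerName("my-mq-server");
        broker.setPersistent(false);
        return broker;
    }

    // Depends on the broker, so it is created after it and destroyed before it:
    // the Camel routes are stopped while the broker is still up.
    @Bean(initMethod = "start", destroyMethod = "stop")
    @DependsOn("brokerService")
    public CamelContext camelContext() {
        return new DefaultCamelContext();
    }
}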