MapReduce input from HTable on AWS timeout - java

I'm having a bit of trouble figuring out how to execute a simple MapReduce job whose input is sourced from an HBase table, using emr-5.4.0.
When I ran it on EMR, it failed because of a timeout (emr-5.3.0 also failed).
I have done a bunch of Google searching to find out how to proceed, but couldn't find anything useful.
My process:
I created an EMR cluster using HBase. The versions are:
Amazon 2.7.3, Ganglia 3.7.2, HBase 1.3.0, Hive 2.1.1, Hue 3.11.0,
Phoenix 4.9.0
Following the example from the manual (http://hbase.apache.org/book.html#mapreduce.example), I wrote my job like this:
public class TableMapTest3 {

    // TableMapper
    public static class MyMapper extends TableMapper<Text, Text> {
        protected void map(ImmutableBytesWritable key, Result inputValue, Context context)
                throws IOException, InterruptedException {
            String keyS = new String(key.get(), "UTF-8");
            String value = new String(inputValue.getValue(Bytes.toBytes("contents"), Bytes.toBytes("name")), "UTF-8");
            System.out.println("TokenizerMapper :" + value);
            context.write(new Text(keyS), new Text(value));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        System.out.println("url:" + conf.get("fs.defaultFS"));
        System.out.println("hbase.zookeeper.quorum:" + conf.get("hbase.zookeeper.quorum"));

        Connection conn = ConnectionFactory.createConnection(conf);
        Admin admin = conn.getAdmin();

        String tableName = "TableMapTest";
        TableName tablename = TableName.valueOf(tableName);
        Table hTable = null;

        // check table exists
        if (admin.tableExists(tablename)) {
            System.out.println(tablename + " table existed...");
            hTable = conn.getTable(tablename);
            ResultScanner resultScanner = hTable.getScanner(new Scan());
            for (Result result : resultScanner) {
                Delete delete = new Delete(result.getRow());
                hTable.delete(delete);
            }
        } else {
            HTableDescriptor tableDesc = new HTableDescriptor(tablename);
            tableDesc.addFamily(new HColumnDescriptor("contents"));
            admin.createTable(tableDesc);
            System.out.println(tablename + " table created...");
            hTable = conn.getTable(tablename);
        }

        // insert data
        for (int i = 0; i < 20; i++) {
            Put put = new Put(Bytes.toBytes(String.valueOf(i)));
            put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes("name"), Bytes.toBytes("value" + i));
            hTable.put(put);
        }
        hTable.close();

        // Hadoop
        Job job = Job.getInstance(conf, TableMapTest3.class.getSimpleName());
        job.setJarByClass(TableMapTest3.class);
        job.setOutputFormatClass(NullOutputFormat.class);
        Scan scan = new Scan();
        TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class, Text.class, Text.class, job);
        System.out.println("TableMapTest result:" + job.waitForCompletion(true));
    }
}
I packaged my source into a JAR and uploaded it to the cluster. Then I SSH'd onto the master node and ran my job:
hadoop jar zz-0.0.1.jar com.ziki.zz.TableMapTest3
I got the following messages:
url:hdfs://ip-xxx.ap-northeast-1.compute.internal:8020
hbase.zookeeper.quorum:localhost
TableMapTest table created...
17/05/05 01:31:23 INFO impl.TimelineClientImpl: Timeline service address: http://ip-xxx.ap-northeast-1.compute.internal:8188/ws/v1/timeline/
17/05/05 01:31:23 INFO client.RMProxy: Connecting to ResourceManager at ip-xxx.ap-northeast-1.compute.internal/172.31.4.228:8032
17/05/05 01:31:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/05/05 01:31:31 INFO mapreduce.JobSubmitter: number of splits:1
17/05/05 01:31:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493947058255_0001
17/05/05 01:31:33 INFO impl.YarnClientImpl: Submitted application application_1493947058255_0001
17/05/05 01:31:34 INFO mapreduce.Job: The url to track the job: http://ip-xxx.ap-northeast-1.compute.internal:20888/proxy/application_1493947058255_0001/
17/05/05 01:31:34 INFO mapreduce.Job: Running job: job_1493947058255_0001
17/05/05 01:31:57 INFO mapreduce.Job: Job job_1493947058255_0001 running in uber mode : false
17/05/05 01:31:57 INFO mapreduce.Job: map 0% reduce 0%
After a while, I get the error:
17/05/05 01:42:26 INFO mapreduce.Job: Task Id : attempt_1493947058255_0001_m_000000_0, Status : FAILED
AttemptID:attempt_1493947058255_0001_m_000000_0 Timed out after 600 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
17/05/05 01:52:56 INFO mapreduce.Job: Task Id : attempt_1493947058255_0001_m_000000_1, Status : FAILED
AttemptID:attempt_1493947058255_0001_m_000000_1 Timed out after 600 secs
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
And some syslogs:
2017-05-05 01:31:59,664 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1493947058255_0001_m_000000 Task Transitioned from SCHEDULED to RUNNING
2017-05-05 01:32:08,168 INFO [Socket Reader #1 for port 33348] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1493947058255_0001 (auth:SIMPLE)
2017-05-05 01:32:08,227 INFO [IPC Server handler 0 on 33348] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1493947058255_0001_m_000002 asked for a task
2017-05-05 01:32:08,231 INFO [IPC Server handler 0 on 33348] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1493947058255_0001_m_000002 given task: attempt_1493947058255_0001_m_000000_0
2017-05-05 01:42:25,382 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1493947058255_0001_m_000000_0: AttemptID:attempt_1493947058255_0001_m_000000_0 Timed out after 600 secs
2017-05-05 01:42:25,389 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
2017-05-05 01:42:25,392 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1493947058255_0001_01_000002 taskAttempt attempt_1493947058255_0001_m_000000_0
2017-05-05 01:42:25,392 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1493947058255_0001_m_000000_0
2017-05-05 01:42:25,394 INFO [ContainerLauncher #1] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : ip-xxx.ap-northeast-1.compute.internal:8041
2017-05-05 01:42:25,457 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP
2017-05-05 01:42:25,458 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT
2017-05-05 01:42:25,460 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED
2017-05-05 01:42:25,495 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved ip-xxx.ap-northeast-1.compute.internal to /default-rack
2017-05-05 01:42:25,500 INFO [Thread-83] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node ip-xxx.ap-northeast-1.compute.internal
2017-05-05 01:42:25,502 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_1 TaskAttempt Transitioned from NEW to UNASSIGNED
2017-05-05 01:42:25,503 INFO [Thread-83] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1493947058255_0001_m_000000_1 to list of failed maps
2017-05-05 01:42:25,557 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:3 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:1 RackLocal:0
2017-05-05 01:42:25,582 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1493947058255_0001: ask=1 release= 0 newContainers=0 finishedContainers=1 resourcelimit=<memory:1024, vCores:1> knownNMs=2
2017-05-05 01:42:25,582 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1493947058255_0001_01_000002
2017-05-05 01:42:25,583 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1493947058255_0001_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
I just use the default settings and run a simple job. Why do these errors happen?
If I'm missing anything, let me know!
Anyway, thanks for your help - appreciate it!

I found the answer here:
You can't use the plain HBaseConfiguration (because it defaults to a localhost quorum). What you'll have to do is use the configuration that Amazon sets up for you (located in /etc/hbase/conf/hbase-site.xml).
The connection code looks like this:
Configuration conf = new Configuration();
String hbaseSite = "/etc/hbase/conf/hbase-site.xml";
conf.addResource(new File(hbaseSite).toURI().toURL());
HBaseAdmin.checkHBaseAvailable(conf);
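Applying that fix to the job above, the configuration setup in main() ends up looking roughly like this. This is only a sketch assuming the default EMR path for hbase-site.xml (it is not taken from the linked answer); the rest of the job stays the same:

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// ...
Configuration conf = HBaseConfiguration.create();
// Overlay the cluster configuration that EMR writes out, so that
// hbase.zookeeper.quorum points at the cluster instead of localhost.
conf.addResource(new File("/etc/hbase/conf/hbase-site.xml").toURI().toURL());
System.out.println("hbase.zookeeper.quorum:" + conf.get("hbase.zookeeper.quorum"));
Connection conn = ConnectionFactory.createConnection(conf);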

Related

apoc.periodic.iterate fails with exception: java.util.concurrent.RejectedExecutionException

I am trying to run the annotation function of GraphAware within Neo4j (see documentation here). I have a set of 5000 nodes (KnowledgeArticles) with textual data in the content property. To annotate those, I run the following query in Neo4j Desktop:
CALL apoc.periodic.iterate(
"MATCH (n:KnowledgeArticle) RETURN n",
"CALL ga.nlp.annotate({text: n.content, id: id(n)})
YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)", {batchSize:1, iterateList:true})
After annotating approximately 200 to 300 KnowledgeArticles the database shuts down and provides the error:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by:
java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.FutureTask@373b81ee rejected from
java.util.concurrent.ThreadPoolExecutor@285a2901[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 288]
I have experimented using different values for batchSize or setting iterateList to false, but none of this helped.
Also, I have tried performing the above iterate call limited to only 150 nodes. That works fine the first time I call it, but when I run it a second time it produces the same error, stating that the number of completed tasks is again about 200 to 300. The executor in the background thus seems to 'remember' the total number of tasks it has run since the database first started.
Could you help me resolve this issue? I want to run the above query not necessarily from Neo4j Desktop, but eventually with py2neo from Python using graph.run([iterate-query]). If there is any way of solving this from Python, that would be even better.
Thank you!
PS. The debug log provides the following output (as of the last few iterations of the annotation up until the shut down):
2019-05-21 12:46:10.359+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251906
2019-05-21 12:46:13.784+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] end storing annotatedText 251906. It took: 3425
2019-05-21 12:46:13.786+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.788+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.800+0000 INFO [c.g.n.u.ProcessorUtils] Taking default pipeline from configuration : myPipeline
2019-05-21 12:46:13.868+0000 INFO [c.g.n.p.s.StanfordTextProcessor] Time for pipeline annotation (myPipeline): 67. Text length: 954
2019-05-21 12:46:13.869+0000 INFO [c.g.n.NLPManager] Time to annotate 68
2019-05-21 12:46:13.869+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.869+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251907
2019-05-21 12:46:15.848+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] end storing annotatedText 251907. It took: 1978
2019-05-21 12:46:15.848+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:15.862+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:15.915+0000 INFO [c.g.n.u.ProcessorUtils] Taking default pipeline from configuration : myPipeline
2019-05-21 12:46:16.294+0000 INFO [c.g.n.p.s.StanfordTextProcessor] Time for pipeline annotation (myPipeline): 378. Text length: 2641
2019-05-21 12:46:16.295+0000 INFO [c.g.n.NLPManager] Time to annotate 379
2019-05-21 12:46:16.296+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:16.296+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251908
2019-05-21 12:46:16.421+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Database graph.db is unavailable.
2019-05-21 12:46:17.018+0000 INFO [c.g.s.f.b.GraphAwareServerBootstrapper] stopped
2019-05-21 12:46:17.020+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-05-21 12:46:17.149+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutting down 'graph.db' database.
2019-05-21 12:46:17.150+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-05-21 12:46:17.164+0000 INFO [o.n.b.i.BackupServer] BackupServer communication server shutting down and unbinding from /127.0.0.1:6362
2019-05-21 12:46:17.226+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by database shutdown # txId: 7720 checkpoint started...
2019-05-21 12:46:17.247+0000 INFO [o.n.k.i.s.c.CountsTracker] Rotated counts store at transaction 7720 to [/Users/{my.user.name}/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-e2babea7-0332-4c2c-bf1d-076d4feed49a/installation-3.5.4/data/databases/graph.db/neostore.counts.db.a], from [/Users/{my.user.name}/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-e2babea7-0332-4c2c-bf1d-076d4feed49a/installation-3.5.4/data/databases/graph.db/neostore.counts.db.b].
2019-05-21 12:46:17.644+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by database shutdown # txId: 7720 checkpoint completed in 418ms
2019-05-21 12:46:17.647+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 3
2019-05-21 12:46:17.698+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics START ---
2019-05-21 12:46:17.700+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics END ---
2019-05-21 12:46:17.706+0000 INFO [c.g.r.BaseGraphAwareRuntime] Shutting down GraphAware Runtime...
2019-05-21 12:46:17.709+0000 INFO [c.g.r.m.BaseModuleManager] Shutting down module UIDM
2019-05-21 12:46:17.709+0000 INFO [c.g.r.m.BaseModuleManager] Shutting down module NLP
2019-05-21 12:46:17.712+0000 INFO [c.g.r.s.RotatingTaskScheduler] Terminating task scheduler...
2019-05-21 12:46:17.712+0000 INFO [c.g.r.s.RotatingTaskScheduler] Task scheduler terminated successfully.
2019-05-21 12:46:17.714+0000 INFO [c.g.r.BaseGraphAwareRuntime] GraphAware Runtime shut down.

Kafka streams shutting down and don't run

Good morning guys,
I'm trying to run a Kafka Streams application, but every time I try, it starts and then shuts down right away. Below is the output printed on the console:
[main] WARN org.apache.kafka.clients.consumer.ConsumerConfig - The configuration 'admin.retries' was supplied but isn't a known config.
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.0
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : eec43959745f444f
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Starting
[main] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] Started Streams client
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] State transition from CREATED to RUNNING
[Thread-0] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] State transition from RUNNING to PENDING_SHUTDOWN
[kafka-streams-close-thread] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Informed to shut down
[kafka-streams-close-thread] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Shutting down
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=application-brute-test-client-StreamThread-1-restore-consumer, groupId=] Unsubscribed all topics or patterns and assigned partitions
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
[application-brute-test-client-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [application-brute-test-client-StreamThread-1] Shutdown complete
[kafka-admin-client-thread | application-brute-test-client-admin] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=application-brute-test-client-admin] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
[kafka-streams-close-thread] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] State transition from PENDING_SHUTDOWN to NOT_RUNNING
[Thread-0] INFO org.apache.kafka.streams.KafkaStreams - stream-client [application-brute-test-client] Streams client stopped completely
Note the following line:
[application-brute-test-client-StreamThread-1] Informed to shut down
The application was informed to shut down, but I don't know why. Can someone help me with this problem?
Here is my simple code, just to test the stream:
Properties properties = new Properties();
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "myserver");
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "application-brute-test");
properties.put(StreamsConfig.CLIENT_ID_CONFIG, "application-brute-test-client");
properties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.setProperty(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE); // Enable the exactly-once feature
properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass()); // Set a default key serde
properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass()); // Set a default value serde
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("neurotech_propostas", Consumed.with(Serdes.String(), Serdes.String()));
input.print(Printed.toSysOut());
KStream<String, String> output = input.mapValues((value) -> value.toUpperCase());
output.to("brute-test-out");
KafkaStreams stream = new KafkaStreams(builder.build(), properties);
stream.cleanUp();
stream.start();
Runtime.getRuntime().addShutdownHook(new Thread(stream::close));
To solve the problem I simply stopped using JUnit to run the stream and executed it through a main class. Running Kafka Streams via JUnit was causing this trouble.
Maybe in this environment JUnit doesn't hold the thread execution?
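If you still want to drive the topology from a test, one workaround is to block the test thread explicitly until the client stops or a timeout elapses. This is only a sketch of my own (the class and method names are made up, and it assumes the builder and properties from the snippet above), not an official Kafka Streams testing facility:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.streams.KafkaStreams;

public final class StreamTestSupport {

    // Starts the given streams client, blocks the calling (test) thread until the
    // client stops on its own or the timeout elapses, then closes it.
    public static void runForAWhile(KafkaStreams streams, long timeoutSeconds) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);

        // Release the latch if the client leaves the running states by itself.
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.NOT_RUNNING) {
                latch.countDown();
            }
        });

        streams.start();
        try {
            // Keep the JUnit thread alive; without this the test method returns,
            // the JVM begins shutting down, and the shutdown hook closes the client.
            latch.await(timeoutSeconds, TimeUnit.SECONDS);
        } finally {
            streams.close();
        }
    }
}

A test would then call StreamTestSupport.runForAWhile(stream, 60) with the KafkaStreams instance built from the code above, instead of calling start() directly.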

Spark : forEachPartition not working

I want to use foreachPartition to save data to my database, but I noticed that this function is not working.
RDD2.foreachRDD(new VoidFunction<JavaRDD<Object>>() {
    @Override
    public void call(JavaRDD<Object> t) throws Exception {
        t.foreachPartition(new VoidFunction<Iterator<Object>>() {
            @Override
            public void call(Iterator<Object> t) throws Exception {
                System.out.println("test");
            }
        });
    }
});
When I run this example, my Spark program gets blocked at these steps, without showing other RDDs or even printing "test":
16/05/30 10:18:41 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/05/30 10:18:41 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/05/30 10:18:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2946 bytes)
16/05/30 10:18:41 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/05/30 10:18:41 INFO SparkContext: Starting job: foreachPartition at BrokerSpout.java:265
16/05/30 10:18:41 INFO RecurringTimer: Started timer for BlockGenerator at time 1464596321600
-------------------------------------------
Time: 1464596321500 ms
-------------------------------------------
16/05/30 10:18:41 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
16/05/30 10:18:41 INFO ReceiverTracker: Registered receiver for stream 0 from 10.25.30.41:59407
16/05/30 10:18:41 INFO InputInfoTracker: remove old batch metadata:
16/05/30 10:18:41 INFO ReceiverSupervisorImpl: Starting receiver
16/05/30 10:18:41 INFO ReceiverSupervisorImpl: Called receiver onStart
16/05/30 10:18:41 INFO ReceiverSupervisorImpl: Waiting for receiver to be stopped
16/05/30 10:18:42 INFO SparkContext: Starting job: foreachPartition at BrokerSpout.java:265
16/05/30 10:18:42 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/05/30 10:18:42 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
As you can see in my logging, it says that Spark is waiting for a receiver to be stopped, but my receiver must not be stopped - otherwise, what is the purpose of Spark Streaming if we have to stop the sender?

Hadoop Pipes Wordcount example: NullPointerException in LocalJobRunner

I am trying to run the example in this tutorial about Hadoop Pipes:
Compiling and everything succeeds. However, when it runs it shows me a NullPointerException. I have tried many approaches and read many similar questions, but wasn't able to find an actual solution to this problem.
Note: I am running on a single machine in a pseudo-distributed environment.
hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriters=true -input /input -output /output -program /bin/wordcount
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
15/02/18 01:09:02 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/02/18 01:09:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/02/18 01:09:02 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/02/18 01:09:03 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/02/18 01:09:04 INFO mapred.FileInputFormat: Total input paths to process : 1
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: number of splits:1
15/02/18 01:09:04 INFO Configuration.deprecation: hadoop.pipes.java.recordreader is deprecated. Instead, use mapreduce.pipes.isjavarecordreader
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:9000/bin/wordcount as file:/tmp/hadoop-abdulrahman/mapred/local/1424214545411/wordcount
15/02/18 01:09:06 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/02/18 01:09:06 INFO mapreduce.Job: Running job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Starting task: attempt_local143452495_0001_m_000000_0
15/02/18 01:09:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/02/18 01:09:06 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/data.txt:0+68
15/02/18 01:09:07 INFO mapred.MapTask: numReduceTasks: 1
15/02/18 01:09:07 INFO mapreduce.Job: Job job_local143452495_0001 running in uber mode : false
15/02/18 01:09:07 INFO mapreduce.Job: map 0% reduce 0%
15/02/18 01:09:07 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/02/18 01:09:07 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/02/18 01:09:07 INFO mapred.MapTask: soft limit at 83886080
15/02/18 01:09:07 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/02/18 01:09:07 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/02/18 01:09:07 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/02/18 01:09:08 INFO mapred.LocalJobRunner: map task executor complete.
15/02/18 01:09:08 WARN mapred.LocalJobRunner: job_local143452495_0001
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:104)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:69)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/18 01:09:08 INFO mapreduce.Job: Job job_local143452495_0001 failed with state FAILED due to: NA
15/02/18 01:09:08 INFO mapreduce.Job: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:264)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:503)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:518)
Edit: I downloaded the Hadoop source code and tracked down where the exception happens. It seems the exception occurs in the initialization stage, so the code inside the mapper/reducer isn't really the problem.
The function in Hadoop that produces the exception is this one:
/** Run a set of tasks and waits for them to complete. */
private void runTasks(List<RunnableWithThrowable> runnables,
    ExecutorService service, String taskType) throws Exception {
  // Start populating the executor with work units.
  // They may begin running immediately (in other threads).
  for (Runnable r : runnables) {
    service.submit(r);
  }

  try {
    service.shutdown(); // Instructs queue to drain.

    // Wait for tasks to finish; do not use a time-based timeout.
    // (See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6179024)
    LOG.info("Waiting for " + taskType + " tasks");
    service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
  } catch (InterruptedException ie) {
    // Cancel all threads.
    service.shutdownNow();
    throw ie;
  }

  LOG.info(taskType + " task executor complete.");

  // After waiting for the tasks to complete, if any of these
  // have thrown an exception, rethrow it now in the main thread context.
  for (RunnableWithThrowable r : runnables) {
    if (r.storedException != null) {
      throw new Exception(r.storedException);
    }
  }
}
The problem, though, is that it stores the exception and then rethrows it, which prevents me from knowing the actual source of the exception.
Any help?
Also, if you need me to post more details please let me know.
Thank you,
So after a lot of research, I found out that the problem was actually caused by this line in pipes/Application.java (line 104):
byte[] password = jobToken.getPassword();
I changed the code and recompiled Hadoop:
byte[] password = "no password".getBytes();
if (jobToken != null) {
    password = jobToken.getPassword();
}
I got this from here.
This solved the problem, and my program currently runs, but I am facing another problem where the program actually hangs at map 0% reduce 0%.
I will open another topic for that question.
Thank you,

Quartz Job executed multiple times simultaneously by each cluster machine, rather than one time by one machine for the entire cluster

Goal:
* Have Job1 run once for a three-node cluster every 10 minutes, and Job2 run once for the same cluster every 5 minutes. Each job generates an email; so at 10:55am I should receive only one Job2 email from the cluster, at 11:00am I should receive one Job1 email and one Job2 email from the cluster, at 11:05am I should receive only one Job2 email from the cluster, and so on...
Problem:
* Job1 is being run multiple times every 10 minutes on each node in the cluster, and the same for Job2 (except every 5 minutes). This leads to many, many more than one or two emails.
Configuration:
* Three-node linux cluster
* Each machine NTP configured and time-sync'd
* Oracle DB
* Quartz v2.2.0 (cluster mode)
* Jobs configured via CronTrigger
* Each node has an instance of the same standalone Java application running on it, and the Java application instantiates an instance of the quartz scheduler in cluster-mode.
* quartz.properties files are identical on each machine.
I have investigated all the obvious potential causes, but nothing explains it or presents a fix. I have even tried inserting an artificial 10-second sleep instruction in the job, to ensure that it doesn't finish in under a second. Please find relevant artifacts below (quartz.properties and log output). Any help would be greatly appreciated!
Artifact #1:
============================================================================
============================================================================
Q U A R T Z --- P R O P E R T I E S
==================
#============================================================================
# Configure Main Scheduler Properties
#============================================================================
org.quartz.scheduler.instanceName: MyQrtzScheduler
org.quartz.scheduler.instanceId: AUTO
org.quartz.scheduler.skipUpdateCheck: true
#============================================================================
# Configure ThreadPool
#============================================================================
org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount: 1
org.quartz.threadPool.threadPriority: 5
#============================================================================
# Configure JobStore
#============================================================================
org.quartz.jobStore.misfireThreshold: 2592000000
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
org.quartz.jobStore.useProperties=false
org.quartz.jobStore.dataSource=myDS
org.quartz.jobStore.tablePrefix=QRTZ_
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=60000
#============================================================================
# Other Example Delegates
#============================================================================
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.DB2v6Delegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.DB2v7Delegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.DriverDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.HSQLDBDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.MSSQLDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.PointbaseDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.WebLogicDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
#org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.WebLogicOracleDelegate
#============================================================================
# Configure Datasources
#============================================================================
org.quartz.dataSource.myDS.driver: oracle.jdbc.driver.OracleDriver
org.quartz.dataSource.myDS.URL: jdbc:oracle:thin:@myServer:myPort:blah
org.quartz.dataSource.myDS.user: myDBUser
org.quartz.dataSource.myDS.password: myDBPassword
org.quartz.dataSource.myDS.maxConnections: 2
org.quartz.dataSource.myDS.validationQuery: select 0
#============================================================================
# Configure Plugins
#============================================================================
org.quartz.plugin.shutdownHook.class: org.quartz.plugins.management.ShutdownHookPlugin
org.quartz.plugin.shutdownHook.cleanShutdown: true
org.quartz.plugin.triggerHistory.class=org.quartz.plugins.history.LoggingTriggerHistoryPlugin
org.quartz.plugin.jobHistory.class=org.quartz.plugins.history.LoggingJobHistoryPlugin
Artifact #2:
============================================================================
============================================================================
L O G --- O U T P U T
==================
2015-01-29 12:56:16,602 [main] INFO com.mycompany.myapp.jobs.QuartzHelper - Initializing Quartz scheduler...
2015-01-29 12:56:16,829 [main] INFO org.quartz.impl.StdSchedulerFactory - Using default implementation for ThreadExecutor
2015-01-29 12:56:16,855 [main] INFO org.quartz.core.SchedulerSignalerImpl - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2015-01-29 12:56:16,855 [main] INFO org.quartz.core.QuartzScheduler - Quartz Scheduler v.2.2.0 created.
2015-01-29 12:56:16,857 [main] INFO org.quartz.plugins.management.ShutdownHookPlugin - Registering Quartz shutdown hook.
2015-01-29 12:56:16,859 [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Using db table-based data access locking (synchronization).
2015-01-29 12:56:16,864 [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - JobStoreTX initialized.
2015-01-29 12:56:16,865 [main] INFO org.quartz.core.QuartzScheduler - Scheduler meta-data: Quartz Scheduler (v2.2.0) 'MyQrtzScheduler' with instanceId 'node1_1422554176832'
Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
NOT STARTED.
Currently in standby mode.
Number of jobs executed: 0
Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 1 threads.
Using job-store 'org.quartz.impl.jdbcjobstore.JobStoreTX' - which supports persistence. and is clustered.
2015-01-29 12:56:16,865 [main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler 'MyQrtzScheduler' initialized from specified file: '/my/install/directory/quartz.properties'
2015-01-29 12:56:16,866 [main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler version: 2.2.0
2015-01-29 12:56:16,866 [main] INFO com.mycompany.myapp.jobs.QuartzHelper - Quartz scheduler initialized successfully.
2015-01-29 12:59:53,450 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.core.QuartzSchedulerThread - batch acquisition of 1 triggers
2015-01-29 13:00:00,007 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' is desired by: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:00:00,008 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' is being obtained: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:00:00,809 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' given to: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:00:00,836 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' returned by: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:00:00,839 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.simpl.PropertySettingJobFactory - Producing instance of Job 'node2_1422546730757.Job1', class=com.mycompany.myapp.job.Job1
2015-01-29 13:00:00,851 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingTriggerHistoryPlugin - Trigger node2_1422546730757.Job1Trigger fired job node2_1422546730757.Job1 at: 13:00:00 01/29/2015
2015-01-29 13:00:00,852 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingJobHistoryPlugin - Job node2_1422546730757.Job1 fired (by trigger node2_1422546730757.Job1Trigger) at: 13:00:00 01/29/2015
2015-01-29 13:00:00,852 [MyQrtzScheduler_Worker-1] DEBUG org.quartz.core.JobRunShell - Calling execute on job node2_1422546730757.Job1
2015-01-29 13:00:00,853 [MyQrtzScheduler_Worker-1] INFO com.mycompany.myapp.job.Job1 - ***Executing Inbound File SLA Job...
2015-01-29 13:00:02,054 [MyQrtzScheduler_Worker-1] INFO com.mycompany.myapp.job.Job1 - ***Inbound File SLA Job: No SLA breaches found...
2015-01-29 13:00:02,150 [MyQrtzScheduler_Worker-1] INFO com.mycompany.myapp.job.Job1 - Job1 completed successfully in [1297ms]; sleeping [63703ms] to meet the required minimum runtime for quartz-jobs
2015-01-29 13:00:24,881 [QuartzScheduler_MyQrtzScheduler-node1_1422554176832_ClusterManager] DEBUG org.quartz.impl.jdbcjobstore.JobStoreTX - ClusterManager: Check-in complete.
2015-01-29 13:01:05,862 [MyQrtzScheduler_Worker-1] INFO com.mycompany.myapp.job.Job1 - Job1 sleep-delay completed.
2015-01-29 13:01:05,864 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingJobHistoryPlugin - Job node2_1422546730757.Job1 execution complete at 13:01:05 01/29/2015 and reports: SUCCESS
2015-01-29 13:01:05,865 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingTriggerHistoryPlugin - Trigger node2_1422546730757.Job1Trigger completed firing job node2_1422546730757.Job1 at 13:01:05 01/29/2015 with resulting trigger instruction code: DO NOTHING
2015-01-29 13:01:05,868 [MyQrtzScheduler_Worker-1] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' is desired by: MyQrtzScheduler_Worker-1
2015-01-29 13:01:05,869 [MyQrtzScheduler_Worker-1] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' is being obtained: MyQrtzScheduler_Worker-1
2015-01-29 13:01:05,872 [MyQrtzScheduler_Worker-1] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' given to: MyQrtzScheduler_Worker-1
2015-01-29 13:01:05,880 [MyQrtzScheduler_Worker-1] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' returned by: MyQrtzScheduler_Worker-1
2015-01-29 13:01:05,915 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.core.QuartzSchedulerThread - batch acquisition of 1 triggers
2015-01-29 13:01:05,917 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' is desired by: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:01:05,918 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' is being obtained: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:01:05,921 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' given to: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:01:05,954 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.impl.jdbcjobstore.StdRowLockSemaphore - Lock 'TRIGGER_ACCESS' returned by: MyQrtzScheduler_QuartzSchedulerThread
2015-01-29 13:01:05,955 [MyQrtzScheduler_QuartzSchedulerThread] DEBUG org.quartz.simpl.PropertySettingJobFactory - Producing instance of Job 'node1_1422543657050.Job2', class=com.mycompany.myapp.jobs.Job2
2015-01-29 13:01:05,961 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingTriggerHistoryPlugin - Trigger node1_1422543657050.Job2Trigger fired job node1_1422543657050.Job2 at: 13:01:05 01/29/2015
2015-01-29 13:01:05,962 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingJobHistoryPlugin - Job node1_1422543657050.Job2 fired (by trigger node1_1422543657050.Job2Trigger) at: 13:01:05 01/29/2015
2015-01-29 13:01:05,963 [MyQrtzScheduler_Worker-1] DEBUG org.quartz.core.JobRunShell - Calling execute on job node1_1422543657050.Job2
2015-01-29 13:01:05,963 [MyQrtzScheduler_Worker-1] WARN com.mycompany.myapp.jobs.Job2 - No outbound files found; Outbound File SLA Job cannot check for SLA breaches.
2015-01-29 13:01:05,965 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingJobHistoryPlugin - Job node1_1422543657050.Job2 execution complete at 13:01:05 01/29/2015 and reports: null
2015-01-29 13:01:05,966 [MyQrtzScheduler_Worker-1] INFO org.quartz.plugins.history.LoggingTriggerHistoryPlugin - Trigger node1_1422543657050.Job2Trigger completed firing job node1_1422543657050.Job2 at 13:01:05 01/29/2015 with resulting trigger instruction code: DO NOTHING
The following answer was given by the OP.
The problem was that I was defining Quartz jobs with identities that used a unique group id (the scheduler id) instead of a group id common to all hosts in the cluster. Since the scheduler id is unique to each host, every host in the cluster would check whether the job already existed using the fully qualified job name groupId.jobName, find that it didn't, and create a new instance of Job1 and Job2 during startup. Quartz jobs/triggers are never expired or cleared without an explicit request in Java or a manual SQL statement in Oracle. So over time the instances built up, and instead of Quartz running a single instance of Job1 and Job2, it ran every instance of each job that had been created over time (hence the multiple executions and multiple email alerts).
The solution is to replace schedulerId with a static string such as "MyQuartzJobs" when defining a job's identity.
Basically, I changed the following line of Java code:
JobDetail job =
newJob(Job1.class).withIdentity(JOB1_JOB_NAME, uniqueSchedulerId)
.withDescription(JOB1_DESC + " created [" + new Date() + "]")
.storeDurably(false)
.requestRecovery(false)
.build();
to something like the following:
JobDetail job =
newJob(Job1.class).withIdentity(JOB1_JOB_NAME, "MyQuartzJobs")
.withDescription(JOB1_DESC + " created [" + new Date() + "]")
.storeDurably(false)
.requestRecovery(false)
.build();
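On top of that, it can help to guard the registration itself so that a node only schedules the job when no other node has stored it yet. This is a sketch of my own, not part of the original fix; the scheduler, job and trigger variables are whatever you already have at scheduling time:

import org.quartz.JobKey;

// ...
JobKey jobKey = JobKey.jobKey(JOB1_JOB_NAME, "MyQuartzJobs");

// Only register the job if no node in the cluster has stored it yet;
// with a clustered JobStoreTX this check goes against the shared QRTZ_ tables.
if (!scheduler.checkExists(jobKey)) {
    scheduler.scheduleJob(job, trigger);
}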
