I have partitioned reader and writer. There are some queries getting transaction timedout while getting resultset back from cursor. But I couldn't able to get catch the exception. I need to catch the exception and update the status back to track those reports.
Below is the exception from logs:
com.ibm.ws.batch.JobLogger CWWKY0024W: The current chunk was rolled back to the previous checkpoint for step genGAHJSONReport in job instance 553 and job execution 553. Step metrics = [(READ_SKIP_COUNT,0), (PROCESS_SKIP_COUNT,0), (WRITE_SKIP_COUNT,0), (FILTER_COUNT,0), (COMMIT_COUNT,1), (READ_COUNT,5000), (WRITE_COUNT,5000), (ROLLBACK_COUNT,0)]
com.ibm.ws.batch.JobLogger CWWKY0030I: An exception occurred while running the step genGAHJSONReport.
com.ibm.jbatch.container.exception.BatchContainerRuntimeException: Failure in Read-Process-Write Loop
at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeChunk(ChunkStepControllerImpl.java:704)
at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeCoreStep(ChunkStepControllerImpl.java:795)
at com.ibm.jbatch.container.controller.impl.BaseStepControllerImpl.execute(BaseStepControllerImpl.java:295)
at com.ibm.jbatch.container.controller.impl.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:118)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeCoreTransitionLoop(WorkUnitThreadControllerImpl.java:96)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeWorkUnit(WorkUnitThreadControllerImpl.java:178)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl$AbstractControllerHelper.runExecutionOnThread(WorkUnitThreadControllerImpl.java:503)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.runExecutionOnThread(WorkUnitThreadControllerImpl.java:92)
at com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:113)
at com.ibm.ws.context.service.serializable.ContextualRunnable.run(ContextualRunnable.java:79)
at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:232)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.ibm.jbatch.container.exception.TransactionManagementException: javax.transaction.RollbackException
at com.ibm.jbatch.container.transaction.impl.JTAUserTransactionAdapter.commit(JTAUserTransactionAdapter.java:108)
at com.ibm.jbatch.container.controller.impl.ChunkStepControllerImpl.invokeChunk(ChunkStepControllerImpl.java:656)
... 15 more
Caused by: javax.transaction.RollbackException
at com.ibm.tx.jta.impl.TransactionImpl.stage3CommitProcessing(TransactionImpl.java:978)
at com.ibm.tx.jta.impl.TransactionImpl.processCommit(TransactionImpl.java:778)
at com.ibm.tx.jta.impl.TransactionImpl.commit(TransactionImpl.java:711)
at com.ibm.tx.jta.impl.TranManagerImpl.commit(TranManagerImpl.java:165)
at com.ibm.tx.jta.impl.TranManagerSet.commit(TranManagerSet.java:113)
at com.ibm.tx.jta.impl.UserTransactionImpl.commit(UserTransactionImpl.java:162)
at com.ibm.tx.jta.embeddable.impl.EmbeddableUserTransactionImpl.commit(EmbeddableUserTransactionImpl.java:101)
at com.ibm.ws.transaction.services.UserTransactionService.commit(UserTransactionService.java:72)
at com.ibm.jbatch.container.transaction.impl.JTAUserTransactionAdapter.commit(JTAUserTransactionAdapter.java:101)
Related
when I use Spark: rdd.saveAsText(path, LzopCodec.class),get Exception like this, Is that casue java buffer size limit?
maybe there is one row string size is over 2G ?
Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:810)
at org.apache.hadoop.io.Text.encode(Text.java:451)
at org.apache.hadoop.io.Text.set(Text.java:198)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1505)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1504)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:125)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:123)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1413)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:135)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3224 in stage 22.0 failed 4 times, most recent failure: Lost task 3224.3 in stage 22.0 (TID 10280, hadoop12.fj8.yiducloud.cn, executor 14): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:151)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:79)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:810)
at org.apache.hadoop.io.Text.encode(Text.java:451)
at org.apache.hadoop.io.Text.set(Text.java:198)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1505)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2$$anonfun$31$$anonfun$apply$54.apply(RDD.scala:1504)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:125)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:123)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1413)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:135)
I am not able to convert a 3.7GB avro file to parquet format using ConvertAvroToParquet.
My setup: ExecuteSQL 1.10.0 > ConvertAvroToParquet 1.10.0 > PutS3Object 1.10.0.
ConvertAvroToParquet settings are by default.
2020-09-24 20:54:40,534 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#6c8e0773 checkpointed with 645 Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 1 millis), max Transaction ID 9971
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 2 records in 0 milliseconds
2020-09-24 20:55:03,820 INFO [Timer-Driven Process Thread-7] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 100899470
2020-09-24 20:55:03,953 ERROR [Timer-Driven Process Thread-7] o.a.n.p.parquet.ConvertAvroToParquet ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] failed to process session due to java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset(); Processor Administratively Yielded for 1 sec: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:03,954 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] due to uncaught Exception: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:52,897 INFO [Timer-Driven Process Thread-4] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 101841856
What can this be?
I did what you proposed, Pdeuxa, and it worked perfectly for a small table, but not for large table. So, I increased the JVM's heap memory in the nifi-1.10.0/conf/bootstrap.conf file, and it worked for me.
JVM memory settings
#java.arg.2=-Xms512m
#java.arg.3=-Xmx512m
java.arg.2=-Xms2048m
java.arg.3=-Xmx2048m
Thanks for your time and your attention, Pdeuxa.
TL,DR: ParquetWriter was interrupted by other exception, and it shouldn't be reused after the exception. In your case, it should be an OOM exception.
I encountered the same issue recently in a project Hudi. This exception was thrown by ParquetWriter when I was trying to write a record into a Parquet file.
2023-01-31 11:54:20,199 ERROR org.apache.hudi.io.HoodieMergeHandle [] - Error writing record HoodieRecord{key=HoodieKey { recordKey=149389890 partitionPath=}, currentLocation='null', newLocation='null'}
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:69)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:60)
at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:387)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:222)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endMessage(MessageColumnIO.java:307)
at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at org.apache.hudi.io.storage.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:80)
at org.apache.hudi.io.storage.HoodieAvroParquetWriter.writeAvro(HoodieAvroParquetWriter.java:76)
at org.apache.hudi.io.HoodieMergeHandle.writeToFile(HoodieMergeHandle.java:386)
at org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:315)
at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:296)
at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:399)
at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:414)
at org.apache.hudi.table.action.commit.FlinkMergeHelper.runMerge(FlinkMergeHelper.java:133)
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:375)
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:366)
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:238)
at org.apache.hudi.sink.compact.CompactFunction.doCompaction(CompactFunction.java:110)
at org.apache.hudi.sink.compact.CompactFunction.processElement(CompactFunction.java:101)
at org.apache.hudi.sink.compact.CompactFunction.processElement(CompactFunction.java:46)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:524)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:758)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:951)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:930)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:744)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
After checking the source code of parquet-column. The cause of this exception is that the parquet writer can't be reused if it was interrupted by other exception. According to the source code of org.apache.parquet.column.impl.ColumnWriterBase#writePage, repetitionLevelColumn, definitionLevelColumn and dataColumn will be reset after writePage(...), but the reset code is not in finally block, so, if there is anything wrong in writePage(...), the reset code won't be executed.
abstract class ColumnWriterBase implements ColumnWriter {
...
void writePage() {
if (valueCount == 0) {
throw new ParquetEncodingException("writing empty page");
}
this.rowsWrittenSoFar += pageRowCount;
if (DEBUG)
LOG.debug("write page");
try {
writePage(pageRowCount, valueCount, statistics, repetitionLevelColumn, definitionLevelColumn, dataColumn);
} catch (IOException e) {
throw new ParquetEncodingException("could not write page for " + path, e);
}
repetitionLevelColumn.reset();
definitionLevelColumn.reset();
dataColumn.reset();
valueCount = 0;
resetStatistics();
pageRowCount = 0;
}
}
In my case, the exception thrown in writePage(...) was a class not found exception, and my code caught it wrongly, and want to reuse the same writer.
Caused by: java.lang.ClassNotFoundException: com.github.luben.zstd.NoPool
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:68)
at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74)
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:52)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createOutputStream(ZstandardCodec.java:109)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createOutputStream(ZstandardCodec.java:100)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:165)
at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:168)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:59)
at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:387)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:222)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
I know this kind of question have been asked previously, but I still don't get solution after reading their posts, so I decide to post this question again from here.
I Am working on Java multi-threaded application where I am trying to run HQL queries using JDBC on Hive environment. I have bunch of hive-sql queries and i am executing them on Hive in parallel with multiple threads and I am getting following exception when queries count more (for example, if i am running more than 100 queries). can some one please check this and help me on this?
2020-06-16 06:00:45,314 ERROR [main]: Terminal exception
java.lang.Exception: Map step agg_cas_auth_reinstate_derive failed.
at com.mine.idn.magellan.ParallelExecGraph.execute(ParallelExecGraph.java:198)
at com.mine.idn.magellan.WarehouseSession.executeMap(WarehouseSession.java:332)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:872)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:778)
at com.mine.idn.magellan.StandAloneEnv.executeAndExit(StandAloneEnv.java:642)
at com.mine.idn.magellan.StandAloneEnv.main(StandAloneEnv.java:77)
Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.io.IOException: Bad file descriptor
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: Bad file descriptor
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2850)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2685)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2591)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1077)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2007)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:190)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:599)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:599)
at org.apache.hadoop.mapred.JobClient.getJobInner(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:639)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:559)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: java.io.IOException: Bad file descriptor
at java.io.FileInputStream.close0(Native Method)
at java.io.FileInputStream.access$000(FileInputStream.java:49)
at java.io.FileInputStream$1.close(FileInputStream.java:336)
at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
at java.io.FileInputStream.close(FileInputStream.java:334)
at java.io.BufferedInputStream.close(BufferedInputStream.java:483)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2676)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2661)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2741)
... 22 more
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at com.mine.idn.magellan.WarehouseSession.executeMapStep(WarehouseSession.java:797)
at com.mine.idn.magellan.WarehouseSession.access$000(WarehouseSession.java:23)
at com.mine.idn.magellan.WarehouseSession$ParallelExecResources.executeMapStep(WarehouseSession.java:91)
at com.mine.idn.magellan.ParallelExecGraph$Node.run(ParallelExecGraph.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What i dont understand is, why hadoop framework throwing - Bad File Descriptor Exception? My Java code invoking Hadoop-Hive code and its throwing this exception.
Also one more thing is this issue is intermittent, not consistent. If i re-run the same application, most of the cases, it went through.
Thank you for
I'm using FtpStreamingMessageSource in combination with poller with following config (#InboundChannelAdapter(channel = "ftpChannel", poller = #Poller("pollerMetadata"))):
#Bean
public PollerMetadata pollerMetadata(PlatformTransactionManager transactionManager) {
PeriodicTrigger trigger = new PeriodicTrigger(TimeUnit.SECONDS.toMillis(30));
trigger.setFixedRate(true);
MatchAlwaysTransactionAttributeSource source = new MatchAlwaysTransactionAttributeSource();
source.setTransactionAttribute(new DefaultTransactionAttribute());
TransactionInterceptor interceptor = new TransactionInterceptor(transactionManager, source);
PollerMetadata metadata = new PollerMetadata();
metadata.setTrigger(trigger);
metadata.setTransactionSynchronizationFactory(synchronizationFactory());
metadata.setAdviceChain(Collections.singletonList(interceptor));
return metadata;
}
It was working OK, until today I had a DB problem and an exception The last packet sent successfully to the server was 30,079 milliseconds ago. and (ERROR): LoggingHandler org.springframework.transaction.TransactionSystemException: Could not roll back JDBC transaction; nested exception is java.sql.SQLException: Connection is
closed.
Somehow, poller stopped working after this, even though HikariCP managed to recover from exception after a while. Seems like a thread that was doing polling job was terminated.
Any idea how to make thread recover and continue with processing? After I restarted application, everything was back to normal.
UPDATE
this is the last exception I got before poller stopped working
2019-03-06 14:34:45 (ERROR): LoggingHandler org.springframework.transaction.TransactionSystemException: Could not roll back JDBC transaction; nested exception is java.sql.SQLException: Connection is
closed
at org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:290)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:853)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:830)
at org.springframework.transaction.interceptor.TransactionAspectSupport.completeTransactionAfterThrowing(TransactionAspectSupport.java:503)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:285)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
at com.sun.proxy.$Proxy71.call(Unknown Source)
at org.springframework.integration.endpoint.AbstractPollingEndpoint$Poller$1.run(AbstractPollingEndpoint.java:353)
at org.springframework.integration.util.ErrorHandlingTaskExecutor$1.run(ErrorHandlingTaskExecutor.java:55)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.execute(ErrorHandlingTaskExecutor.java:51)
at org.springframework.integration.endpoint.AbstractPollingEndpoint$Poller.run(AbstractPollingEndpoint.java:344)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:81)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Connection is closed
at com.zaxxer.hikari.pool.ProxyConnection$ClosedConnection.lambda$getClosedConnection$0(ProxyConnection.java:489)
at com.sun.proxy.$Proxy67.rollback(Unknown Source)
at com.zaxxer.hikari.pool.ProxyConnection.rollback(ProxyConnection.java:370)
at com.zaxxer.hikari.pool.HikariProxyConnection.rollback(HikariProxyConnection.java)
at org.springframework.jdbc.datasource.DataSourceTransactionManager.doRollback(DataSourceTransactionManager.java:287)
... 22 more
There is no "termination" of a polling thread; polling uses a TaskScheduler and when the poll completes (whether an exception occurred or not) the thread is returned to the pool, ready for the next poll.
It's probably too late now (if you restarted your app) but if it happens again take a thread dump; most likely the poller thread is "stuck" somewhere in user (or DB) code.
Getting the following error on running the spark Job on Spark 2.0.
The error is Random in nature & does not occur all the time.
Once the tasks are being created most of them are completed properly while a few gets hung & throws the following error after a while.
I have tried increasing the following properties spark.executor.heartbeatInterval & spark.network.timeout but of no use.
17/07/23 20:46:35 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;#597e9d16,BlockManagerId(driver, 128.164.190.35, 38337))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1857)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
... 14 more
Yes, the problem is indeed due to GC as it used to pause the tasks, changing the default GC to G1GC reduced the problem. Thanks
XX:+UseG1GC
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html