I am using rocketmq-spring 2.1.0 to send messages. Sometimes I get a ConcurrentModificationException at org.apache.rocketmq.common.message.MessageDecoder.messageProperties2String(MessageDecoder.java:414). The detailed log follows. Thanks!
#[xx, 10.xx.52] INFO 2022-05-07 15:31:11.043 [XNIO-1 task-74, 29de7f06241a3313, 29de7f06241a3313] com.xx.common.IpProducerService.asyncSendMessage:45 - contentMap{refNo=xx, system=xx, ip=null, platformId=xx, userId=xxx}
#[fp, 10.xx.52] INFO 2022-05-07 15:31:11.043 [XNIO-1 task-74, 29de7f06241a3313, 29de7f06241a3313] com.xx.rocketmq.producer.RocketMqProducer.asyncInfo:19 - -=-=-= [Async Sending Message] -=-=-=
Topic = TOPIC_xx_xx
Tag =
MessageId = null
DelayLevel = 0
Content = {"refNo":"xx","system":"xx","platformId":"xx","userId":"xx"}
#[fp, 10.xx.52] ERROR 2022-05-07 15:31:11.044 [AsyncSenderExecutor_3, , ] com.xx.rocketmq.producer.ProduceCallBack.onException:32 - asyncSendMessage caused exception.
java.util.ConcurrentModificationException: null
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$EntryIterator.next(HashMap.java:1471)
at java.util.HashMap$EntryIterator.next(HashMap.java:1469)
at org.apache.rocketmq.common.message.MessageDecoder.messageProperties2String(MessageDecoder.java:414)
at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.sendKernelImpl(DefaultMQProducerImpl.java:790)
at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.sendDefaultImpl(DefaultMQProducerImpl.java:584)
at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.access$300(DefaultMQProducerImpl.java:97)
at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl$4.run(DefaultMQProducerImpl.java:511)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
#[xx, 10.xx.52] INFO 2022-05-07 15:31:11.044 [AsyncSenderExecutor_3, , ] com.xx.common.IpProducerService.handleResult:49 - async produce status is F
The cause is using different RocketMQTemplate instances to send the same message.
asyncSend runs on a thread pool inside RocketMQTemplate, and that pool has many threads. The same message object shares a single property HashMap, which is not thread-safe, so you can't send the same message through different RocketMQTemplate instances.
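The failure mode described above can be reproduced without RocketMQ. The following is a minimal sketch, not the library's code: Msg, encodeProperties and the property names are hypothetical stand-ins, and the "other sender" is simulated on the same thread so that the failure is deterministic (in the real case the modification comes from another thread and is intermittent). It shows that iterating a shared HashMap while something else adds a key fails, and that giving each send its own copy avoids the problem.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class SharedMessageSketch {
    // Hypothetical stand-in for a message whose properties live in a plain HashMap.
    static final class Msg {
        final String body;
        final Map<String, String> properties = new HashMap<>();

        Msg(String body) {
            this.body = body;
        }

        // Each copy gets its own property map.
        Msg copy() {
            Msg c = new Msg(body);
            c.properties.putAll(properties);
            return c;
        }
    }

    // Serializes the properties by iterating the map, as an encoder would.
    // otherSender simulates a second send path running during the iteration.
    static String encodeProperties(Msg msg, Runnable otherSender) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : msg.properties.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
            otherSender.run();
        }
        return sb.toString();
    }

    static Msg template() {
        Msg m = new Msg("payload");
        m.properties.put("KEYS", "order-1");
        m.properties.put("TAGS", "demo");
        return m;
    }

    // Both send paths touch the same Msg: the iterator detects the added key.
    static boolean sharedInstanceFails() {
        Msg shared = template();
        try {
            encodeProperties(shared, () -> shared.properties.put("SENDER", "second"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Each send path has its own Msg: the modification never reaches the iterated map.
    static boolean separateCopiesSucceed() {
        Msg first = template();
        Msg second = first.copy();
        try {
            encodeProperties(first, () -> second.properties.put("SENDER", "second"));
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("shared instance fails: " + sharedInstanceFails());
        System.out.println("separate copies succeed: " + separateCopiesSucceed());
    }
}
```

In application code the equivalent would be to build a new message object for every send instead of passing one instance to several templates.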
We are using log4j2 for logging. After an initial benchmark, I concluded that, at the same QPS, logging an exception is 80 times slower than logging a normal message.
Here is a sample error log message.
2022-08-25 11:09:14,699 - [pool-2-thread-1][{id=eb9bcf4f-1dcc-4cd7-8588-fd7916d623b8, path=/hellowworld}] - [Test2] ERROR - error wile doing something important java.lang.RuntimeException: exception
at Test2.recursiveCount(Test2.java:140) ~[logging.jar:?]
at Test2.recursiveCount(Test2.java:142) ~[logging.jar:?]
at Test2.recursiveCount(Test2.java:142) ~[logging.jar:?]
at Test2.access$000(Test2.java:17) ~[logging.jar:?]
at Test2$LoggerRunnable.run(Test2.java:75) ~[logging.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_202]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_202]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Here is a sample log message
2022-08-25 18:12:02,125 - [pool-2-thread-1][{id=90e8ebda-e02a-4c48-bc37-560190bda40a, path=/hellowworld}] - [Test2] INFO - adding log with number=ok uuid=1661431322125
This could be attributed to the following:
Note: Logging exceptions and stack traces will create temporary objects with any layout. (However, Layouts will only create these temporary objects when an exception actually occurs.) We haven't figured out a way to log exceptions and stack traces without creating temporary objects. That is unfortunate, but you probably still want to log them when they happen.
Source: log4j2 documentation
Are there any config changes that can reduce the overhead of exception logging? For example:
For similar stack trace errors, logging only a few sample stack traces rather than all of them
Limiting the depth of the stack trace
Anything else?
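The first two ideas can also be approximated in application code, independent of any logging configuration. The sketch below is only an illustration under assumptions: StackTraceSampler, its signature scheme (exception class plus top frame) and the per-signature limit are hypothetical and not part of log4j2. It decides whether a given exception should still be logged with its full trace, and it can cut a throwable's stack trace down to a maximum depth using the standard Throwable.setStackTrace method.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class StackTraceSampler {
    private final int fullTracesPerSignature;
    // Occurrence counters per signature. Note: this map is unbounded in this sketch.
    private final ConcurrentHashMap<String, AtomicInteger> seen = new ConcurrentHashMap<>();

    StackTraceSampler(int fullTracesPerSignature) {
        this.fullTracesPerSignature = fullTracesPerSignature;
    }

    // Two exceptions count as "similar" when class and top frame match (an assumption).
    static String signature(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        String top = frames.length > 0 ? frames[0].toString() : "unknown";
        return t.getClass().getName() + "@" + top;
    }

    // True for the first N occurrences of a signature, false afterwards.
    boolean shouldLogFullTrace(Throwable t) {
        int count = seen.computeIfAbsent(signature(t), k -> new AtomicInteger()).incrementAndGet();
        return count <= fullTracesPerSignature;
    }

    // Keeps only the first maxDepth frames of the throwable's stack trace.
    static <T extends Throwable> T truncate(T t, int maxDepth) {
        StackTraceElement[] frames = t.getStackTrace();
        if (frames.length > maxDepth) {
            t.setStackTrace(Arrays.copyOf(frames, maxDepth));
        }
        return t;
    }

    // Small demo: five similar exceptions, at most two are logged in full.
    static int demoFullTraceCount() {
        StackTraceSampler sampler = new StackTraceSampler(2);
        int full = 0;
        for (int i = 0; i < 5; i++) {
            RuntimeException e = new RuntimeException("exception");
            if (sampler.shouldLogFullTrace(e)) {
                full++;
            }
        }
        return full;
    }

    public static void main(String[] args) {
        System.out.println("full traces logged: " + demoFullTraceCount());
    }
}
```

A caller could pass the throwable to the logger only while shouldLogFullTrace returns true, and otherwise log just the exception's toString() text. Whether this actually helps with the measured slowdown would need to be benchmarked.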
I am not able to convert a 3.7GB avro file to parquet format using ConvertAvroToParquet.
My setup: ExecuteSQL 1.10.0 > ConvertAvroToParquet 1.10.0 > PutS3Object 1.10.0.
ConvertAvroToParquet uses its default settings.
2020-09-24 20:54:40,534 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#6c8e0773 checkpointed with 645 Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 1 millis), max Transaction ID 9971
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2020-09-24 20:54:48,015 INFO [pool-12-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 2 records in 0 milliseconds
2020-09-24 20:55:03,820 INFO [Timer-Driven Process Thread-7] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 100899470
2020-09-24 20:55:03,953 ERROR [Timer-Driven Process Thread-7] o.a.n.p.parquet.ConvertAvroToParquet ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] failed to process session due to java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset(); Processor Administratively Yielded for 1 sec: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:03,954 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ConvertAvroToParquet[id=c08ff95c-0174-1000-9e67-1f59b4d34dfe] due to uncaught Exception: java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
at org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:114)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:308)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.lambda$onTrigger$0(ConvertAvroToParquet.java:159)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2990)
at org.apache.nifi.processors.parquet.ConvertAvroToParquet.onTrigger(ConvertAvroToParquet.java:141)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-09-24 20:55:52,897 INFO [Timer-Driven Process Thread-4] o.a.p.hadoop.InternalParquetRecordWriter Flushing mem columnStore to file. allocated memory: 101841856
What can this be?
I did what you proposed, Pdeuxa, and it worked perfectly for a small table, but not for a large table. So I increased the JVM's heap memory in the nifi-1.10.0/conf/bootstrap.conf file, and that worked for me.
JVM memory settings
#java.arg.2=-Xms512m
#java.arg.3=-Xmx512m
java.arg.2=-Xms2048m
java.arg.3=-Xmx2048m
Thanks for your time and your attention, Pdeuxa.
TL;DR: The ParquetWriter was interrupted by another exception, and it shouldn't be reused after that exception. In your case, it was probably an OOM error.
I encountered the same issue recently in the Hudi project. This exception was thrown by ParquetWriter when I was trying to write a record into a Parquet file.
2023-01-31 11:54:20,199 ERROR org.apache.hudi.io.HoodieMergeHandle [] - Error writing record HoodieRecord{key=HoodieKey { recordKey=149389890 partitionPath=}, currentLocation='null', newLocation='null'}
java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:69)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:60)
at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:387)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:222)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endMessage(MessageColumnIO.java:307)
at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:172)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at org.apache.hudi.io.storage.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:80)
at org.apache.hudi.io.storage.HoodieAvroParquetWriter.writeAvro(HoodieAvroParquetWriter.java:76)
at org.apache.hudi.io.HoodieMergeHandle.writeToFile(HoodieMergeHandle.java:386)
at org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:315)
at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:296)
at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:399)
at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:414)
at org.apache.hudi.table.action.commit.FlinkMergeHelper.runMerge(FlinkMergeHelper.java:133)
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:375)
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:366)
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:238)
at org.apache.hudi.sink.compact.CompactFunction.doCompaction(CompactFunction.java:110)
at org.apache.hudi.sink.compact.CompactFunction.processElement(CompactFunction.java:101)
at org.apache.hudi.sink.compact.CompactFunction.processElement(CompactFunction.java:46)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:524)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:758)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:951)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:930)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:744)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
After checking the source code of parquet-column, I found that the cause of this exception is that the Parquet writer can't be reused once it has been interrupted by another exception. According to the source code of org.apache.parquet.column.impl.ColumnWriterBase#writePage, repetitionLevelColumn, definitionLevelColumn and dataColumn are reset after writePage(...), but the reset code is not in a finally block, so if anything goes wrong in writePage(...), the reset code is not executed.
abstract class ColumnWriterBase implements ColumnWriter {
  ...
  void writePage() {
    if (valueCount == 0) {
      throw new ParquetEncodingException("writing empty page");
    }
    this.rowsWrittenSoFar += pageRowCount;
    if (DEBUG)
      LOG.debug("write page");
    try {
      writePage(pageRowCount, valueCount, statistics, repetitionLevelColumn, definitionLevelColumn, dataColumn);
    } catch (IOException e) {
      throw new ParquetEncodingException("could not write page for " + path, e);
    }
    repetitionLevelColumn.reset();
    definitionLevelColumn.reset();
    dataColumn.reset();
    valueCount = 0;
    resetStatistics();
    pageRowCount = 0;
  }
}
In my case, the exception thrown in writePage(...) was a ClassNotFoundException; my code mistakenly caught it and then tried to reuse the same writer.
Caused by: java.lang.ClassNotFoundException: com.github.luben.zstd.NoPool
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:68)
at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:74)
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:52)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createOutputStream(ZstandardCodec.java:109)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createOutputStream(ZstandardCodec.java:100)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:165)
at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:168)
at org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:59)
at org.apache.parquet.column.impl.ColumnWriterBase.writePage(ColumnWriterBase.java:387)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.sizeCheck(ColumnWriteStoreBase.java:235)
at org.apache.parquet.column.impl.ColumnWriteStoreBase.endRecord(ColumnWriteStoreBase.java:222)
at org.apache.parquet.column.impl.ColumnWriteStoreV1.endRecord(ColumnWriteStoreV1.java:29)
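The mechanism described in this answer can be sketched without Parquet. Everything below is a hypothetical stand-in (Encoder, Sink and PageWriter are not Parquet classes); the only thing borrowed from the real library is the error message text. The sketch shows how a reset that is skipped by an exception leaves the writer in a state where the next write fails with a different, misleading error, and why a fresh writer is needed after a failure.

```java
import java.io.IOException;

class WriterReuseSketch {
    // Hypothetical encoder that may be read only once between resets.
    static final class Encoder {
        private boolean toBytesCalled;

        byte[] toBytes() {
            if (toBytesCalled) {
                throw new IllegalArgumentException(
                    "You cannot call toBytes() more than once without calling reset()");
            }
            toBytesCalled = true;
            return new byte[] {1, 2, 3};
        }

        void reset() {
            toBytesCalled = false;
        }
    }

    // Hypothetical destination for an encoded page.
    interface Sink {
        void accept(byte[] page) throws IOException;
    }

    static final class PageWriter {
        private final Encoder encoder = new Encoder();

        void writePage(Sink sink) throws IOException {
            sink.accept(encoder.toBytes()); // may throw
            encoder.reset();                // skipped when the line above throws
        }
    }

    // Reusing the writer after a failed write hides the root cause behind a new error.
    static String reuseAfterFailure() {
        PageWriter writer = new PageWriter();
        try {
            writer.writePage(page -> { throw new IOException("root cause"); });
        } catch (IOException swallowed) {
            // Swallowing the exception here is the mistake being illustrated.
        }
        try {
            writer.writePage(page -> { });
            return "ok";
        } catch (IllegalArgumentException | IOException e) {
            return e.getMessage();
        }
    }

    // Discarding the failed writer and creating a new one works.
    static String freshWriterAfterFailure() {
        try {
            new PageWriter().writePage(page -> { throw new IOException("root cause"); });
        } catch (IOException reported) {
            // In real code, report the root cause and abandon this writer.
        }
        try {
            new PageWriter().writePage(page -> { });
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("reuse after failure: " + reuseAfterFailure());
        System.out.println("fresh writer: " + freshWriterAfterFailure());
    }
}
```

The practical consequence is to look for the first exception in the log (an OOM or a ClassNotFoundException in the cases above) rather than the later toBytes() message.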
I am trying to read from BigQuery using the Java BigQueryIO read method, but I am getting the error below.
public POutput expand(PBegin pBegin) {
    final String queryOperation = "select query";
    return pBegin
        .apply(BigQueryIO.readTableRows().fromQuery(queryOperation));
}
2020-06-08 19:32:01.391 IST Error message from worker: java.io.IOException: Failed to start reading from source: org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter#77f0db34
org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:792)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read
org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.advance(UnboundedReadFromBoundedSource.java:467)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.access$300(UnboundedReadFromBoundedSource.java:446)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.advance(UnboundedReadFromBoundedSource.java:298)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.start(UnboundedReadFromBoundedSource.java:291)
org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:787)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)
I suspect this issue is connected to a missing tempLocation pipeline execution parameter when you use DataflowRunner for cloud execution.
According to the documentation:
If tempLocation is not specified and gcpTempLocation is, tempLocation will not be populated.
Since this is only my assumption, I also encourage you to inspect the native Apache Beam runtime logs to gather more evidence, because the Stackdriver logs don't give a full picture of the problem.
A separate Jira issue, BEAM-9043, was raised about this vague error message.
Feel free to add more specific information to your original question if you have further concerns or important updates.
I am getting the following error when running a Spark job on Spark 2.0.
The error is random in nature and does not occur every time.
Once the tasks are created, most of them complete properly, while a few hang and throw the following error after a while.
I have tried increasing the properties spark.executor.heartbeatInterval and spark.network.timeout, but to no avail.
17/07/23 20:46:35 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;#597e9d16,BlockManagerId(driver, 128.164.190.35, 38337))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1857)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
... 14 more
Yes, the problem was indeed due to GC, which was pausing the tasks. Changing the default GC to G1GC reduced the problem. Thanks.
-XX:+UseG1GC
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
Why is the following considered as a leak?
2016-12-04 09:24:01,534 ERROR [epollEventLoopGroup-2-1] [io.netty.util.ResourceLeakDetector] - LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 5
#5:
io.netty.buffer.AdvancedLeakAwareByteBuf.release(AdvancedLeakAwareByteBuf.java:955)
com.example.network.listener.netty.PreprocessHandler.handle(PreprocessHandler.java:42)
com.example.network.listener.netty.UdpHandlerChain.handle(UdpHandlerChain.java:17)
com.example.network.listener.netty.UdpRequestExecutor$1.run(UdpRequestExecutor.java:89)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
#4:
io.netty.buffer.AdvancedLeakAwareByteBuf.readBytes(AdvancedLeakAwareByteBuf.java:495)
com.example.network.listener.netty.PreprocessHandler.handle(PreprocessHandler.java:39)
com.example.network.listener.netty.UdpHandlerChain.handle(UdpHandlerChain.java:17)
com.example.network.listener.netty.UdpRequestExecutor$1.run(UdpRequestExecutor.java:89)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
#3:
io.netty.buffer.AdvancedLeakAwareByteBuf.retain(AdvancedLeakAwareByteBuf.java:927)
io.netty.buffer.AdvancedLeakAwareByteBuf.retain(AdvancedLeakAwareByteBuf.java:35)
io.netty.util.ReferenceCountUtil.retain(ReferenceCountUtil.java:36)
io.netty.channel.DefaultAddressedEnvelope.retain(DefaultAddressedEnvelope.java:89)
io.netty.channel.socket.DatagramPacket.retain(DatagramPacket.java:67)
io.netty.channel.socket.DatagramPacket.retain(DatagramPacket.java:27)
io.netty.util.ReferenceCountUtil.retain(ReferenceCountUtil.java:36)
com.example.network.listener.netty.UdpRequestExecutor.channelRead(UdpRequestExecutor.java:71)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
io.netty.channel.epoll.EpollDatagramChannel$EpollDatagramChannelUnsafe.epollInReady(EpollDatagramChannel.java:580)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:402)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
java.lang.Thread.run(Thread.java:745)
#2:
Hint: 'UdpRequestExecutor#0' will handle the message from this point.
io.netty.channel.DefaultAddressedEnvelope.touch(DefaultAddressedEnvelope.java:117)
io.netty.channel.socket.DatagramPacket.touch(DatagramPacket.java:85)
io.netty.channel.socket.DatagramPacket.touch(DatagramPacket.java:27)
io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
io.netty.channel.epoll.EpollDatagramChannel$EpollDatagramChannelUnsafe.epollInReady(EpollDatagramChannel.java:580)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:402)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
java.lang.Thread.run(Thread.java:745)
#1:
Hint: 'DefaultChannelPipeline$HeadContext#0' will handle the message from this point.
io.netty.channel.DefaultAddressedEnvelope.touch(DefaultAddressedEnvelope.java:117)
io.netty.channel.socket.DatagramPacket.touch(DatagramPacket.java:85)
io.netty.channel.socket.DatagramPacket.touch(DatagramPacket.java:27)
io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
io.netty.channel.epoll.EpollDatagramChannel$EpollDatagramChannelUnsafe.epollInReady(EpollDatagramChannel.java:580)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:402)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
java.lang.Thread.run(Thread.java:745)
Created at:
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:170)
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:131)
io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:73)
io.netty.channel.RecvByteBufAllocator$DelegatingHandle.allocate(RecvByteBufAllocator.java:124)
io.netty.channel.epoll.EpollDatagramChannel$EpollDatagramChannelUnsafe.epollInReady(EpollDatagramChannel.java:544)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:402)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873)
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
java.lang.Thread.run(Thread.java:745)
The last access is an explicit release()...
I'm using Netty 4.1.6.Final.
The exception message points to the Netty wiki. From that information and the traces, it looks like the buf.retain() call at com.example.network.listener.netty.UdpRequestExecutor.channelRead(UdpRequestExecutor.java:71) was wrong. In any case, your buffer's refcount was still > 0 at the time of GC. You should study the examples and the responsibility matrix to work with refcounts correctly.
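The arithmetic behind "refcount was > 0" can be shown with a toy counter. This is a minimal sketch with a hypothetical CountedBuffer class, not Netty's ByteBuf; it only assumes the usual convention that a new buffer starts at a count of 1, retain() adds one and release() subtracts one. Whether the extra retain() in the question is really unbalanced depends on who else releases the buffer, which the traces alone do not show.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RefCountSketch {
    // Hypothetical reference-counted buffer; starts with a count of 1.
    static final class CountedBuffer {
        private final AtomicInteger refCnt = new AtomicInteger(1);

        CountedBuffer retain() {
            if (refCnt.get() <= 0) {
                throw new IllegalStateException("buffer already deallocated");
            }
            refCnt.incrementAndGet();
            return this;
        }

        // Returns true when the count reaches 0, i.e. the buffer is deallocated.
        boolean release() {
            int remaining = refCnt.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released more often than retained");
            }
            return remaining == 0;
        }

        int refCnt() {
            return refCnt.get();
        }
    }

    // One extra retain() but only one release(): the count stays at 1.
    static int retainThenSingleRelease() {
        CountedBuffer buf = new CountedBuffer(); // count 1
        buf.retain();                            // count 2
        buf.release();                           // count 1, never reaches 0
        return buf.refCnt();
    }

    // Every retain() is matched by a release(), plus one for the initial count.
    static int balanced() {
        CountedBuffer buf = new CountedBuffer(); // count 1
        buf.retain();                            // count 2
        buf.release();                           // count 1
        buf.release();                           // count 0
        return buf.refCnt();
    }

    public static void main(String[] args) {
        System.out.println("retain + one release leaves: " + retainThenSingleRelease());
        System.out.println("balanced calls leave: " + balanced());
    }
}
```

In the first case the final explicit release() is the last access, yet the buffer is still live when it becomes unreachable, which is the situation a leak detector reports.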