Getting org.rocksdb.RocksDBException: bad entry in block - java

I'm using rocksDB to store data where the key is a string and the value is an integer. Recently my application threw the following exception while writing into rocks.
java.lang.Exception: org.rocksdb.RocksDBException: bad entry in block
at com.techspot.store.RocksStore.encodePacket(RocksStore.java:684) ~[techspot-encoder-dev.jar:?]
at com.techspot.store.workers.PacketEncoder.run(PacketEncoder.java:67) [techspot-encoder-dev.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: org.rocksdb.RocksDBException: bad entry in block
at org.rocksdb.RocksDB.get(Native Method) ~[rocksdbjni-6.13.3.jar:?]
at org.rocksdb.RocksDB.get(RocksDB.java:1948) ~[rocksdbjni-6.13.3.jar:?]
at com.techspot.store.RocksStore.packetExists(RocksStore.java:402) ~[techspot-encoder-dev.jar:?]
at com.techspot.store.RocksStore.encodePacket(RocksStore.java:634) ~[techspot-encoder-dev.jar:?]
... 6 more
The error first started occurring when the number of entries was ~350 million and the database size was ~18GB. This issue is hard to reproduce, I tried to reproduce it by putting almost ~700 million entries but couldn't do it. I'm using RocksDB version 6.13.3 and using the following options for rocksDB:
Options options = new Options();
BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig();
blockBasedTableConfig.setBlockSize(16 * 1024);// 16Kb
options.setWriteBufferSize(64 * 1024 * 1024);// 64MB
options.setMaxWriteBufferNumber(8);
options.setMinWriteBufferNumberToMerge(1);
options.setTableCacheNumshardbits(8);
options.setLevelZeroSlowdownWritesTrigger(1000);
options.setLevelZeroStopWritesTrigger(2000);
options.setLevelZeroFileNumCompactionTrigger(1);
options.setCompressionType(CompressionType.LZ4_COMPRESSION);
options.setTableFormatConfig(blockBasedTableConfig);
options.setCompactionStyle(CompactionStyle.UNIVERSAL);
options.setCreateIfMissing(Boolean.TRUE);
options.setEnablePipelinedWrite(true);
options.setIncreaseParallelism(8);
Does anyone have any idea what might be the cause for this exception?

Related

Java multi threaded application - getting "Bad File Descriptor" exception on Hive intermittently

I know this kind of question have been asked previously, but I still don't get solution after reading their posts, so I decide to post this question again from here.
I Am working on Java multi-threaded application where I am trying to run HQL queries using JDBC on Hive environment. I have bunch of hive-sql queries and i am executing them on Hive in parallel with multiple threads and I am getting following exception when queries count more (for example, if i am running more than 100 queries). can some one please check this and help me on this?
2020-06-16 06:00:45,314 ERROR [main]: Terminal exception
java.lang.Exception: Map step agg_cas_auth_reinstate_derive failed.
at com.mine.idn.magellan.ParallelExecGraph.execute(ParallelExecGraph.java:198)
at com.mine.idn.magellan.WarehouseSession.executeMap(WarehouseSession.java:332)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:872)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:778)
at com.mine.idn.magellan.StandAloneEnv.executeAndExit(StandAloneEnv.java:642)
at com.mine.idn.magellan.StandAloneEnv.main(StandAloneEnv.java:77)
Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.io.IOException: Bad file descriptor
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: Bad file descriptor
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2850)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2685)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2591)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1077)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2007)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:190)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:599)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:599)
at org.apache.hadoop.mapred.JobClient.getJobInner(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:639)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:559)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: java.io.IOException: Bad file descriptor
at java.io.FileInputStream.close0(Native Method)
at java.io.FileInputStream.access$000(FileInputStream.java:49)
at java.io.FileInputStream$1.close(FileInputStream.java:336)
at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
at java.io.FileInputStream.close(FileInputStream.java:334)
at java.io.BufferedInputStream.close(BufferedInputStream.java:483)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2676)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2661)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2741)
... 22 more
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at com.mine.idn.magellan.WarehouseSession.executeMapStep(WarehouseSession.java:797)
at com.mine.idn.magellan.WarehouseSession.access$000(WarehouseSession.java:23)
at com.mine.idn.magellan.WarehouseSession$ParallelExecResources.executeMapStep(WarehouseSession.java:91)
at com.mine.idn.magellan.ParallelExecGraph$Node.run(ParallelExecGraph.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What i dont understand is, why hadoop framework throwing - Bad File Descriptor Exception? My Java code invoking Hadoop-Hive code and its throwing this exception.
Also one more thing is this issue is intermittent, not consistent. If i re-run the same application, most of the cases, it went through.
Thank you for

Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read

I am trying to read from bigquery using Java BigqueryIO.read method. but getting below error.
public POutput expand(PBegin pBegin) {
final String queryOperation = "select query";
return pBegin
.apply(BigQueryIO.readTableRows().fromQuery(queryOperation));
}
2020-06-08 19:32:01.391 ISTError message from worker: java.io.IOException: Failed to start reading from source: org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter#77f0db34 org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:792) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:748) Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.advance(UnboundedReadFromBoundedSource.java:467) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.access$300(UnboundedReadFromBoundedSource.java:446) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.advance(UnboundedReadFromBoundedSource.java:298) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.start(UnboundedReadFromBoundedSource.java:291) org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:787) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:748)
I suppose that this issue might be connected with missing tempLocation pipeline execution parameter when you are using DataflowRunner for cloud execution.
According to the documentation:
If tempLocation is not specified and gcpTempLocation is, tempLocation will not be populated.
Since this is just my presumption, I'll also encourage you to inspect native Apache Beam runtime logs to expand the overall issue evidence, as long as Stackdriver logs don't reflect a full picture of the problem.
There was raised a separate Jira tracker thread BEAM-9043, indicating this vaguely outputted error description.
Feel free to append more certain information to you origin question for any further concern or essential updates.

Kafka Connect Out of Java heap space after enabling SSL

I have recently enabled SSL and tried to start Kafka connect in distributed mode.
When running
connect-distributed connect-distributed.properties
I get the following errors:
[2018-10-09 16:50:57,190] INFO Stopping task (io.confluent.connect.jdbc.sink.JdbcSinkTask:106)
[2018-10-09 16:50:55,471] ERROR WorkerSinkTask{id=sink-mariadb-test} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:344)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:305)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:560)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:496)
at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:230)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:314)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1181)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:444)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:317)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
and
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:694)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:344)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:305)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:560)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:496)
at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:230)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:314)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1181)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:444)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:317)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have also tried to increase max and initial heap size by setting the KAFKA_HEAP_OPTS environment variable by running
KAFKA_HEAP_OPTS="-Xms4g -Xmx6g" connect-distributed connect-distributed.properties
but still doesn't work.
My questions are:
Can SSL authentication affect memory usage by any chance?
How can I fix the issue?
EDIT:
I have tried to disable SSL and everything is working without any problems.
I ran into this issue when enabling SASL_SSL in Kafka Connect :
[2018-10-12 12:33:36,426] ERROR WorkerSinkTask{id=test-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
java.lang.OutOfMemoryError: Java heap space
Checking ConsumerConfig values showed me that my configuration was not applied :
[2018-10-12 12:33:35,573] INFO ConsumerConfig values:
...
security.protocol = PLAINTEXT
I found out that you have to prefix configs with producer. or consumer. in your properties file.
consumer.security.protocol=SASL_SSL

NullPointerException in Camus Job [EtlMultiOutputRecordWriter] - ExceptionWritable

I am very new to Camus and Hadoop, and am running into an exception error. I am trying to write some avro files to a hdfs, and keep getting the following error block:
[EtlMultiOutputRecordWriter] - ExceptionWritable key: topic=_schemas partition=0leaderId=0 server= service= beginOffset=0 offset=0 msgSize=1024 server= checksum=0 time=1450371931447 value: java.lang.Exception
at com.linkedin.camus.etl.kafka.common.KafkaReader.getNext(KafkaReader.java:108)
at com.linkedin.camus.etl.kafka.mapred.EtlRecordReader.nextKeyValue(EtlRecordReader.java:232)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
... 14 more
I looked up line 108 in com.linkedin.camus.etl.kafka.common.KafkaReader.getNext and found it to be this: MessageAndOffset msgAndOffset = messageIter.next();.
I am using io.confluent.camus.etl.kafka.coders.AvroMessageDecoder for my decoder and com.linkedin.camus.example.DummySchemaRegistry for my coder.
At the end of the logs I get another line indicating an error from one of the hdfs files: Error from file [hdfs://localhost:9000/user/username/exec/2015-12-17-17-05-25/errors-m-00000]. The error-m-00000 file contains a somewhat readable beginning, but then changes to an undecipherable string:
SEQ*com.linkedin.camus.etl.kafka.common.EtlKey5com.linkedin.camus.etl.kafka.common.ExceptionWritable*org.apache.hadoop.io.compress.DefaultCodec|Ò ∫±ß˝}pºHí$ò¸·:0schemasQ∞∆øÿxúïîÀN√0E7l‡+∫»¢lFMõ>á*êxU®™ËzÍmàc[ÆÕ„XÚÕÿqZ%#[ÿD±gÓô…¯∆üGœ¯Ç¿Q,·Úçë2ô'«hZL¿3ëSöXÿ5ê·ê„Sé‡ÇÖpÎS¬î4,…LËÕ¥Î{û}wFßáâ*M)>%&uZÑCfi“˚#rKÌÔ¡flÌu^Í%†B∂"Xa*•⁄0ÔQÕpùGzùidy&ñªkT…śԈ≥-#0>›…∆RG∫.ˇÅ¨«JÚ®sÃ≥Ö¡\£Rîfi˚ßéT≥D#%T8ãW®ÚµÌ∫4N˙©W∫©mst√—Ô嶥óhÓ$C~#S+Ñâ{ãÇfl¡ßí⁄L´ÏíÙºÙΩ5wfÃjM¬∏_Äò5RØ£
Ë"Eeúÿëx{ÆÏ«{XW÷XM€O¨-C#É¡Òl•ù9§‰õö2ó:wɲ%Œ-N∫ˇbFXˆ∑:àá5fyQÑ‘ö™:roõ1⁄5•≠≈˚yM0±ú?»ÃW◊.h≈I´êöNæ
[û3
At the end it appears that a hadoop job has run, but a commit never takes place, based of the timing report:
Job time (seconds):
pre setup 1.0 (11%)
get splits 1.0 (11%)
hadoop job 4.0 (44%)
commit 0.0 (0%)
Total: 0 minutes 9 seconds
Any help or an idea of where to look to resolve this would be greatly appreciated. Thank you.

gRPC OOME and NPE in simple JMH benchmark

I tried gRPC, but gRPC use proto-buf immutable message object, I meet a lot OOM like
Exception in thread "grpc-default-executor-68" java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:204)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:262)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:157)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:93)
at io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:66)
at io.grpc.internal.MessageFramer.writeKnownLength(MessageFramer.java:182)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:135)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:125)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:165)
at io.grpc.internal.AbstractServerStream.writeMessage(AbstractServerStream.java:108)
at io.grpc.internal.ServerImpl$ServerCallImpl.sendMessage(ServerImpl.java:496)
at io.grpc.stub.ServerCalls$ResponseObserver.onNext(ServerCalls.java:241)
at play.bench.BenchGRPC$CounterImpl$1.onNext(BenchGRPC.java:194)
at play.bench.BenchGRPC$CounterImpl$1.onNext(BenchGRPC.java:191)
at io.grpc.stub.ServerCalls$2$1.onMessage(ServerCalls.java:191)
at io.grpc.internal.ServerImpl$ServerCallImpl$ServerStreamListenerImpl.messageRead(ServerImpl.java:546)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1.run(ServerImpl.java:417)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
io.grpc.StatusRuntimeException: CANCELLED
at io.grpc.Status.asRuntimeException(Status.java:430)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:266)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$3.run(ClientCallImpl.java:320)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I'm not sure is this caused by object creation, I gave 5G mem to this process, still OOM, needs some help.
EDIT
I put my bench, proto, dependencies and an example out to this gist, the problem is the memory goes very high, sooner or later will cause OOME, and there is a strange NPE
严重: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$2#312546d9
java.lang.NullPointerException
at io.netty.buffer.PoolChunk.initBufWithSubpage(PoolChunk.java:378)
at io.netty.buffer.PoolChunk.initBufWithSubpage(PoolChunk.java:369)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:194)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:262)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:157)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:93)
at io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:66)
at io.grpc.internal.MessageFramer.writeKnownLength(MessageFramer.java:182)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:135)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:125)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:165)
at io.grpc.internal.AbstractServerStream.writeMessage(AbstractServerStream.java:108)
at io.grpc.internal.ServerImpl$ServerCallImpl.sendMessage(ServerImpl.java:496)
at io.grpc.stub.ServerCalls$ResponseObserver.onNext(ServerCalls.java:241)
at play.bench.BenchGRPCOOME$CounterImpl.inc(BenchGRPCOOME.java:150)
at play.bench.CounterServerGrpc$1.invoke(CounterServerGrpc.java:171)
at play.bench.CounterServerGrpc$1.invoke(CounterServerGrpc.java:166)
at io.grpc.stub.ServerCalls$1$1.onHalfClose(ServerCalls.java:154)
at io.grpc.internal.ServerImpl$ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerImpl.java:562)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$2.run(ServerImpl.java:432)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The problem is that StreamObserver.onNext does not block, so there is no push-back when you write too much. There is an open issue. There needs to be a way for you to interact with flow control and be informed that you should slow your sending rate. For client-side, a workaround is to use ClientCall directly; you call Channel.newCall all then pay attention to isReady and onReady. For server-side there isn't an easy workaround.

Categories