Kafka Connect Out of Java heap space after enabling SSL

I have recently enabled SSL and tried to start Kafka Connect in distributed mode.
When running
connect-distributed connect-distributed.properties
I get the following errors:
[2018-10-09 16:50:57,190] INFO Stopping task (io.confluent.connect.jdbc.sink.JdbcSinkTask:106)
[2018-10-09 16:50:55,471] ERROR WorkerSinkTask{id=sink-mariadb-test} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:344)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:305)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:560)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:496)
at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:230)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:314)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1181)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:444)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:317)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
and
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:694)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:344)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:305)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:560)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:496)
at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:230)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:314)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1181)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:444)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:317)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have also tried to increase the initial and maximum heap size by setting the KAFKA_HEAP_OPTS environment variable and running
KAFKA_HEAP_OPTS="-Xms4g -Xmx6g" connect-distributed connect-distributed.properties
but it still doesn't work.
My questions are:
Can SSL authentication affect memory usage by any chance?
How can I fix the issue?
EDIT:
I have tried disabling SSL and everything works without any problems.

I ran into this issue when enabling SASL_SSL in Kafka Connect:
[2018-10-12 12:33:36,426] ERROR WorkerSinkTask{id=test-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
java.lang.OutOfMemoryError: Java heap space
Checking the ConsumerConfig values showed me that my configuration was not applied:
[2018-10-12 12:33:35,573] INFO ConsumerConfig values:
...
security.protocol = PLAINTEXT
I found out that you have to prefix the configs with producer. or consumer. in your worker properties file, for example:
consumer.security.protocol=SASL_SSL
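For illustration, here is a minimal sketch of a worker properties file that applies the same SASL_SSL settings to the worker itself and to the connectors' consumers and producers. The truststore path and password are placeholders, and the SASL JAAS settings are omitted:

# The worker's own connections to the Kafka cluster
security.protocol=SASL_SSL
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeit

# Connections made by sink connectors' consumers
consumer.security.protocol=SASL_SSL
consumer.ssl.truststore.location=/path/to/truststore.jks
consumer.ssl.truststore.password=changeit

# Connections made by source connectors' producers
producer.security.protocol=SASL_SSL
producer.ssl.truststore.location=/path/to/truststore.jks
producer.ssl.truststore.password=changeit

Without the consumer. prefix, the sink task's consumer falls back to the default, which matches the security.protocol = PLAINTEXT line in the ConsumerConfig output above.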

Related

Getting org.rocksdb.RocksDBException: bad entry in block

I'm using RocksDB to store data where the key is a string and the value is an integer. Recently my application threw the following exception while writing into RocksDB.
java.lang.Exception: org.rocksdb.RocksDBException: bad entry in block
at com.techspot.store.RocksStore.encodePacket(RocksStore.java:684) ~[techspot-encoder-dev.jar:?]
at com.techspot.store.workers.PacketEncoder.run(PacketEncoder.java:67) [techspot-encoder-dev.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: org.rocksdb.RocksDBException: bad entry in block
at org.rocksdb.RocksDB.get(Native Method) ~[rocksdbjni-6.13.3.jar:?]
at org.rocksdb.RocksDB.get(RocksDB.java:1948) ~[rocksdbjni-6.13.3.jar:?]
at com.techspot.store.RocksStore.packetExists(RocksStore.java:402) ~[techspot-encoder-dev.jar:?]
at com.techspot.store.RocksStore.encodePacket(RocksStore.java:634) ~[techspot-encoder-dev.jar:?]
... 6 more
The error first started occurring when the number of entries was ~350 million and the database size was ~18 GB. The issue is hard to reproduce; I tried to reproduce it by inserting almost ~700 million entries but couldn't. I'm using RocksDB version 6.13.3 with the following options:
Options options = new Options();
BlockBasedTableConfig blockBasedTableConfig = new BlockBasedTableConfig();
blockBasedTableConfig.setBlockSize(16 * 1024);// 16Kb
options.setWriteBufferSize(64 * 1024 * 1024);// 64MB
options.setMaxWriteBufferNumber(8);
options.setMinWriteBufferNumberToMerge(1);
options.setTableCacheNumshardbits(8);
options.setLevelZeroSlowdownWritesTrigger(1000);
options.setLevelZeroStopWritesTrigger(2000);
options.setLevelZeroFileNumCompactionTrigger(1);
options.setCompressionType(CompressionType.LZ4_COMPRESSION);
options.setTableFormatConfig(blockBasedTableConfig);
options.setCompactionStyle(CompactionStyle.UNIVERSAL);
options.setCreateIfMissing(Boolean.TRUE);
options.setEnablePipelinedWrite(true);
options.setIncreaseParallelism(8);
Does anyone have any idea what might be the cause of this exception?

Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read

I am trying to read from BigQuery using the Java BigQueryIO.read method, but I am getting the error below.
public POutput expand(PBegin pBegin) {
    final String queryOperation = "select query";
    return pBegin
        .apply(BigQueryIO.readTableRows().fromQuery(queryOperation));
}
2020-06-08 19:32:01.391 IST Error message from worker: java.io.IOException: Failed to start reading from source: org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter#77f0db34
org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:792)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read
org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.advance(UnboundedReadFromBoundedSource.java:467)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.access$300(UnboundedReadFromBoundedSource.java:446)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.advance(UnboundedReadFromBoundedSource.java:298)
org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.start(UnboundedReadFromBoundedSource.java:291)
org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:787)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:748)
I suppose that this issue might be connected with a missing tempLocation pipeline execution parameter when you use DataflowRunner for cloud execution.
According to the documentation:
If tempLocation is not specified and gcpTempLocation is, tempLocation will not be populated.
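Assuming the pipeline is launched with DataflowRunner, a minimal sketch of setting both locations explicitly before building the pipeline might look like this (the bucket name is a placeholder):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class TempLocationExample {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    // Point both temp locations at a GCS bucket you own ("my-bucket" is a placeholder).
    options.setTempLocation("gs://my-bucket/temp");
    options.as(GcpOptions.class).setGcpTempLocation("gs://my-bucket/gcp-temp");

    Pipeline pipeline = Pipeline.create(options);
    // ... apply BigQueryIO.readTableRows().fromQuery(...) and the rest of the pipeline here ...
    pipeline.run();
  }
}

Equivalently, --tempLocation=gs://my-bucket/temp can be passed on the command line and picked up by PipelineOptionsFactory.fromArgs.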
Since this is just a presumption, I'd also encourage you to inspect the native Apache Beam runtime logs to gather more evidence, since the Stackdriver logs may not reflect the full picture of the problem.
A separate Jira issue, BEAM-9043, was raised about this vaguely worded error message.
Feel free to add more specific information to your original question if there are further concerns or essential updates.

Random Exception: Futures timed out after Exception in Spark Jobs

I am getting the following error when running a Spark job on Spark 2.0.
The error is random in nature and does not occur every time.
Once the tasks are created, most of them complete properly, while a few hang and throw the following error after a while.
I have tried increasing the properties spark.executor.heartbeatInterval and spark.network.timeout, but to no avail.
17/07/23 20:46:35 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;#597e9d16,BlockManagerId(driver, 128.164.190.35, 38337))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1857)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
... 14 more
Yes, the problem was indeed due to GC, as it kept pausing the tasks; changing the default GC to G1GC reduced the problem. Thanks.
-XX:+UseG1GC
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
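As an illustration, the flag can be passed at submit time so that both the driver and the executors use G1GC; the class and JAR names below are placeholders:

spark-submit \
  --class com.example.MyJob \
  --driver-java-options "-XX:+UseG1GC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  my-job.jar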

Hive OOM error when running a query with Tez as the execution engine

When I run the following query, I get the error below.
insert overwrite table mybug select row_number() over (order by clickstream_key) as key, clickstream_key as data from mytest;
It launches around 2 mapper tasks and 240 reducer tasks, and the job runs smoothly and quickly up to 239 tasks; the last task takes 3 hours across 4 task attempts, and then the job fails.
Table mytest contains 2 billion records with one column "clickstream_key".
2016-07-24 09:54:37,918 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:261)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
at java.util.ArrayList.add(ArrayList.java:458)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$RowNumberBuffer.incr(GenericUDAFRowNumber.java:73)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$GenericUDAFRowNumberEvaluator.iterate(GenericUDAFRowNumber.java:102)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:185)
at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:164)
at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.iterator(WindowingTableFunction.java:580)
at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:340)
at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:290)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-07-24 09:54:37,920 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an error while executing task: attempt_1469365079244_0001_1_01_000000_0
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3181)
at java.util.ArrayList.grow(ArrayList.java:261)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
at java.util.ArrayList.add(ArrayList.java:458)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$RowNumberBuffer.incr(GenericUDAFRowNumber.java:73)
I tried increasing the data node heap, the reducer heap, the mapper heap, and hive.tez.java.opts, but none of them worked. Any leads on this would be highly appreciated. The error says Java heap space; which Java heap space does it mean?

gRPC OOME and NPE in simple JMH benchmark

I tried gRPC, but gRPC uses immutable protobuf message objects, and I am hitting a lot of OOM errors like:
Exception in thread "grpc-default-executor-68" java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:204)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:262)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:157)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:93)
at io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:66)
at io.grpc.internal.MessageFramer.writeKnownLength(MessageFramer.java:182)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:135)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:125)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:165)
at io.grpc.internal.AbstractServerStream.writeMessage(AbstractServerStream.java:108)
at io.grpc.internal.ServerImpl$ServerCallImpl.sendMessage(ServerImpl.java:496)
at io.grpc.stub.ServerCalls$ResponseObserver.onNext(ServerCalls.java:241)
at play.bench.BenchGRPC$CounterImpl$1.onNext(BenchGRPC.java:194)
at play.bench.BenchGRPC$CounterImpl$1.onNext(BenchGRPC.java:191)
at io.grpc.stub.ServerCalls$2$1.onMessage(ServerCalls.java:191)
at io.grpc.internal.ServerImpl$ServerCallImpl$ServerStreamListenerImpl.messageRead(ServerImpl.java:546)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1.run(ServerImpl.java:417)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
io.grpc.StatusRuntimeException: CANCELLED
at io.grpc.Status.asRuntimeException(Status.java:430)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:266)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$3.run(ClientCallImpl.java:320)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I'm not sure whether this is caused by object creation. I gave this process 5 GB of memory and it still OOMs; I need some help.
EDIT
I put my benchmark, proto, dependencies, and an example in this gist. The problem is that memory usage climbs very high and sooner or later causes an OOME, and there is also a strange NPE:
SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$2#312546d9
java.lang.NullPointerException
at io.netty.buffer.PoolChunk.initBufWithSubpage(PoolChunk.java:378)
at io.netty.buffer.PoolChunk.initBufWithSubpage(PoolChunk.java:369)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:194)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:262)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:157)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:93)
at io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:66)
at io.grpc.internal.MessageFramer.writeKnownLength(MessageFramer.java:182)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:135)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:125)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:165)
at io.grpc.internal.AbstractServerStream.writeMessage(AbstractServerStream.java:108)
at io.grpc.internal.ServerImpl$ServerCallImpl.sendMessage(ServerImpl.java:496)
at io.grpc.stub.ServerCalls$ResponseObserver.onNext(ServerCalls.java:241)
at play.bench.BenchGRPCOOME$CounterImpl.inc(BenchGRPCOOME.java:150)
at play.bench.CounterServerGrpc$1.invoke(CounterServerGrpc.java:171)
at play.bench.CounterServerGrpc$1.invoke(CounterServerGrpc.java:166)
at io.grpc.stub.ServerCalls$1$1.onHalfClose(ServerCalls.java:154)
at io.grpc.internal.ServerImpl$ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerImpl.java:562)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$2.run(ServerImpl.java:432)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The problem is that StreamObserver.onNext does not block, so there is no push-back when you write too much. There is an open issue. There needs to be a way for you to interact with flow control and be informed that you should slow your sending rate. For the client side, a workaround is to use ClientCall directly: you call Channel.newCall and then pay attention to isReady and onReady. For the server side there isn't an easy workaround.
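For illustration, here is a rough client-side sketch of that workaround: drive the call through ClientCall directly and only send while the transport reports it is ready. The ReqT/RespT types, the MethodDescriptor, and the requests iterator stand in for whatever your generated stub and benchmark provide:

import java.util.Iterator;

import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;
import io.grpc.Status;

final class FlowControlledSender<ReqT, RespT> {
  void send(Channel channel, MethodDescriptor<ReqT, RespT> method, Iterator<ReqT> requests) {
    ClientCall<ReqT, RespT> call = channel.newCall(method, CallOptions.DEFAULT);
    call.start(new ClientCall.Listener<RespT>() {
      private boolean halfClosed;

      @Override public void onReady() {
        // The transport can accept more messages; send until it signals otherwise.
        while (call.isReady() && requests.hasNext()) {
          call.sendMessage(requests.next());
        }
        if (!requests.hasNext() && !halfClosed) {
          halfClosed = true;
          call.halfClose(); // no more requests to send
        }
      }

      @Override public void onMessage(RespT message) {
        call.request(1); // pull responses one at a time
      }

      @Override public void onClose(Status status, Metadata trailers) {
        // Call finished; inspect status here if needed.
      }
    }, new Metadata());
    call.request(1); // ask for the first response
  }
}

Later gRPC releases also expose the same readiness signal through CallStreamObserver and its setOnReadyHandler, which avoids dropping down to ClientCall.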
