We are using log4j2 for logging. After doing initial benchmark I have concluded that for same QPS, logging exception is 80 times slower than logging normal message.
Here is a sample error log message.
2022-08-25 11:09:14,699 - [pool-2-thread-1][{id=eb9bcf4f-1dcc-4cd7-8588-fd7916d623b8, path=/hellowworld}] - [Test2] ERROR - error wile doing something important java.lang.RuntimeException: exception
at Test2.recursiveCount(Test2.java:140) ~[logging.jar:?]
at Test2.recursiveCount(Test2.java:142) ~[logging.jar:?]
at Test2.recursiveCount(Test2.java:142) ~[logging.jar:?]
at Test2.access$000(Test2.java:17) ~[logging.jar:?]
at Test2$LoggerRunnable.run(Test2.java:75) ~[logging.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_202]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_202]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Here is a sample log message
2022-08-25 18:12:02,125 - [pool-2-thread-1][{id=90e8ebda-e02a-4c48-bc37-560190bda40a, path=/hellowworld}] - [Test2] INFO - adding log with number=ok uuid=1661431322125
It could be attributed to
Note: Logging exceptions and stack traces will create temporary objects with any layout. (However, Layouts will only create these temporary objects when an exception actually occurs.) We haven't figured out a way to log exceptions and stack traces without creating temporary objects. That is unfortunate, but you probably still want to log them when they happen.
source log4j2 documentation
Are there any config change that can be done to reduce the overhead of exception logging? Like
For similar stacktrace errors only log few sample stacktrace and not all the stack traces
limiting the deapth of stacktrace?
anything else?
Related
Like in this picture
I know that both of them works fine but I just wanted to know how are they different from each other? PS: I am a beginner.
A LogEvent can contain both a message and an exception. If you use the first form:
LOGGER.error(exception.getMessage());
only the error message will be recorded and you'll have a similar output in your logs (depends upon the layout):
[ERROR] - Exception message.
If you use the second form:
LOGGER.error(exception.getMessage(), exception);
you'll have both a message and an exception. This typically outputs the stack trace in the logs:
[ERROR] - Exception message.
java.lang.RuntimeException: Exception message.
at pl.copernik.log4j.Log4j2Test.run(Log4j2Test.java:19) [classes/:?]
at pl.copernik.log4j.Log4j2Test.main(Log4j2Test.java:15) [classes/:?]
Since you can always configure Log4j 2.x to suppress stack traces from the output (see alwaysWriteExceptions in the PatternLayout documentation), the second form is always better, as it provides more information.
Remark: Since the stack trace always prints the error's message, I'd suggest a third form:
LOGGER.error("An unexpected condition occurred while performing foo: {}",
exception.getMessage(), exception);
This way you can explain the context, when the exception occurred:
[ERROR] - An unexpected condition occurred while performing foo: Exception message.
java.lang.RuntimeException: Exception message.
at pl.copernik.log4j.Log4j2Test.run(Log4j2Test.java:19) [classes/:?]
at pl.copernik.log4j.Log4j2Test.main(Log4j2Test.java:15) [classes/:?]
I know this kind of question have been asked previously, but I still don't get solution after reading their posts, so I decide to post this question again from here.
I Am working on Java multi-threaded application where I am trying to run HQL queries using JDBC on Hive environment. I have bunch of hive-sql queries and i am executing them on Hive in parallel with multiple threads and I am getting following exception when queries count more (for example, if i am running more than 100 queries). can some one please check this and help me on this?
2020-06-16 06:00:45,314 ERROR [main]: Terminal exception
java.lang.Exception: Map step agg_cas_auth_reinstate_derive failed.
at com.mine.idn.magellan.ParallelExecGraph.execute(ParallelExecGraph.java:198)
at com.mine.idn.magellan.WarehouseSession.executeMap(WarehouseSession.java:332)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:872)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:778)
at com.mine.idn.magellan.StandAloneEnv.executeAndExit(StandAloneEnv.java:642)
at com.mine.idn.magellan.StandAloneEnv.main(StandAloneEnv.java:77)
Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.io.IOException: Bad file descriptor
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: Bad file descriptor
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2850)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2685)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2591)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1077)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2007)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:190)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:599)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:599)
at org.apache.hadoop.mapred.JobClient.getJobInner(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:639)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:559)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: java.io.IOException: Bad file descriptor
at java.io.FileInputStream.close0(Native Method)
at java.io.FileInputStream.access$000(FileInputStream.java:49)
at java.io.FileInputStream$1.close(FileInputStream.java:336)
at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
at java.io.FileInputStream.close(FileInputStream.java:334)
at java.io.BufferedInputStream.close(BufferedInputStream.java:483)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2676)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2661)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2741)
... 22 more
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at com.mine.idn.magellan.WarehouseSession.executeMapStep(WarehouseSession.java:797)
at com.mine.idn.magellan.WarehouseSession.access$000(WarehouseSession.java:23)
at com.mine.idn.magellan.WarehouseSession$ParallelExecResources.executeMapStep(WarehouseSession.java:91)
at com.mine.idn.magellan.ParallelExecGraph$Node.run(ParallelExecGraph.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What i dont understand is, why hadoop framework throwing - Bad File Descriptor Exception? My Java code invoking Hadoop-Hive code and its throwing this exception.
Also one more thing is this issue is intermittent, not consistent. If i re-run the same application, most of the cases, it went through.
Thank you for
I am trying to read from bigquery using Java BigqueryIO.read method. but getting below error.
public POutput expand(PBegin pBegin) {
final String queryOperation = "select query";
return pBegin
.apply(BigQueryIO.readTableRows().fromQuery(queryOperation));
}
2020-06-08 19:32:01.391 ISTError message from worker: java.io.IOException: Failed to start reading from source: org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter#77f0db34 org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:792) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:748) Caused by: java.lang.UnsupportedOperationException: BigQuery source must be split before being read org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.advance(UnboundedReadFromBoundedSource.java:467) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$ResidualSource.access$300(UnboundedReadFromBoundedSource.java:446) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.advance(UnboundedReadFromBoundedSource.java:298) org.apache.beam.runners.core.construction.UnboundedReadFromBoundedSource$BoundedToUnboundedSourceAdapter$Reader.start(UnboundedReadFromBoundedSource.java:291) org.apache.beam.runners.dataflow.worker.WorkerCustomSources$UnboundedReaderIterator.start(WorkerCustomSources.java:787) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1320) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:748)
I suppose that this issue might be connected with missing tempLocation pipeline execution parameter when you are using DataflowRunner for cloud execution.
According to the documentation:
If tempLocation is not specified and gcpTempLocation is, tempLocation will not be populated.
Since this is just my presumption, I'll also encourage you to inspect native Apache Beam runtime logs to expand the overall issue evidence, as long as Stackdriver logs don't reflect a full picture of the problem.
There was raised a separate Jira tracker thread BEAM-9043, indicating this vaguely outputted error description.
Feel free to append more certain information to you origin question for any further concern or essential updates.
I wrote a mapreduce job to scan an hbase table for a certain time range to count certain elements we need for analysis.
Mappers in the MR job keeps failing but I don't know why. Seems like each time I run the job, a different number of mappers fail. The YARN log (see below) from Cloudera manager isn't helpful in pointing what the problem is, although, someone said I might be running out of memory.
It seems to retry multiple times but each time it fails. What do I need to do to make it stop failing or how can I log things to help me better determine what is happening?
Below is a log from YARN for one of the mappers that failed.
Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Failed after attempts=36, exceptions: Thu Jun 15 16:26:57 PDT 2017,
null, java.net.SocketTimeoutException: callTimeout=60000,
callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on
table 'dbi_based_data' at
region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305,
seqNum=19308931 at
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at
org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
at
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
at
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:236)
at
org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
at
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused
by: java.net.SocketTimeoutException: callTimeout=60000,
callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on
table 'dbi_based_data' at
region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305,
seqNum=19308931 at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745) Caused by:
java.io.IOException: Call to
hslave35118.ams9.mysecretdomain.com/10.216.35.118:60020 failed on
local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
Call id=12, waitTime=60001, operationTimeout=60000 expired. at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)
at
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)
at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:64)
at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:360)
at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:334)
at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
... 4 more Caused by:
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=12,
waitTime=60001, operationTimeout=60000 expired. at
org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73) at
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
... 13 more
So it looks like for my case I needed to extend the timeout setting. In my Java program I had to add the following lines to make the exception go away:
conf.set("hbase.rpc.timeout","90000");
conf.set("hbase.client.scanner.timeout.period","90000");
The answer was found on this link on Cloudera's site
I tried gRPC, but gRPC use proto-buf immutable message object, I meet a lot OOM like
Exception in thread "grpc-default-executor-68" java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:204)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:262)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:157)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:93)
at io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:66)
at io.grpc.internal.MessageFramer.writeKnownLength(MessageFramer.java:182)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:135)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:125)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:165)
at io.grpc.internal.AbstractServerStream.writeMessage(AbstractServerStream.java:108)
at io.grpc.internal.ServerImpl$ServerCallImpl.sendMessage(ServerImpl.java:496)
at io.grpc.stub.ServerCalls$ResponseObserver.onNext(ServerCalls.java:241)
at play.bench.BenchGRPC$CounterImpl$1.onNext(BenchGRPC.java:194)
at play.bench.BenchGRPC$CounterImpl$1.onNext(BenchGRPC.java:191)
at io.grpc.stub.ServerCalls$2$1.onMessage(ServerCalls.java:191)
at io.grpc.internal.ServerImpl$ServerCallImpl$ServerStreamListenerImpl.messageRead(ServerImpl.java:546)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1.run(ServerImpl.java:417)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
io.grpc.StatusRuntimeException: CANCELLED
at io.grpc.Status.asRuntimeException(Status.java:430)
at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:266)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$3.run(ClientCallImpl.java:320)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I'm not sure is this caused by object creation, I gave 5G mem to this process, still OOM, needs some help.
EDIT
I put my bench, proto, dependencies and an example out to this gist, the problem is the memory goes very high, sooner or later will cause OOME, and there is a strange NPE
严重: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$2#312546d9
java.lang.NullPointerException
at io.netty.buffer.PoolChunk.initBufWithSubpage(PoolChunk.java:378)
at io.netty.buffer.PoolChunk.initBufWithSubpage(PoolChunk.java:369)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:194)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:262)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:157)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:93)
at io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:66)
at io.grpc.internal.MessageFramer.writeKnownLength(MessageFramer.java:182)
at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:135)
at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:125)
at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:165)
at io.grpc.internal.AbstractServerStream.writeMessage(AbstractServerStream.java:108)
at io.grpc.internal.ServerImpl$ServerCallImpl.sendMessage(ServerImpl.java:496)
at io.grpc.stub.ServerCalls$ResponseObserver.onNext(ServerCalls.java:241)
at play.bench.BenchGRPCOOME$CounterImpl.inc(BenchGRPCOOME.java:150)
at play.bench.CounterServerGrpc$1.invoke(CounterServerGrpc.java:171)
at play.bench.CounterServerGrpc$1.invoke(CounterServerGrpc.java:166)
at io.grpc.stub.ServerCalls$1$1.onHalfClose(ServerCalls.java:154)
at io.grpc.internal.ServerImpl$ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerImpl.java:562)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$2.run(ServerImpl.java:432)
at io.grpc.internal.SerializingExecutor$TaskRunner.run(SerializingExecutor.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The problem is that StreamObserver.onNext does not block, so there is no push-back when you write too much. There is an open issue. There needs to be a way for you to interact with flow control and be informed that you should slow your sending rate. For client-side, a workaround is to use ClientCall directly; you call Channel.newCall all then pay attention to isReady and onReady. For server-side there isn't an easy workaround.