JMeter download of 1.5 GB file: out of memory exception - Java

I'm running a .jmx test plan from the command line:
JVM_ARGS="-Xms2048m -Xmx4096m -XX:NewSize=4096m -XX:MaxNewSize=4096m" && export JVM_ARGS && ./jmeter.sh -n -t ./jmeter-ec2.jmx -l ./scriptresults.jtl
but at some point I get an out of memory error. Checking jmeter.log,
I found this error:
ERROR o.a.j.JMeter: Uncaught exception: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3236) ~[?:1.8.0_91]
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118) ~[?:1.8.0_91]
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[?:1.8.0_91]
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[?:1.8.0_91]
    at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.readResponse(HTTPSamplerBase.java:1833) ~[ApacheJMeter_http.jar:3.3 r1808647]
    at org.apache.jmeter.protocol.http.sampler.HTTPAbstractImpl.readResponse(HTTPAbstractImpl.java:440) ~[ApacheJMeter_http.jar:3.3 r1808647]
    at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:474) ~[ApacheJMeter_http.jar:3.3 r1808647]
    at org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy.sample(HTTPSamplerProxy.java:74) ~[ApacheJMeter_http.jar:3.3 r1808647]
    at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1189) ~[ApacheJMeter_http.jar:3.3 r1808647]
    at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1178) ~[ApacheJMeter_http.jar:3.3 r1808647]
    at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:498) ~[ApacheJMeter_core.jar:3.3 r1808647]
    at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:424) ~[ApacheJMeter_core.jar:3.3 r1808647]
    at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:255) ~[ApacheJMeter_core.jar:3.3 r1808647]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
2018-01-26 02:03:55,731 INFO o.a.j.e.StandardJMeterEngine: Notifying test listeners of end of test
2018-01-26 02:03:55,732 INFO o.a.j.r.Summariser: summary = 0 in 00:00:00 = ******/s Avg: 0 Min: 9223372036854775807 Max: -9223372036854775808 Err: 0 (0.00%)
what I"M doing wrong here ? I cant solve it:(

Your JVM arguments are wrong, just keep:
-Xms2048m -Xmx4096m
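i.e., for your command line, leaving everything else as it was:
JVM_ARGS="-Xms2048m -Xmx4096m" && export JVM_ARGS && ./jmeter.sh -n -t ./jmeter-ec2.jmx -l ./scriptresults.jtl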
You don't say how many threads this occurs with, nor whether you're running in GUI or non-GUI mode, so:
Don't run load tests in GUI mode; it's an anti-pattern
Ensure you have enough memory for your threads
Finally, you can reduce the memory impact of big responses by adapting this in user.properties:
httpsampler.max_bytes_to_store_per_request
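For example, the following line in user.properties would keep at most 1 MiB of each response in memory (the exact value here is illustrative; pick one large enough for your assertions):
httpsampler.max_bytes_to_store_per_request=1048576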
Another option is to compute only an MD5 hash of the response by ticking "Save response as MD5 hash?" on the HTTP Request sampler, as described in http://jmeter.apache.org/usermanual/component_reference.html#HTTP_Request.

Well, given a 1.5 GB file and your 4 GB heap, you will not be able to have more than 3 virtual users, which doesn't look like a "load test" to me.
If you are not interested in the downloaded file's content and just want to stress your server, you can consider switching to a JSR223 Sampler, which will send the request and discard the response data using the underlying Apache HttpComponents libraries' methods. The relevant Groovy code would be something like:
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClientBuilder
import org.apache.http.util.EntityUtils

// Fire the request and discard the response body as it streams in,
// instead of buffering the whole 1.5 GB in the heap
def client = HttpClientBuilder.create().build()
def get = new HttpGet('http://example.com')
def response = client.execute(get)
EntityUtils.consume(response.getEntity()) // reads and discards the entity stream
response.close()
client.close()
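To use it, replace the HTTP Request sampler with a JSR223 Sampler, select groovy as the language, and paste the code above; since the body is consumed and discarded as it streams in, heap usage stays flat no matter how large the file is.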
References:
HttpClient Tutorial
HttpClient Quick Start
Apache Groovy - Why and How You Should Use It

Related

Unable to load 25GB dataset in PySpark local mode with 56GB RAM free

I am having trouble loading and processing a 25GB Parquet dataset (of stackoverflow.com posts) on a single beefy machine in local mode with 12 cores/64GB of RAM.
The machine has more free RAM allocated to PySpark than the size of the entire Parquet dataset (let alone two columns of it), and yet I am unable to run any operations on the DataFrame once I load it. This is confusing, and I can't figure out what to do.
Specifically, I have a Parquet dataset that is 25GB:
$ du -sh data/stackoverflow/parquet/Posts.df.parquet
25G data/stackoverflow/parquet/Posts.df.parquet
I have a machine with 56GB of free RAM:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            62G        4.7G         56G         23M        1.7G         57G
Swap:           63G          0B         63G
I have configured PySpark to use 50GB of RAM (I have tried adjusting maxResultSize to no effect).
My configuration looks like this:
$ cat ~/spark/conf/spark-defaults.conf
spark.io.compression.codec org.apache.spark.io.SnappyCompressionCodec
spark.driver.memory 50g
spark.jars ...
spark.executor.cores 12
spark.driver.maxResultSize 20g
My environment looks like this:
$ cat ~/spark/conf/spark-env.sh
PYSPARK_PYTHON=python3
PYSPARK_DRIVER_PYTHON=python3
SPARK_WORKER_DIR=/nvm/spark/work
SPARK_LOCAL_DIRS=/nvm/spark/local
SPARK_WORKER_MEMORY=50g
SPARK_WORKER_CORES=12
I load the data like this:
$ pyspark
>>> posts = spark.read.parquet('data/stackoverflow/parquet/Posts.df.parquet')
It loads OK, but any operation - even if I run a limit(10) on the DataFrame first - results in an out-of-heap-space error.
>>> posts.limit(10)\
.select('_ParentId','_Body')\
.filter(posts._ParentId == 9915705)\
.show()
[Stage 1:> (0 + 12) / 195]
19/06/30 17:26:13 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 8)
java.lang.OutOfMemoryError: Java heap space
19/06/30 17:26:13 ERROR Executor: Exception in task 3.0 in stage 1.0 (TID 4)
java.lang.OutOfMemoryError: Java heap space
19/06/30 17:26:13 ERROR Executor: Exception in task 5.0 in stage 1.0 (TID 6)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.parquet.bytes.HeapByteBufferAllocator.allocate(HeapByteBufferAllocator.java:32)
at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1166)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:301)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:256)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:159)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/06/30 17:26:13 ERROR Executor: Exception in task 10.0 in stage 1.0 (TID 11)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.parquet.bytes.HeapByteBufferAllocator.allocate(HeapByteBufferAllocator.java:32)
at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1166)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:301)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:256)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:159)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/06/30 17:26:13 ERROR Executor: Exception in task 6.0 in stage 1.0 (TID 7)
java.lang.OutOfMemoryError: Java heap space
19/06/30 17:26:13 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 7,5,main]
java.lang.OutOfMemoryError: Java heap space
19/06/30 17:26:13 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 11,5,main]
java.lang.OutOfMemoryError: Java heap space
...
The following will run, suggesting the problem is the _Body field (obviously the largest):
>>> posts.limit(10).select('_Id').show()
+---+
|_Id|
+---+
| 4|
| 6|
| 7|
| 9|
| 11|
| 12|
| 13|
| 14|
| 16|
| 17|
+---+
What am I to do? I could use EMR, but I would like to be able to load this dataset locally and that seems an entirely reasonable thing to be able to do in this situation.
The default memory fraction for Spark's storage and computation is 0.6, so under your config that comes to 0.6 * 50GB = 30GB. On top of that, the in-memory representation of the data can consume more space than the serialized on-disk version.
Please check the Memory Management section of the Spark documentation for more details.
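The fraction itself is also configurable in spark-defaults.conf; a hedged example (0.6 is the documented default, and raising it shrinks the share of the heap left for user data structures):
spark.memory.fraction 0.8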
You will need to set the spark memory config while running the pyspark command:
pyspark --conf spark.driver.memory=50g --conf spark.executor.pyspark.memory=50g
Check this doc for the config to set.
You might also need to figure out the number of executors you need based on your hardware.
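As a quick sanity check that a setting actually reached the driver JVM, you can read it back from the running session (a minimal sketch, assuming an interactive pyspark shell):
>>> spark.sparkContext.getConf().get('spark.driver.memory')  # should echo '50g'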

Error : java.lang.OutOfMemoryError: unable to create new native thread : gemfire

Please, before marking this as a duplicate, read this: I have gone through all the answers provided for this error and nothing helped in my scenario.
I am doing a server migration where the same thing that works well on 32-bit runs out of memory on 64-bit.
I have a Windows service which internally points to an .exe that spawns the Java process. I have made all the possible memory improvements in the config file of my .exe, shown below.
I am not sure what different behavior is causing this out of memory on the 64-bit server (my Java version is 1.8.xx).
#Java Additional Parameters
wrapper.java.additional.1=-XX:+UseConcMarkSweepGC
wrapper.java.additional.2=-XX:+UseParNewGC
wrapper.java.additional.3=-XX:ParallelGCThreads=8
wrapper.java.additional.4=-verbose:gc
# wrapper.java.additional.!!! should be sequence !!!=-Xloggc:D:\apps\Logs\gc.log
# wrapper.java.additional.!!! should be sequence !!!=-XX:+PrintGCDetails
# wrapper.java.additional.!!! should be sequence !!!=-XX:+PrintGCTimeStamps
wrapper.java.additional.5=-XX:MaxDirectMemorySize=128m
wrapper.java.additional.6=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.7=-Dcom.sun.management.jmxremote.port=34001
wrapper.java.additional.8=-Dcom.sun.management.jmxremote.ssl=false
wrapper.java.additional.9=-Dcom.sun.management.jmxremote.authenticate=false
wrapper.java.additional.10=-XX:CMSInitiatingOccupancyFraction=55
wrapper.java.additional.11=-XX:NewSize=474m
wrapper.java.additional.12=-XX:MaxNewSize=474m
#wrapper.java.additional.13=-XX:PermSize=128m
#wrapper.java.additional.14=-XX:MaxPermSize=128m
wrapper.java.additional.15=-Xss128k
wrapper.java.additional.16=-XX:+CMSIncrementalMode
wrapper.java.additional.17=-XX:+UseCompressedOops
# Initial Java Heap Size (in MB)
wrapper.java.initmemory=1638
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=1638
Still I end up with:
[severe 2016/10/24 06:27:46.192 java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at com.gemstone.gemfire.internal.SocketCreator.asyncClose(SocketCreator.java:688)
I did some reading on the concept here:
Error reading
I am not much into Java, but I have tried everything I could from my side. Any help on this will be highly appreciated; I have spent a huge amount of time on this but am not able to reach any conclusion.
***********Update***************
So basically I could figure out that this problem was coming from excessive thread creation by GemFire, which exceeds the threshold of ~800 threads for the GemFire Java process.
The JConsole tool helped to calculate the thread count; I could see around 200-300 threads from different pools being created with no purpose, apart from the usual threads, with descriptions like:
Name: pool-9-thread-1
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@163b285
Total blocked: 0 Total waited: 2
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)
I'll add more details if I can find more on this!
*******Update 2 : ************
I managed to see all the threads created by GemFire using JConsole.
This number keeps on increasing, and after a certain point in time I see the OOM issue. Is there any way I can stop this unnecessary thread creation and memory consumption?
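For what it's worth, a quick way to track that count from a shell (a sketch assuming the JDK's jstack is on the PATH and a unix-like shell is available; <pid> is a placeholder for the GemFire Java process id):
jstack <pid> | grep -c '"pool-'
This counts the threads whose names begin with pool-, like the one above.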

Apache Spark out of Java heap space: where does it happen?

I have a Java memory issue with Spark: the same application that works on my 8GB Mac crashes on my 72GB Ubuntu server...
I have changed things in the conf file, but it looks like Spark does not care, so I wonder if my issue is with the driver or the executors.
I set:
spark.driver.memory 20g
spark.executor.memory 20g
And, whatever I do, the crash is always at the same spot in the app, which makes me think that it is a driver problem.
The exception I get is:
16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 208, micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:335)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
at org.apache.hadoop.io.Text.decode(Text.java:412)
at org.apache.hadoop.io.Text.decode(Text.java:389)
at org.apache.hadoop.io.Text.toString(Text.java:280)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any hint? Thanks
Update:
I have set a small memory "dumper" in my app. At the beginning, it says:
** Free ......... 1,413,566
** Allocated .... 1,705,984
** Max .......... 16,495,104
**> Total free ... 16,202,686
Just before the crash, it says:
** Free ......... 1,461,633
** Allocated .... 1,786,880
** Max .......... 16,495,104
**> Total free ... 16,169,857
So for some reason I was not able to get Spark to read the configuration file on the server side, but modifying my code to the following fixed it:
SparkConf conf = new SparkConf()
.setAppName("app")
.set("spark.executor.memory", "4g")
.setMaster("spark://10.0.100.120:7077");
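One caveat worth knowing: spark.driver.memory cannot be raised this way in client mode, because the driver JVM is already running by the time the SparkConf code executes, so it has to be supplied at launch instead, e.g. (your.Main and YourApp.jar are placeholders):
spark-submit --driver-memory 20g --class your.Main YourApp.jar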
(Thanks to all the people who voted the question down, it is really motivating to come back here and post a solution).

GC overhead while running pig job, after hadoop job ends

I'm running a very simple pig script (pig 0.14, Hadoop 2.4):
customers = load '/some/hdfs/path' using SomeUDFLoader();
customers2 = foreach (group customers by customer_id) generate FLATTEN(group) as customer_id, MIN(customers.date) as date;
store customers2 into '/hdfs/output' using PigStorage(',');
This launches a map-reduce job of ~60000 mappers and 999 reducers.
After the map-reduce job has finished its work (I know because the output has been written, and the job manager says the job has succeeded), there is a long pause and I get the following error in the pig output:
2015-11-24 11:45:29,394 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at *********
2015-11-24 11:45:29,403 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-11-24 11:46:03,533 [Service Thread] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call- Usage threshold init = 698875904(682496K) used = 520031456(507843K) committed = 698875904(682496K) max = 698875904(682496K)
2015-11-24 11:46:04,473 [Service Thread] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Collection threshold init = 698875904(682496K) used = 575405920(561919K) committed = 698875904(682496K) max = 698875904(682496K)
2015-11-24 11:47:36,255 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. GC overhead limit exceeded
The stack trace looks something like this (each time the exception is in another function):
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. Java heap space
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.initCounters(CounterGroupPBImpl.java:136)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.getAllCounters(CounterGroupPBImpl.java:121)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:367)
at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:388)
at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskReports(ClientServiceDelegate.java:448)
at org.apache.hadoop.mapred.YARNRunner.getTaskReports(YARNRunner.java:551)
at org.apache.hadoop.mapreduce.Job$3.run(Job.java:533)
at org.apache.hadoop.mapreduce.Job$3.run(Job.java:531)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:531)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getTaskReports(HadoopShims.java:235)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:352)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:233)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
...
My SET statements in the pig script:
SET mapreduce.map.java.opts '-server -Xmx6144m -Djava.net.preferIPv4Stack=true -Duser.timezone=UTC'
SET mapreduce.reduce.java.opts '-server -Xmx6144m -Djava.net.preferIPv4Stack=true -Duser.timezone=UTC'
SET mapreduce.map.memory.mb '8192'
SET mapreduce.reduce.memory.mb '8192'
SET mapreduce.map.speculative 'true'
SET mapreduce.reduce.speculative 'true'
SET mapreduce.jobtracker.maxtasks.perjob '100000'
SET mapreduce.job.split.metainfo.maxsize '-1'
Why is this happening, and how can I fix that ?
Thanks in advance for any help.
It looks like this is caused in your ApplicationMaster, since you mention that the error is returned after the execution of all mappers/reducers. Try increasing the memory of the ApplicationMaster.
In a YARN cluster, you can use the following two properties to control the amount of memory available to your ApplicationMaster:
yarn.app.mapreduce.am.command-opts
yarn.app.mapreduce.am.resource.mb
Again, you could set -Xmx (in the former) to 75% of the resource.mb value.
Details regarding the parameters can be found here.
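For instance, sticking with the SET convention used in the script above and the 75% rule of thumb (the 4096 MB container size is a hypothetical figure, not a recommendation):
SET yarn.app.mapreduce.am.resource.mb '4096'
SET yarn.app.mapreduce.am.command-opts '-Xmx3072m'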

Launching jobs in a for loop

I am confronted with a weird problem. I have a mapreduce class which looks for patterns in a file (the pattern file goes into the DistributedCache). Now I wanted to reuse this class to run for 1000 pattern files. I just had to extend the pattern matching class and override its main and run functions. In the run of the child class I modify the command-line arguments and feed them to the parent's run() function. Everything goes well up until iteration 45-50. Suddenly all tasktrackers start to fail until no progress is made. I checked the HDFS, but 70% of the space is still left. Anybody any ideas as to why launching 50 jobs one by one causes difficulties for hadoop?
@Override
public int run(String[] args) throws Exception {
    // -patterns patternsDIR input/ output/
    List<String> files = getFiles(args[1]);
    String inputDataset = args[2];
    String outputDir = args[3];
    for (int i = 0; i < files.size(); i++) {
        // modifyArgs returns a fresh argument array, so the dead
        // pre-allocation (new String[4]) from the original was removed
        String[] newArgs = modifyArgs(args);
        super.run(newArgs);
    }
    return 0;
}
EDIT: Just checked the job logs, this is the first error occurring:
2013-11-12 09:03:01,665 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2013-11-12 09:03:32,971 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000053_0' has completed task_201311120807_0053_m_000053 successfully.
2013-11-12 09:07:51,717 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2013-11-12 09:08:05,973 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000128_0' has completed task_201311120807_0053_m_000128 successfully.
2013-11-12 09:08:16,571 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000130_0' has completed task_201311120807_0053_m_000130 successfully.
2013-11-12 09:08:16,571 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_1595161181_30] for 30 seconds. Will retry shortly ...
2013-11-12 09:08:27,175 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201311120807_0053_m_000138_0' has completed task_201311120807_0053_m_000138 successfully.
2013-11-12 09:08:25,241 ERROR org.mortbay.log: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
2013-11-12 09:08:25,241 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54311, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@7fcb9c0a, false, false, true, 9834) from 10.1.1.13:55028: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:62)
at java.lang.StringBuilder.<init>(StringBuilder.java:97)
at org.apache.hadoop.util.StringUtils.escapeString(StringUtils.java:435)
at org.apache.hadoop.mapred.Counters.escape(Counters.java:768)
at org.apache.hadoop.mapred.Counters.access$000(Counters.java:52)
at org.apache.hadoop.mapred.Counters$Counter.makeEscapedCompactString(Counters.java:111)
at org.apache.hadoop.mapred.Counters$Group.makeEscapedCompactString(Counters.java:221)
at org.apache.hadoop.mapred.Counters.makeEscapedCompactString(Counters.java:648)
at org.apache.hadoop.mapred.JobHistory$MapAttempt.logFinished(JobHistory.java:2276)
at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2636)
at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1222)
at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4471)
at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3306)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3001)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
2013-11-12 09:08:16,571 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54311, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3269c671, false, false, true, 9841) from 10.1.1.23:42125: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2875)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:3806)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:290)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:294)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.io.BufferedWriter.flush(BufferedWriter.java:253)
at java.io.PrintWriter.flush(PrintWriter.java:293)
at java.io.PrintWriter.checkError(PrintWriter.java:330)
at org.apache.hadoop.mapred.JobHistory.log(JobHistory.java:847)
at org.apache.hadoop.mapred.JobHistory$MapAttempt.logStarted(JobHistory.java:2225)
at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2632)
at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1222)
at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4471)
at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3306)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3001)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
at java.security.AccessController.doPrivileged(Native Method)
And after that we see a bunch of:
2013-11-12 09:13:48,204 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201311120807_0053_m_000033_0: Lost task tracker: tracker_n144-06b.wall1.ilabt.iminds.be:localhost/127.0.0.1:47567
EDIT2: Some ideas?
The heap space error is kind of unexpected since the mappers hardly require any memory.
I am calling the base class with super.run(); should I use a ToolRunner call for that?
In every iteration a file with approximately 1000 words + scores is added to the DistributedCache; I am not sure whether I should reset the cache somewhere? (Every job in super.run() runs with job.waitForCompletion(); is the cache cleared then?)
EDIT3:
#Donald: I haven't resized the memory for the hadoop daemons, so they should have a heap of 1GB each. The map tasks have 800 MB of heap, of which 450 MB is used for io.sort.
#Chris: I haven't modified anything on the counters, I am using the regular ones. There are 1764 map tasks with 16 counters each, and the job itself will have another 20 or so. This might indeed add up after 50 consecutive jobs, but I would think it is not kept in the heap if you are running multiple consecutive jobs?
#Extra information:
The map tasks are extremely fast; each one takes only 3-5 seconds, and I have jvm.reuse=-1. A map task processes a file with 10 records (the file is much smaller than the block size). Due to the small files I could consider making input files with 100 records to reduce the mapping overhead.
The first thing I tried was to add a unit reducer (1 reduce task) to reduce the number of files created in HDFS (otherwise there would be 1 per pattern and therefore 1000 per job, which might create overhead for the datanodes).
The number of records per job is rather low; I am looking for specific words in 1764 files, and the number of matches with one of the 1000 patterns is around 5000 map output records in total.
#All: Thanks for helping me out guys!
