Kubernetes, simple SpringBoot app OOMKilled - java

I'm working with OpenJDK 11 and a very simple Spring Boot application; almost the only thing it has is the Spring Boot actuator enabled, so I can call /actuator/health etc.
I also have a very simple Kubernetes cluster on GCE with just one pod running one container (containing this app, of course).
My configuration has some key points that I want to highlight. It has resource requests and limits:
resources:
  limits:
    memory: 600Mi
  requests:
    memory: 128Mi
And it has a readiness probe
readinessProbe:
  initialDelaySeconds: 30
  periodSeconds: 30
  httpGet:
    path: /actuator/health
    port: 8080
I'm also setting JVM_OPTS (which my program obviously uses) like this:
env:
  - name: JVM_OPTS
    value: "-XX:MaxRAM=512m"
The problem
I launch this and it gets OOMKilled in about 3 hours every time!
I'm never calling anything myself; the only call is the readiness probe that Kubernetes makes every 30 seconds. Is that enough to exhaust the memory? I have also not implemented anything out of the ordinary, just a GET method that says hello world, along with all the Spring Boot imports needed for the actuators.
If I run kubectl top pod XXXXXX I can actually see the memory usage gradually getting bigger and bigger.
I have tried a lot of different configurations, tips, etc., but nothing seems to work with a basic Spring Boot app.
Is there a way to actually hard-limit the memory so that Java raises an OutOfMemoryError, or to prevent this from happening?
Thanks in advance
EDIT: After 15h running
NAME READY STATUS RESTARTS AGE
pod/test-79fd5c5b59-56654 1/1 Running 4 15h
describe pod says...
State: Running
Started: Wed, 27 Feb 2019 10:29:09 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 27 Feb 2019 06:27:39 +0000
Finished: Wed, 27 Feb 2019 10:29:08 +0000
That last span of time is about 4 hours and had only 483 calls to /actuator/health. Apparently that was enough to make Java exceed the MaxRAM hint?
EDIT: Almost 17h
It's about to die again
$ kubectl top pod test-79fd5c5b59-56654
NAME CPU(cores) MEMORY(bytes)
test-79fd5c5b59-56654 43m 575Mi
EDIT: Losing all hope at 23h
NAME READY STATUS RESTARTS AGE
pod/test-79fd5c5b59-56654 1/1 Running 6 23h
describe pod:
State: Running
Started: Wed, 27 Feb 2019 18:01:45 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 27 Feb 2019 14:12:09 +0000
Finished: Wed, 27 Feb 2019 18:01:44 +0000
EDIT: A new finding
Yesterday night I was doing some interesting reading:
https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
https://banzaicloud.com/blog/java10-container-sizing/
https://medium.com/adorsys/jvm-memory-settings-in-a-container-environment-64b0840e1d9e
TL;DR I decided to remove the memory limit and start the process again, the result was quite interesting (after like 11 hours running)
NAME CPU(cores) MEMORY(bytes)
test-84ff9d9bd9-77xmh 218m 1122Mi
So... WTH with that CPU? I was kind of expecting a big number for memory usage, but what is happening with the CPU?
The one thing I can think of is that the GC is running like crazy because it thinks MaxRAM is 512m while the process is using more than 1G. I'm wondering, is Java detecting ergonomics correctly? (I'm starting to doubt it)
To test my theory I set a limit of 512m and deployed the app that way, and I found that from the start there is an unusual CPU load, which has to be the GC running very frequently
kubectl create ...
limitrange/mem-limit-range created
pod/test created
kubectl exec -it test-64ccb87fd7-5ltb6 /usr/bin/free
total used free shared buff/cache available
Mem: 7658200 1141412 4132708 19948 2384080 6202496
Swap: 0 0 0
kubectl top pod ..
NAME CPU(cores) MEMORY(bytes)
test-64ccb87fd7-5ltb6 522m 283Mi
522m is too much vCPU, so my logical next step was to make sure I'm using the most appropriate GC for this case. I changed the JVM_OPTS this way:
env:
  - name: JVM_OPTS
    value: "-XX:MaxRAM=512m -Xmx128m -XX:+UseSerialGC"
...
resources:
  requests:
    memory: 256Mi
    cpu: 0.15
  limits:
    memory: 700Mi
And that brought the vCPU usage back to a reasonable level, as kubectl top pod shows:
NAME CPU(cores) MEMORY(bytes)
test-84f4c7445f-kzvd5 13m 305Mi
Combining Xmx with MaxRAM obviously affects the JVM, but how is it not possible to control the amount of memory we have in virtualized containers? I know that the free command reports the host's available RAM, but OpenJDK should be using cgroups, right?
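One way to check what the JVM actually derived from -XX:MaxRAM, -Xmx and the container limit is to ask it from inside the container. The following is only a minimal sketch: the class name JvmMemoryView is hypothetical, and these numbers cover the heap and the JVM-managed non-heap pools only, not thread stacks or other native allocations.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the JVM's own view of its memory and CPU budget.
class JvmMemoryView {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Number of processors the JVM believes it may use. */
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Non-heap memory currently committed (metaspace, code cache, ...), in MiB. */
    static long committedNonHeapMiB() {
        return ManagementFactory.getMemoryMXBean()
                .getNonHeapMemoryUsage().getCommitted() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):           " + maxHeapMiB());
        System.out.println("available processors:     " + cpus());
        System.out.println("committed non-heap (MiB): " + committedNonHeapMiB());
    }
}
```

If the reported max heap is close to a fraction of the host's RAM rather than of the container limit, the limit is probably not being picked up.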
I'm still monitoring the memory ...
EDIT: A new hope
I did two things. First, I removed my container limit again, because I want to analyze how much it will grow. Second, I added a new flag to see how the process is using native memory: -XX:NativeMemoryTracking=summary
At the beginning everything was normal; the process started out consuming about 300MB according to kubectl top pod, so I left it running for about 4 hours and then ...
kubectl top pod
NAME CPU(cores) MEMORY(bytes)
test-646864bc48-69wm2 54m 645Mi
Kind of expected, right? But then I checked the native memory usage
jcmd <PID> VM.native_memory summary
Native Memory Tracking:
Total: reserved=2780631KB, committed=536883KB
- Java Heap (reserved=131072KB, committed=120896KB)
(mmap: reserved=131072KB, committed=120896KB)
- Class (reserved=203583KB, committed=92263KB)
(classes #17086)
( instance classes #15957, array classes #1129)
(malloc=2879KB #44797)
(mmap: reserved=200704KB, committed=89384KB)
( Metadata: )
( reserved=77824KB, committed=77480KB)
( used=76069KB)
( free=1411KB)
( waste=0KB =0.00%)
( Class space:)
( reserved=122880KB, committed=11904KB)
( used=10967KB)
( free=937KB)
( waste=0KB =0.00%)
- Thread (reserved=2126472KB, committed=222584KB)
(thread #2059)
(stack: reserved=2116644KB, committed=212756KB)
(malloc=7415KB #10299)
(arena=2413KB #4116)
- Code (reserved=249957KB, committed=31621KB)
(malloc=2269KB #9949)
(mmap: reserved=247688KB, committed=29352KB)
- GC (reserved=951KB, committed=923KB)
(malloc=519KB #1742)
(mmap: reserved=432KB, committed=404KB)
- Compiler (reserved=1913KB, committed=1913KB)
(malloc=1783KB #1343)
(arena=131KB #5)
- Internal (reserved=7798KB, committed=7798KB)
(malloc=7758KB #28415)
(mmap: reserved=40KB, committed=40KB)
- Other (reserved=32304KB, committed=32304KB)
(malloc=32304KB #3030)
- Symbol (reserved=20616KB, committed=20616KB)
(malloc=17475KB #212850)
(arena=3141KB #1)
- Native Memory Tracking (reserved=5417KB, committed=5417KB)
(malloc=347KB #4494)
(tracking overhead=5070KB)
- Arena Chunk (reserved=241KB, committed=241KB)
(malloc=241KB)
- Logging (reserved=4KB, committed=4KB)
(malloc=4KB #184)
- Arguments (reserved=17KB, committed=17KB)
(malloc=17KB #469)
- Module (reserved=286KB, committed=286KB)
(malloc=286KB #2704)
Wait, what? 2.1 GB reserved for threads, and 222 MB committed? What is this? I currently don't know, I just saw it...
I need time to understand why this is happening
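The thread line in the report above is the clue: 2116644KB of reserved stack across 2059 threads is roughly 1028KB each, about one default-sized stack per thread, so the real question is who is creating 2059 threads. A rough way to find out from inside the application is to group live threads by name. This is only a sketch; the class name ThreadCensus and the choice to strip a trailing numeric suffix from thread names are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts live threads grouped by name prefix.
class ThreadCensus {
    /** Returns thread counts keyed by thread name with any trailing number removed. */
    static Map<String, Integer> countByPrefix() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            // "pool-7-thread-3" and "pool-7-thread-4" both become "pool-7-thread"
            String prefix = t.getName().replaceAll("[-#]?\\d+$", "");
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByPrefix().forEach((prefix, n) -> System.out.println(n + "\t" + prefix));
    }
}
```

A prefix whose count keeps growing between two runs is a likely candidate for the leak.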

I finally found my issue and I want to share it so others can benefit from it.
As I found in my last edit, I had a thread problem that was causing all the memory consumption over time. Specifically, we were using an asynchronous method from a third-party library without properly taking care of its resources (in this case, making sure those calls ended correctly).
I was able to detect the issue because I used a memory limit on my Kubernetes deployment from the beginning (which is good practice in production environments), and then I monitored my app's memory consumption very closely using tools like jstat, jcmd, visualvm, kill -3 and, most importantly, the -XX:NativeMemoryTracking=summary flag, which gave me a lot of detail in this regard.
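The answer does not name the library or show the fix, so the following is only a generic sketch of the idea of making sure asynchronous calls end and their threads are released. It assumes a hypothetical remoteCall method standing in for the third-party call, and a bounded pool that is always shut down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical example: bounded async work whose threads are always released.
class BoundedAsyncCalls {
    /** Stand-in for the third-party asynchronous call (assumption). */
    static String remoteCall(int id) {
        return "result-" + id;
    }

    /** Runs the calls on a fixed pool, waits for each one, then shuts the pool down. */
    static int runBatch(int calls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                final int id = i;
                pending.add(pool.submit(() -> remoteCall(id)));
            }
            int completed = 0;
            for (Future<String> f : pending) {
                f.get(5, TimeUnit.SECONDS); // a timeout so a stuck call cannot hang forever
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown(); // without this, the worker threads and their stacks stay alive
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed calls: " + runBatch(20));
    }
}
```

In a long-running service one would more likely keep a single shared pool for the application's lifetime rather than create one per batch; the point here is only that every pool that is created is also shut down.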

Related

Kubernetes Pod memory keeps on increasing and shoots way beyond Java application consumption

Current Setup:
Java application running as Kubernetes workload on Google Kubernetes engine
JVM args : "-XX:+UseContainerSupport", "-XX:NativeMemoryTracking=detail", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/dumps/oom.bin", "-Xmx300m", "-Xms300m"
Docker memory set using deployment.yml
limits:
  cpu: 350m
  memory: 700Mi
requests:
  cpu: 200m
  memory: 128Mi
Native memory committed is below 550m when the pod is very near to hitting the max container limit. This leads me to believe that there is something apart from native memory causing this issue
Total: reserved=1788920KB, committed=546092KB
- Java Heap (reserved=307200KB, committed=307200KB)
(mmap: reserved=307200KB, committed=307200KB)
- Class (reserved=1116111KB, committed=75431KB)
(classes #12048)
(malloc=1999KB #25934)
(mmap: reserved=1114112KB, committed=73432KB)
- Thread (reserved=46444KB, committed=46444KB)
(thread #46)
(stack: reserved=46240KB, committed=46240KB)
(malloc=153KB #270)
(arena=51KB #88)
- Code (reserved=259362KB, committed=57214KB)
(malloc=9762KB #14451)
(mmap: reserved=249600KB, committed=47452KB)
- GC (reserved=1034KB, committed=1034KB)
(malloc=26KB #180)
(mmap: reserved=1008KB, committed=1008KB)
- Compiler (reserved=321KB, committed=321KB)
(malloc=189KB #982)
(arena=133KB #5)
- Internal (reserved=39461KB, committed=39461KB)
(malloc=39429KB #16005)
(mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=15588KB, committed=15588KB)
(malloc=13214KB #128579)
(arena=2374KB #1)
- Native Memory Tracking (reserved=3220KB, committed=3220KB)
(malloc=249KB #3553)
(tracking overhead=2971KB)
- Arena Chunk (reserved=178KB, committed=178KB)
(malloc=178KB)
Have also checked memory consumption of java process using ps (550m) and htop (res 604m)
Issue
The pod has a single container, and that container runs only this Java application with the configs mentioned above. The application essentially downloads images in parallel using threads, with some validations such as image size, dimensions etc. The pod's memory consumption keeps increasing until it almost hits the limit and the application restarts. I have read through many different posts and articles and I can't figure out what is causing this memory to keep going up.
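To put a number on the gap described above, one rough option is to compare the resident set size the kernel reports with what the JVM itself says it has committed. This is only a sketch, assuming a Linux container where /proc/self/status is readable; the class name RssGap is hypothetical. The JVM figure here covers only heap and JVM-managed non-heap pools, so it will be lower than the NMT total.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: compares kernel-reported RSS with JVM-reported committed memory.
class RssGap {
    /** Extracts the VmRSS value in KiB from /proc/<pid>/status text, or -1 if absent. */
    static long parseVmRssKiB(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    /** Heap plus non-heap memory the JVM reports as committed, in KiB. */
    static long jvmCommittedKiB() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long bytes = mem.getHeapMemoryUsage().getCommitted()
                + mem.getNonHeapMemoryUsage().getCommitted();
        return bytes / 1024;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status not readable; is this Linux?");
            return;
        }
        String text = new String(Files.readAllBytes(status), StandardCharsets.UTF_8);
        long rss = parseVmRssKiB(text);
        long jvm = jvmCommittedKiB();
        System.out.println("RSS (KiB):           " + rss);
        System.out.println("JVM committed (KiB): " + jvm);
        System.out.println("difference (KiB):    " + (rss - jvm));
    }
}
```

Logging the difference periodically shows whether the growth is in memory the JVM knows about or in native allocations outside it.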
Edit 1: Docker image is based off openjdk:8-jdk-slim
Edit2: Some screenshots of monitoring

Simple Spring Boot Microservice - Memory Usage Consumption over 300 MB [duplicate]

I'm using Spring Boot to develop a client application.
When I run the Spring Boot application (using a fully executable jar), the memory usage is about 190M on an x64 server and 110M on an x86 server.
My JVM options are (-Xmx64M -Xms64M -XX:MaxPermSize=64M -server).
Why is the memory usage so high on the x64 server?
How can I reduce memory usage below 150M?
Thanks.
Little late to the game here, but I suffered the same issue with a containerised Spring Boot application on Docker. The bare minimum you'll get away with is around 72M total memory on the simplest of Spring Boot applications with a single controller and embedded Tomcat. Throw in Spring Data REST, Spring Security and a few JPA entities and you'll be looking at 200M-300M minimum. You can get a simple Spring Boot app down to around 72M total by using the following JVM options.
With -XX:+UseSerialGC: this performs garbage collection inline with the thread allocating the heap memory instead of using dedicated GC thread(s).
With -Xss512k: this limits each thread's stack memory to 512KB instead of the default 1MB.
With -XX:MaxRAM=72m: this restricts the JVM's calculations for the heap and non-heap managed memory to be within the limits of this value.
In addition to the above JVM options you can also use the following property inside your application.properties file:
server.tomcat.max-threads = 1: this limits the number of HTTP request handler threads to 1 (default is 200)
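Since options like these are often passed through an environment variable or a wrapper script, it can be worth confirming that they actually reached the JVM. A minimal sketch, with the class name JvmFlagCheck being hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: checks which options the running JVM was started with.
class JvmFlagCheck {
    /** Options passed to this JVM on the command line (excluding main-class arguments). */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** True if any argument starts with the given prefix, e.g. "-Xss". */
    static boolean hasFlagPrefix(List<String> args, String prefix) {
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = inputArguments();
        for (String flag : new String[] {"-XX:+UseSerialGC", "-Xss", "-XX:MaxRAM"}) {
            System.out.println(flag + " present: " + hasFlagPrefix(jvmArgs, flag));
        }
    }
}
```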
Here is an example of docker stats running a very simple Spring Boot application with the above limits and with the docker -m 72m argument. If I decrease the values any lower than this I cannot get the app to start.
83ccc9b2156d: Mem Usage: 70.36MiB / 72MiB | Mem Percentage: 97.72%
And here you can see a breakdown of all the native and java heap memory on exit.
Native Memory Tracking:
Total: reserved=1398681KB, committed=112996KB
- Java Heap (reserved=36864KB, committed=36260KB)
(mmap: reserved=36864KB, committed=36260KB)
- Class (reserved=1086709KB, committed=43381KB)
(classes #7548)
( instance classes #7049, array classes #499)
(malloc=1269KB #19354)
(mmap: reserved=1085440KB, committed=42112KB)
( Metadata: )
( reserved=36864KB, committed=36864KB)
( used=36161KB)
( free=703KB)
( waste=0KB =0.00%)
( Class space:)
( reserved=1048576KB, committed=5248KB)
( used=4801KB)
( free=447KB)
( waste=0KB =0.00%)
- Thread (reserved=9319KB, committed=938KB)
(thread #14)
(stack: reserved=9253KB, committed=872KB)
(malloc=50KB #74)
(arena=16KB #26)
- Code (reserved=248678KB, committed=15310KB)
(malloc=990KB #4592)
(mmap: reserved=247688KB, committed=14320KB)
- GC (reserved=400KB, committed=396KB)
(malloc=272KB #874)
(mmap: reserved=128KB, committed=124KB)
- Compiler (reserved=276KB, committed=276KB)
(malloc=17KB #409)
(arena=260KB #6)
- Internal (reserved=660KB, committed=660KB)
(malloc=620KB #1880)
(mmap: reserved=40KB, committed=40KB)
- Symbol (reserved=11174KB, committed=11174KB)
(malloc=8417KB #88784)
(arena=2757KB #1)
- Native Memory Tracking (reserved=1858KB, committed=1858KB)
(malloc=6KB #80)
(tracking overhead=1852KB)
- Arena Chunk (reserved=2583KB, committed=2583KB)
(malloc=2583KB)
- Logging (reserved=4KB, committed=4KB)
(malloc=4KB #179)
- Arguments (reserved=17KB, committed=17KB)
(malloc=17KB #470)
- Module (reserved=137KB, committed=137KB)
(malloc=137KB #1616)
Don't expect to get any decent performance out of this either, as I would imagine the GC would be running frequently with this setup as it doesn't have a lot of spare memory to play with
After searching, I found this already has an answer on Stack Overflow:
Spring Boot memory consumption increases beyond -Xmx option
1. Number of HTTP threads (Undertow starts around 50 threads by default, but you can increase or decrease the number of threads via a property)
2. Access to native routines (.dll, .so) via JNI
3. Static variables
4. Use of cache (memcache, ehcache, etc)
A 64-bit VM uses more memory than a 32-bit VM to run the same application, so if you don't need a heap bigger than 1.5GB, keep your application running on a 32-bit VM to save memory.
That is because Spring Boot starts around 50 threads by default for the HTTP service (Tomcat, Undertow or Jetty), and each thread uses 1 MB (the default setting on a 64-bit JVM).
So on a 64-bit JVM, the memory usage is
heap(64M) + Permgen(max 64M) + thread stacks(1M x 50+) + native handles.
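Plugging the question's numbers into that formula gives a feel for the result. This is a back-of-the-envelope sketch only: the class name FootprintEstimate is hypothetical, native handles are left out, and the estimate assumes every thread stack is fully committed.

```java
// Hypothetical helper: rough JVM footprint from heap, PermGen and thread stacks.
class FootprintEstimate {
    /** Returns heap + PermGen + (threads x stack size), in MiB. */
    static long estimateMiB(long heapMiB, long permGenMiB, int threads, long stackKiB) {
        return heapMiB + permGenMiB + (threads * stackKiB) / 1024;
    }

    public static void main(String[] args) {
        // 64M heap, 64M PermGen, 50 threads with 1 MB stacks
        System.out.println(estimateMiB(64, 64, 50, 1024)); // prints 178
        // same, but with -Xss512k
        System.out.println(estimateMiB(64, 64, 50, 512));  // prints 153
    }
}
```

The 178M figure is in the same range as the roughly 190M observed on the x64 server, before counting native handles.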
references:
https://dzone.com/articles/how-to-decrease-jvm-memory-consumption-in-docker-u
http://trustmeiamadeveloper.com/2016/03/18/where-is-my-memory-java/
https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/
You can use -XX:+UseSerialGC as a JVM argument to select the Serial garbage collector, which is a good choice for reducing the memory footprint.

Hadoop node is not active

I have a setup with 1 master node and 1 slave node.
My issue arises when running MapReduce processing: the slave node doesn't seem to be working. Can anyone help with how to check, change and ensure that the slave is working?
The config files can also be found at the URL below
https://drive.google.com/file/d/1ULEe6k2zYnfQDQUQIbz_xR29WgT1DJhB/view
Here are my observations
1) When I check CPU utilization, the slave doesn't seem to be working: its CPU stays at 0% while the MapReduce job runs, whereas the master is at 44% CPU. Refer to the attachment.
2) When I run the dfs report it shows 2 live nodes, but the cluster web UI shows only 1. Refer to the attachment and below.
3) The total MapReduce processing time is the same with or without the slave
-------------------------------------------------
Live datanodes (2):
Name: 192.168.249.128:9866 (node-master)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 20587741184 (19.17 GB)
DFS Used: 174785723 (166.69 MB)
Non DFS Used: 60308293 (57.51 MB)
DFS Remaining: 20352647168 (18.95 GB)
DFS Used%: 0.85%
DFS Remaining%: 98.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Oct 23 11:17:39 PDT 2018
Last Block Report: Tue Oct 23 11:07:32 PDT 2018
Num of Blocks: 93
Name: 192.168.249.129:9866 (node1)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 20587741184 (19.17 GB)
DFS Used: 85743 (83.73 KB)
Non DFS Used: 33775889 (32.21 MB)
DFS Remaining: 20553879552 (19.14 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Tue Oct 23 11:17:38 PDT 2018
Last Block Report: Tue Oct 23 11:03:59 PDT 2018
Num of Blocks: 4
You're showing datanodes with dfsreport, not nodemanagers that actually are processing the data. In the YARN UI, you will want to take note of the "Active Nodes" counter, which in your case is 1. That would make sense if the master is a namenode and resource manager while the slave would be a datanode and nodemanager.
Other than that, if you have a non splittable file, for example a ZIP, or your file is less than the block size (by default 128 MB), then only one mapper will process that. Plus, it's not guaranteed that mappers (or reducers) will be distributed evenly over all available resources
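The point about file size and block size can be put into numbers. This is a simplified sketch, not Hadoop's actual split computation (which, among other things, tolerates a slightly oversized last split); the class and method names are hypothetical.

```java
// Hypothetical helper: rough number of map tasks for a single input file.
class MapperEstimate {
    /** Approximates map tasks as one per block, or one in total for a non-splittable file. */
    static long estimateMapTasks(long fileBytes, long blockBytes, boolean splittable) {
        if (fileBytes <= 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (!splittable) {
            return 1; // e.g. a ZIP: one mapper has to read the whole file
        }
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long block = 128 * mib; // default block size mentioned above
        System.out.println(estimateMapTasks(100 * mib, block, true));   // prints 1
        System.out.println(estimateMapTasks(300 * mib, block, true));   // prints 3
        System.out.println(estimateMapTasks(1024 * mib, block, false)); // prints 1
    }
}
```

With a single map task there is nothing to distribute, so a second node cannot shorten the job.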
Outside of a learning environment, though, 40 GB of storage and 8 GB of RAM would be better spent on multi threading rather than distributed computing (or a proper database; i.e parse files and load them into a queryable store). Or use Spark or Pig, which don't require Hadoop, but are much easier to work with than MapReduce

SparkR out of memory error

I have a 2 node test cluster on AWS with spark-2.0.0-bin-hadoop2.7 installed.
This is the code I'm using to launch the cluster.
./spark-ec2 -k blah -i blah.pem -r us-west-1 -s 1 -t r3.2xlarge launch --copy-aws-credentials blah
Viewing port 8080 shows 58.8GB(0.0 B Used) of memory after running these two lines in rstudio.
Sys.setenv(SPARK_HOME="/root/spark")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
When I run this line and refresh the page on port 8080 the memory usage changes to 58.8 GB (53.8 GB Used).
sparkR.session(master = "spark://[ip]:7077",
               sparkHome = '/root/spark',
               enableHiveSupport = FALSE)
When I try to create a spark data frame from a data frame which should consume 0.04857268 GB of memory I get this error:
acquisition <- as.DataFrame(orig)
17/11/04 14:27:23 WARN TaskSetManager: Stage 0 contains a task of very large size (166360 KB). The maximum recommended task size is 100 KB.
Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space
I tried adding this but get the same error.
options(java.parameters = "-Xmx2048m")
install.packages("rJava")
library(rJava)
I'm stuck. I've spent three weekends googling this issue and can't figure it out.
Thanks.

Tomcat 8 : 100% cpu usage

I am having a problem with tomcat since switching to a different package provider (bitnami -> official debian).
Someone seems to be hitting our servers with a request (with malicious intent):
59.111.29.6 - - [04/Feb/2017:16:17:58 +0000] "-" 400 -
where "-" is the request path, which coincides with
Feb 04, 2017 4:17:58 PM org.apache.coyote.http11.AbstractHttp11Processor process
INFO: Error parsing HTTP request header
Note: further occurrences of HTTP header parsing errors will be logged at DEBUG level.
which coincides with the increased CPU usage.
The server status shows the following:
JVM
Free memory: 355.58 MB Total memory: 833.13 MB Max memory: 2900.00 MB
Memory Pool     Type             Initial    Total      Maximum     Used
Eden Space      Heap memory      34.12 MB   229.93 MB  800.00 MB   12.47 MB (1%)
Survivor Space  Heap memory      4.25 MB    28.68 MB   100.00 MB   2.22 MB (2%)
Tenured Gen     Heap memory      85.37 MB   574.51 MB  2000.00 MB  462.84 MB (23%)
Code Cache      Non-heap memory  2.43 MB    7.00 MB    48.00 MB    6.89 MB (14%)
Perm Gen        Non-heap memory  128.00 MB  128.00 MB  512.00 MB   52.57 MB (10%)
"http-nio-8080"
Max threads: 200 Current thread count: 10 Current thread busy: 3 Keeped alive sockets count: 1
Max processing time: 301 ms Processing time: 71.068 s Request count: 10021 Error count: 2996 Bytes received: 0.00 MB Bytes sent: 3.18 MB
Stage  Time              B Sent  B Recv  Client (Forwarded)  Client (Actual)  VHost            Request
F      1486364749526 ms  0 KB    0 KB    185.40.4.169        185.40.4.169     ?                ? ? ?
F      1486364749526 ms  0 KB    0 KB    185.40.4.169        185.40.4.169     ?                ? ? ?
R      ?                 ?       ?       ?                   ?                ?
S      36 ms             0 KB    0 KB    106.51.39.130       106.51.39.130    104.197.119.177  GET /manager/status?org.apache.catalina.filters.CSRF_NONCE=072F9F6884D94C5D7B30D1D34CE61BD9 HTTP/1.1
R      ?                 ?       ?       ?                   ?                ?
P: Parse and prepare request S: Service F: Finishing R: Ready K: Keepalive
So it doesn't seem like an out of memory issue (but I could be wrong).
How can I stop someone from making the request in the first place to avoid the issues I'm facing? My webapp running on tomcat restricts HTTP methods to GET/POST, but how can I configure tomcat as a whole to restrict them?
I would advise you to obtain a thread dump of your server:
Find the PID of the Tomcat server using:
jps -l
Obtain a thread dump using:
kill -3 PID
or jstack PID
Then check the thread dump; you should find what the CPU-hogging thread is doing
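A thread dump shows stacks but not which thread is burning CPU. If you can run code inside the same JVM (for example from a small diagnostic servlet, which is an assumption about your setup), the standard ThreadMXBean can rank threads by CPU time. A minimal sketch; the class name BusyThreads is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: lists the threads of this JVM that have used the most CPU time.
class BusyThreads {
    static final class Sample {
        final String name;
        final long cpuNanos;
        final String topFrame;

        Sample(String name, long cpuNanos, String topFrame) {
            this.name = name;
            this.cpuNanos = cpuNanos;
            this.topFrame = topFrame;
        }
    }

    /** Returns up to 'limit' threads ordered by accumulated CPU time, highest first. */
    static List<Sample> topByCpu(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Sample> samples = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return samples; // CPU timing is not available on this JVM
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);        // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id, 1); // only the top stack frame
            if (cpu < 0 || info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no frame)";
            samples.add(new Sample(info.getThreadName(), cpu, top));
        }
        samples.sort(Comparator.comparingLong((Sample s) -> s.cpuNanos).reversed());
        return samples.size() > limit ? new ArrayList<>(samples.subList(0, limit)) : samples;
    }

    public static void main(String[] args) {
        for (Sample s : topByCpu(5)) {
            System.out.println((s.cpuNanos / 1_000_000) + " ms\t" + s.name + "\t" + s.topFrame);
        }
    }
}
```

The thread names it prints can then be looked up in the jstack output to see the full stack.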
