Best practices for distributed profiling - java

Perhaps nothing new, but we have a use case to profile a Spark application that runs in a distributed fashion.
We currently use async-profiler to monitor each executor (a process in Spark), which generates a JFR recording per process. It's tedious to look at individual executor profiles and try to make sense of them or compare them.
We are using jfr assemble to combine all the JFR files produced. Curious: is this how distributed profiling is usually done?
/async-profiler/profiler.sh collect -e cpu -d 120 -i 20ms -o jfr -f ${file} ${pid}
This is run every 120 seconds, which effectively gives us continuous profiling.
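For reference, a rough sketch of what one collection round plus the assembly step might look like on a node; the match on CoarseGrainedExecutorBackend (Spark's executor main class), the output paths, and the assumption that the JDK's jfr tool can stitch the per-executor recordings dropped into one directory are illustrative, not taken from the question:
# one collection round on a node; repeat every 120s from a wrapper loop or cron
for pid in $(pgrep -f CoarseGrainedExecutorBackend); do
  /async-profiler/profiler.sh collect -e cpu -d 120 -i 20ms -o jfr \
      -f /tmp/profiles/executor-${pid}-$(date +%s).jfr ${pid} &
done
wait
# after pulling every node's files into one directory, combine them into a single recording
jfr assemble /tmp/profiles combined.jfr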
The benchmark we are running compares a job on a cluster of EC2 2xl instances vs EC2 4xl instances, and what we are noticing is that on the 4xl cluster our jobs run slower. The 2xl cluster has twice the number of machines as the 4xl cluster.
Each process uses 8 cores and a 54 GB heap. On 2xl, each machine runs a single process, but on 4xl we run two processes per machine without any isolation.
Any leads on how to debug this are appreciated. Let me know if I need to add any more options to async-profiler. We clearly see more time spent on CPU, hence -e cpu.

Related

Spring Boot application: CPU usage reaches maximum just sometimes

How do I troubleshoot/optimize CPU usage in a Spring Boot application? Are the allocated resources sufficient for an application with a user base of around 300k? The application isn't heavy at all; it just calls third-party APIs, does the necessary checks, and returns the response.
How do I identify the exact code that could be using more resources than normally required? I read somewhere that one way is to take the ID of a hot thread from the top command, convert it to hexadecimal, and look it up in a thread dump. This wasn't easily achievable, as some of the suggested commands didn't work. I would appreciate any help or suggestions.
Thanks in advance.
[htop screenshots: output under high CPU, and when it's normal]
Collecting thread stacks is no different for a Spring Boot app. Before a Boot app is containerized, it is still a JAR. If you suspect that it is your application that is actually contributing to the high CPU, run your JAR, attach a profiler to it, and trace the code contributing to the high CPU under load. If you cannot do that, take a thread dump of the running Java process and use any free or open-source tool to analyze it. The second approach applies to the containerized application as well.
Follow these steps to take a thread dump of a Java/Boot app running inside a Docker container:
docker exec -it <containerName> jstack <pid> > someFile.txt
Take multiple snapshots of it for better visibility and comparison, for example as sketched below.
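A minimal loop (the container name, PID, count, and interval are placeholders) that captures several dumps a few seconds apart:
for i in 1 2 3 4 5; do
  docker exec <containerName> jstack <pid> > threaddump-$i.txt
  sleep 10
done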
If you have not added the JMX options to the JVM command line, do that to begin with:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=10000
-Dcom.sun.management.jmxremote.rmi.port=10000
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
Then on your local machine you start "jmc" from your JDK bin folder and connect to your Spring Boot server.
You will then be able to see all the threads and enable both CPU load and thread locks on all active threads.
Be aware, though, that the above opens the JVM up for unauthenticated access, so keep the port safe.
Next, if your JVM hangs or misbehaves, send it a "kill -3" (SIGQUIT), which tells the JVM to print a thread dump to stdout. For memory analysis, take a heap dump instead (for example with jmap or -XX:+HeapDumpOnOutOfMemoryError); that dump can then be read with the Eclipse MAT plugin to analyze the JVM's inner doings.
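For reference, assuming the process ID is known, the two commands look roughly like this (jmap ships with the JDK):
kill -3 <pid>                                    # thread dump, written to the JVM's stdout/log
jmap -dump:live,format=b,file=heap.hprof <pid>   # heap dump that Eclipse MAT can open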
Another way is to install Jolokia into your server, which exposes the same JMX information over HTTP.

java server application performance issues

I am monitoring a Java process that runs for a long period of time; the process uses the G1 garbage collector.
These are my Java options (it runs in server mode):
-Xmx2048m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=32M -XX:MetaspaceSize=100m -XX:CompressedClassSpaceSize=400m
The process is a Java WebSocket server that involves a lot of I/O, with embedded Tomcat in front of it.
According to Java Mission Control, peak heap usage reaches 1.7 GB, and after every GC cycle it decreases back to ~900 MB.
The process runs in a Docker container with a 3.8 GB memory limit.
When using top -o %MEM, it shows that total RES is 2.6 GB and never more.
I am noticing that after a heavy load of WebSocket connects/disconnects the process becomes really slow and unresponsive.
When I load this process again after 6-7 hours, while it's idle and "clean" of connections, it responds after 6-7 seconds, whereas on the first load the response time was much lower, ~2-3 seconds.
I thought it was file-descriptor related, but when checking with:
ls /proc/1/fd | wc -l
from inside the container, it shows all file descriptors have been released.
java version: 8u131
tomcat version: 8.5.60
[Screenshots: heap usage after 6 hours of load with no connections on the server, and the corresponding JMC view]
How can I investigate this further?
Don't focus on memory and garbage collection (yet); first identify what takes time in request processing.
Is the time spent in your request handler code? Is it somewhere else?
If you can modify the code, you can simply add some println statements to start.
And/or you can use a lightweight CPU profiler like https://github.com/jvm-profiling-tools/async-profiler to get a CPU flame graph; wall-clock profiling might be especially helpful for "off-CPU" analysis.
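For example, a wall-clock run with async-profiler might look like the following (the path, PID, and output format are placeholders and depend on the async-profiler version you use):
./profiler.sh -e wall -d 60 -f wall-clock.html <pid>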

Monitoring Cassandra CPU usage with JMX/MBeans

I want to write a simple Java code to monitor a Cassandra database with JMX. Now I'm stuck with retrieving the CPU usage of the database. As far as I figured out, a possible MBean would be the java.lang:type=OperatingSystem with attribute ProcessCpuLoad.
However, it seems in this case the result would be the CPU usage of all processes running in the JVM, not only the Cassandra threads. Is this assumption correct?
I also wonder what data is shown as CPU usage when connecting to the database with JConsole. Is it possible to get direct access to these values (I mean, without JConsole)? Or is there another MBean that gives exactly the desired values?
Thanks,
Nico
ProcessCpuLoad in the OperatingSystem MBean is correct. It's not all JVMs, just the one JVM that's reporting it. You do not have multiple processes running within a single JVM; the JVM runs as a single process per Java application.
You can use java.lang:type=Threading to monitor CPU time spent on individual threads, but there are a ton of threads in Cassandra and it will probably never be totally accurate (it misses things like GC time).
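As a rough illustration, here is a minimal JMX client that reads ProcessCpuLoad (and SystemCpuLoad for comparison); the host, the port (7199 is Cassandra's default JMX port), and the absence of authentication are assumptions that need to match your Cassandra configuration:
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraCpuMonitor {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName os = new ObjectName("java.lang:type=OperatingSystem");
            // ProcessCpuLoad: CPU usage of the Cassandra JVM process, 0.0-1.0 (-1.0 if unavailable)
            double processCpu = (Double) mbsc.getAttribute(os, "ProcessCpuLoad");
            // SystemCpuLoad: CPU usage of the whole machine, for comparison
            double systemCpu = (Double) mbsc.getAttribute(os, "SystemCpuLoad");
            System.out.printf("process CPU: %.1f%%, system CPU: %.1f%%%n",
                    processCpu * 100, systemCpu * 100);
        }
    }
}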
If you don't want to use JConsole, you can check:
ps -p <whatever-your-cassandra-pid-is> -o %cpu
# or depending on OS/installer
ps -p `cat /var/run/cassandra.pid` -o %cpu

Java Multithread Application uses only one Core

I have a problem with my JVM running on CentOS 6.0 with OpenJDK 1.7.0_51, 64-bit.
My system is a 4-core system with 8 GB of RAM.
I'm running a Java multithreaded application that I wrote myself. It's supposed to insert tons of data into a NoSQL database.
For that, I'm spawning 4 threads using a cached thread pool from java.util.concurrent.Executors.
I instantiate 4 workers that implement the Runnable interface. Afterwards I execute them using the thread pool. Here's my code:
public void startDataPump(int numberOfWorkers) {
    // class "DataPump" implements Runnable
    for (int i = 0; i < numberOfWorkers; i++) {
        DataPump pump = new DataPump();
        // "workerList" contains all workers and is a simple ArrayList to keep track of them
        workerList.add(pump);
        // "workers" is the thread pool that has been
        // initialized earlier with Executors.newCachedThreadPool()
        workers.execute(pump);
    }
}
When running this with a parameter of 4, it will spawn 4 threads in the thread pool. I assumed that the JVM or my OS would be smart enough to schedule these threads across all of my cores.
HOWEVER, only one core of my CPU is working at 100%; the others remain almost idle.
Am I doing anything wrong in my code, or is this a JVM/OS problem? If so, is there anything I can do about it?
Running this application on only 1 core is extremely slowing down the whole app.
Help is greatly appreciated :)
Please bear in mind that it's the OS, and not the JVM, that is responsible for CPU affinity; that is why I suggest you first figure out how many CPUs you have, and then perhaps use schedutils to configure processor affinity for a certain process.
CPU info: use one of the three below
/proc/cpuinfo
lscpu
nproc
Install schedutils to configure processor affinity:
yum install schedutils
You can assign CPU affinity via taskset as follows (2 is the CPU number and 23564 is the process ID):
taskset -c 2 -p 23564
Scheduling threads is not a JVM activity; it is an OS activity. If the OS finds that threads are independent of each other and can be executed separately, it schedules them on other cores.
I am not sure about schedutils, but I think it works at the application level (it allows you to set CPU affinity, but the final decision is taken by the OS).
One thing about using cores: the OS scheduler schedules new processes on other cores, as every process has its own address space independent of other processes (thus they can be executed in parallel without obstruction).
Try creating a new process for each thread; that may improve your CPU utilization (use of more cores), but there is a disadvantage too: every process gets its own address space, so extra memory is required per process (per thread, in your case). If you have a good amount of memory available, you can try this.
If it is just a Linux OS, the "sar" command is enough for monitoring per-core CPU utilization (sar comes from the base sysstat package on most distributions, so the overhead on the system is low); see the example below.
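For instance, the following reports per-core utilization once a second, five times:
sar -P ALL 1 5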
If your environment is virtualized, or uses special CPU scheduling such as Docker, there is no guarantee that Java will automatically find out how many cores are available and use them all. You have to specify how many cores you want to use via:
On JDK >= 10, use the following JDK options:
-XX:ActiveProcessorCount=2
On JDK >= 8, use the following JDK options:
-XX:+UnlockExperimentalVMOptions -XX:ActiveProcessorCount=2
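To verify how many processors the JVM actually sees (which is what default thread-pool sizing and the flag above affect), a quick check:
public class CpuCount {
    public static void main(String[] args) {
        // Prints the processor count the JVM believes it can use;
        // -XX:ActiveProcessorCount overrides this value.
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}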

How to take memory snapshots at regular interval for jrockit?

We are running some heavy deployments on a WebLogic setup, and they take around an hour. During that time, we want to take memory snapshots/heap dumps to see how much memory headroom we have, to avoid a crash. Is there an optional JVM argument that we can provide while starting the server that will do the job? I checked the link below, but nothing fits the requirement:
http://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm
If it is acceptable to drive the snapshots from the outside, then you can use jrcmd to send commands to your JVM.
To get the PID, use
jrcmd -P
and then you can use
jrcmd PID hprofdump dumpfile.bin
See http://docs.oracle.com/cd/E15289_01/doc.40/e15062/diagnostic.htm#BABIACCC for hprofdump and http://docs.oracle.com/cd/E15289_01/doc.40/e15061/ctrlbreakhndlr.htm#i1001760 for jrcmd.
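Since the goal is snapshots at regular intervals, a minimal sketch that simply wraps the command above in a loop (the PID, file names, and interval are placeholders; the hprofdump invocation follows the form shown above):
PID=12345   # the WebLogic server PID from "jrcmd -P"
while true; do
  jrcmd ${PID} hprofdump dump-$(date +%H%M%S).bin
  sleep 300   # every 5 minutes during the deployment
done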
