Profiling JVMs with JVMTI, how to distinguish the different JVMs? - java

I am writing a profiler with the aid of the JVM TI.
In C++ I have written a simple agent, which writes the information collected to a socket. With Java Swing I have built a simple GUI which reads these data from a socket to visualize it.
However I am facing some usability issues. I would like to provide the functionality to start profiling a Java application on request. There is the Attach API which provides the possibility to inject an agent into a running JVM.
But to start a new Java program and inject the agent is a little bit more complicated. One way would be, to make a call to the command line and start the Java program from the GUI Profiler:
java -agentlib:agent Program
I kind of dislike this idea, because it is somehow hacky but I see no other way, do you?
To summarize I need two ways to start profiling a JVM:
Start a Java applicatiom from the scratch and start profiling it directly
Attach to a running JVM and inject the agent to start profiling it
Further, I would be in need to distinguish the different JVMs which I inspect, but how to do that? There no unique identifier for the different JVMs. The Attach API gives the possibility to list the different JVMs with their name and id, but what to do in the first case? Is it possible to inject the agent with arguments?

You can also generate your own GUID in the Agent_OnLoad and use that for logging. this way if your some of your processes have short lives and others long lives you can distinguish between recycled PIDS.

I solved the problem by using the local process identification (pid) and the network address to uniquely identfy the JVM.

Related

Limiting memory usage of a Java/Rhino/Nashorn object

I'm extending a server application written in Java to allow user-defined callbacks (written in Javascript) to be run in response to requests. I've done some reading, and while it seems possible to disable Java classes in Nashorn, there is nothing stopping a user from creating Javascript code that allocates an enormous array without using any Java APIs. I'm wondering if there is any way to restrict this, either proactively or reactively.
The solution I came up with is to have a process pool of JVMs with small max heap sizes, which are responsible for running the user-defined code. There will be a worker pool manager to spawn new processes when needed. This way, the main process, as well as other user-defined code, will not be affected by a single malicious user. While this solution will likely work, it seems heavy-handed. Is there no better solution for preventing malicious users from using too much memory?
I'm not particularly set on Javascript, so if there exists any other scripting language that can be run within a JVM and also has support for memory usage limits, I would be open to using it instead of Nashorn. Unfortunately, it seems like Jython, JRuby, and LuaJava all don't have what I'm looking for. Thanks in advance.

Debugging A Java Agent

I'm currently developing a Java Agent in order to facilitate the dynamic instrumentation of new and legacy Java Applications.
It occurred to me that, as far as IDE debugging is concerned, Java Agents could be perhaps considered a special case as they are required to be injected into a target JVM process in order to be ran. Which thus naturally gives rise to the question of how one would go about debugging, testing and profiling an Agent-type application.
A cursory search for existing solutions turned up a few command line based options (i.e YourKit, JIP, etc) however many of them are ALSO Java Agents under the hood. Which if utilized would lead to, at least in my view, the rather odd scenario of an Agent debugging/profiling another Agent. I am aware that Agents can be stacked in a hierarchical arrangement, however i'm unsure if Agent Applications can be debugged by stacking Agents in this manor.
As stated in Java How To ... The -javaagent: Option:
An agent is just an interceptor in front of your main method, executed
in the same JVM and loaded by the same system classloader, and
governed by the same security policy and context.
The name is misleading, since the word agent usually suggests
something working remotely and separately from the primary entity. But
it turns out the java agent as used in -javaagent: is much simpler
than that.
One java application may have any number of agents by using
-javaagent: option any number of times. Agents are invoked in the same order as specified in options.
Each agent may also take String-valued args. I guess that's the reason
why we have to use this option multiple times for multiple agents.
Otherwise, we could've just done something like:
-javaagent agent1.jar:agent2.jar
, which is incorrect.
So, by putting the profiler agent (e.g. YourKit, JIP, etc.) before your own agent will give the debugging control to you.

What should go in a post-mortem debug information snap-shot for a Java program?

We have a situation where we would like to be able to create a zip file containing as much information as possible about a currently running Java program (which may be on its way down) to allow for post-mortem forensic analysis. We currently deploy to Java 5 but Java 6 features are interesting too.
So far I've thought of:
A programmatically generated thread dump. This appears to work better in Java 6.
The logged log events for the last X minutes. We currently use logback or java.util.logging.
Some serialized objects.
External environment - all system properties.
What else would be useful of JVM information?
Would it be possible in a generic way to walk the call stacks and see the arguments? (or does this require JVMTI or equivalent). It is a IBM JVM so we cannot use jvisualvm and the Attach API.
You could go all the way and capture a full heap dump? I realize you are on an IBM JVM, but this page seems to indicate there is a way.

How to profile a distributed app in java?

I've got an app running on a grid of uniform java processes (potentially on different physical machines). I'd like to collect cpu usage statistics from a single run of this app. I've went over profiling tools looking for an option of automatic collection of data but failed to find any in netbeans, tptp, jvisualvm, yourkit etc.
Maybe I'm looking in a wrong way?
What I was thinking is:
run the processes on the grid with some special setup that allows them to dump profiling info
run my app as usual - it will push tasks to the grid, the processes will execute the tasks and publish profiling info
uses some tool to collect and analyze the profiling results
but I can't find anything even remotely similar to this.
Any thoughts, experience, suggestions?
Thank you!
If you have allowed remote JMX access and if you are using SUN JDK 1.6 then try using jvisualvm. It has the option of remote JMX connection. Though I haven't it used for profiling CPU in a distributed environment.
Note: For CPU profiling your application should be running on SUN JDK 1.6 or above.
Have a look at these links:
JVisualVM
JVisualVM - Working with Remote Applications
Get heap dump from a remote application in Java using JVisualVM
Unable to profile JBoss 5 using jvisualvm
http://www.taranfx.com/java-visualvm
I have used CA Introscope for this type of monitoring. It uses Instrumentation to collect metrics over time. As an example, it can be configured to provide you a view of all nodes and their performance over time. From that node view, you can drill down to the method level to help you figure out where your bottle necks are.
Yes, it will provide CPU utilization.
It's a commercial $$$ tool, but its a great tool for collecting, monitoring and interrogating performance data.
if you look at something like zabbix (though there are tons others of monitoring tools), this allows for gathering data via JMX from a Java app. And if you enable JMX in your app and allow it to be queried externally (via TCP/IP) you will have access to a lot of the hotspot internals (free memory etc) also thread stacks etc. Then you could have these values graphed as well. It does need configuration but what you're looking for don't think can be done with a one line of a script.
Just to add that profiling information on each node usually contain timestamps.
To match these timestamps all machines should have exactly the same time (10 millis delta maximum)
cluster nodes should synchronize with single source network time server (NTP)
You can use some JMX library, e.g. jmxterm and wrap it in some code to connect to multiple hosts an poll them for changes. If you are abit familiar with Python, look at mys simple script here for some inspiration: http://rostislav-matl.blogspot.com/2011/02/monitoring-tomcat-with-jmxterm.html .
http://www.hyperic.com/products/open-source-systems-monitoring
I never tried other tools mentioned in other answers. I was more than satisfied with hyperic.
It exposes webservices API as well which you can use to write your own analysis tools.
If you know the critical paths you want to analyse I would suggest time stamping your process in key places and combining the logs yourself. This is likely to be a useful addition to your profiling, can be used in production and may be even more useful as a result. (It is for my project)
I have used YourKit to monitor a number of processes at once. It can show you what is happening in each in real time and collect the results when all is finished.
I don't know if it provides a combined view of what is happening.
I was looking for something similar and found Hyperic
Claims are the tool can monitor most common applications ans systems, gather all information and present them in a conveniant fashion.
To be honest this is on my todo list, so I can't say if it will do the job or not. Anyway, it seem impressive.

Is it possible to read the memory of a running java application?

I'd like to learn how, or if its possible at all to programmaticly interact with a black-box java application(by reading its data). Has there been any previous research/work on doing this sort of thing?
I'd imagine that running on a JVM significantly complicates things.
#anon: Doing this with any JVM is relevant. Do you have to know or control the specifics of how the JVM allocates memory to extract data from a java application?
You could look into java.lang.instrument. As long as you understand the class structure of the application, it will let you modify the methods in an already-running JVM and you may be able to concoct a way that allows you to extract or insert data enough to communicate (depends on the methods available, of course).
The Sable group at McGill University has contributed a lot of research to the Java world.
Much of the work is getting rather dated, but you might find some help in their EVolve project which has the goal of visualizing object-oriented programs. Some of their projects appear to be actively maintained (such as Soot, their Java optimization framework), so you might find luck contacting them directly
It is easily possible with, for example, StackTrace. It can attach to a java process and let you inspect and change almost everything with BeanShell.
I believe what you're looking for is what the Eclipse MAT does. You might want to take a look at the source code...
The HotSpot JVM allows you to hook up an agentlib from a profiler (see Open Source Java Profilers or commercials like Your Kit), in the profiler you can then inspect the memory/cpu/threads etc.
If you want very specific stuff you might want to make your own agentlib that sends you information about the jvm that you need.

Categories