Separating Yourkit sessions - java

I have a segment of code that I want to profile on many different inputs (~1000), so it doesn't make sense to run each test manually and save the results. I'm using YourKit in combination with Eclipse to profile. Is there any way to create "new sessions" for profiling? I want to be able to separate each run, so that would make the most sense.

You don't really need to create "sessions" for each test. Instead, capture a snapshot of the profiling data at the end of each test, and clear the profiling data before running the next test.
Using the YourKit API, you can do so in a manner similar to:
public void profile(String host, int port, List<InputData> inputDataSet) throws Exception {
    // Optional: remember where each snapshot file was saved
    Map<InputData, String> pathMap = new HashMap<InputData, String>();

    // Connect to the profiler agent and start collecting data
    com.yourkit.api.Controller controller = new com.yourkit.api.Controller(host, port);
    controller.startCPUSampling(/*with your settings*/);
    controller.startAllocationRecording(/*With your settings*/);
    // controller.startXXX with whatever data you want to collect

    for (InputData input : inputDataSet) {
        // Run your test (the original snippet passed an undefined variable here)
        runTest(input);

        // Save profiling data
        String path = controller.captureSnapshot(/*With or without memory dump*/);
        pathMap.put(input, path);

        // Clear YourKit profiling data
        controller.clearAllocationData();
        controller.clearCPUData();
        // controller.clearXXX with whatever data you are collecting
    }
}
I don't think you need to stop collecting, capture the snapshot, clear the data, and restart collecting for each test. Capturing and clearing should be enough, but please double-check.
Once the tests have run, you can open the snapshots in YourKit and analyze the profiling data.

Unfortunately, it's not clear how you run your tests. Does each test run in its own JVM process, or do you run all tests in a loop inside a single JVM?
If you run each test in its own JVM, then you need to: 1) Run the JVM with the profiler agent, i.e. use the -agentpath option (details are here: http://www.yourkit.com/docs/java/help/agent.jsp). 2) Specify what you are profiling on JVM startup (agent options "sampling", "tracing", etc.). 3) Capture a snapshot file on JVM exit (the "onexit" agent option).
Full list of options: http://www.yourkit.com/docs/java/help/startup_options.jsp
If you run all tests inside a single JVM, you can use the profiler API (http://www.yourkit.com/docs/java/help/api.jsp) to start profiling before a test starts and capture a snapshot after it finishes. You need to use the com.yourkit.api.Controller class.

Related

Databricks Spark notebook re-using Scala objects between runs?

I have written an Azure Databricks scala notebook (based on a JAR library), and I run it using a Databricks job once every hour.
In the code, I use the Application Insights Java SDK for log tracing, and init a GUID that marks the "RunId". I do this in a Scala 'object' constructor:
object AppInsightsTracer
{
  TelemetryConfiguration.getActive().setInstrumentationKey("...");
  val tracer = new TelemetryClient();
  val properties = new java.util.HashMap[String, String]()
  properties.put("RunId", java.util.UUID.randomUUID.toString);

  def trackEvent(name: String): Unit =
  {
    tracer.trackEvent(name, properties, null)
  }
}
The notebook itself simply calls the code in the JAR:
import com.mypackage._
Flow.go()
I expect to have a different "RunId" every hour. The weird behavior I am seeing is that for all runs, I get exactly the same "RunId" in the logs!
As if the Scala object constructor code is run exactly once, and is re-used between notebook runs...
Do Spark/Databricks notebooks retain context between runs? If so how can this be avoided?
A Jupyter notebook spawns a Spark session (think of it as a process) and keeps it alive until it either dies, or you restart it explicitly. The object is a singleton, so it's initialized once and will be the same for all cell executions of the notebook.
You start with a new context every time you refresh the notebook.
I would recommend saving your RunId to a file on disk, then reading that file on every notebook run and incrementing the RunId in the file.
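The file-based counter suggested above can be sketched as follows. This is only an illustration in plain Java (which runs on the same JVM as the Scala code); the class name RunCounter and the file location are assumptions, not part of the original answer, and the sketch does nothing to guard against two runs updating the file at the same time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: keeps a run id in a small text file and bumps it on each call. */
final class RunCounter {

    /** Reads the last run id from the file (0 if the file is absent), stores last + 1, and returns it. */
    static long next(Path file) {
        try {
            long last = 0L;
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                last = Long.parseLong(text);
            }
            long next = last + 1;
            Files.write(file, Long.toString(next).getBytes(StandardCharsets.UTF_8));
            return next;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed location; on a cluster this would need to be storage that survives restarts.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "run_id.txt");
        System.out.println("RunId for this run: " + next(file));
    }
}
```

For this to help, next() has to be called from code that executes on every run. If it is called from the singleton's initializer, it will still run only once per JVM.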

Jenkins - check Git before every builds

I am currently setting up a Continuous Integration system with Jenkins, and I came across a problem:
Almost every project depends on other projects. So, in order to perform daily builds, I use the CloudBees Build Flow plugin. It does its job pretty nicely, but not in an optimal way: it builds EVERY job I tell it to, without even checking Git for changes. So I would like to know if there is any way to force Jenkins to check Git for changes before actually building the project.
PS: Sorry for my English, I am not a native speaker.
Not sure if you have looked at the configuration in the job settings. There is an option there to force a fresh checkout. I have SVN linked, but it should be similar for Git.
If not, you can look into adding manual commands to the job. Check whether you can arrange the order so that they execute first and your build task runs afterwards.
In the end, I chose to stick with Build Flow and the Groovy language instead of using scripts, but that is just for convenience; this solution would work just as well with a shell script. Moreover, using Build Flow lets you use Parallel() to start multiple jobs at the same time.
Here's my solution:
I found the Jenkins Poll SCM plugin, which polls the SCM before trying to build (and builds only if necessary).
The only problem with the CloudBees Build Flow plugin is that it does not wait for previous jobs to complete, since I am not using the build() method. To overcome this, I wrote my own buildWithPolling() method, which waits for the job to finish before going on. The only downside of my method is that it does not wait for downstream jobs to finish (but I don't know whether the build() method does either...). Here is the code of my method:
def buildWithPolling(project)
{
    // Connect to the URL that starts the polling (and the build, if needed)
    def address = "http://localhost:8080/jenkins/job/" + project + "/poll"
    println "Connecting to " + address + " to poll Git and build if needed."
    def poll = new URL(address)
    poll.openStream()

    // Variables used to track whether the build is still running or has finished
    boolean inProgress = true
    def parser = new XmlParser()
    def rootNode = null
    address = "http://localhost:8080/jenkins/job/" + project + "/lastBuild/api/xml?depth=1&tree=building&xpath=*/building"
    while (inProgress) {
        // Pause for 5 seconds so that we don't overload the server
        sleep(5000)
        // Ask the server whether the job has finished
        def baos = new ByteArrayOutputStream()
        baos << new URL(address).openStream()
        rootNode = parser.parseText(new String(baos.toByteArray()))
        inProgress = rootNode.text().toBoolean()
    }
}
It is probably not the best solution, but it's working for me!

From the AWS Java API, how can I tell when my EBS Snapshot has been created?

I have multiple EBS-backed EC2 instances running and I want to be able to take a snapshot of the EBS volume behind one of them, create a new EBS volume from that snapshot, and then mount that new EBS volume onto another as an additional drive. I know how to do this via the AWS web console, but I would like to automate the process by using the AWS Java API.
If I simply call the following commands one after another:
CreateSnapshotResult snapRes
    = ec2.createSnapshot(new CreateSnapshotRequest(oldVolumeID, "Test snapshot"));
Snapshot snap = snapRes.getSnapshot();

CreateVolumeResult volRes
    = ec2.createVolume(new CreateVolumeRequest(snap.getSnapshotId(), aZone));
String newVolumeID = volRes.getVolume().getVolumeId();

AttachVolumeResult attachRes
    = ec2.attachVolume(new AttachVolumeRequest(newVolumeID, instanceID, "xvdg"));
I get the following error:
Caught Exception: Snapshot 'snap-8e822cfd' is not 'completed'.
Reponse Status Code: 400
Error Code: IncorrectState
Request ID: 40bc6bad-43e0-49e6-a89a-0489744d24e6
To get around this, I obviously need to wait until the snapshot is completed before I create the new EBS volume from the snapshot. According to the Amazon docs, the possible values of Snapshot.getState() are "pending, completed, or error," so I decided to check in with AWS to see if the snapshot is still in the "pending" state. I wrote the following code, but it has not worked:
CreateSnapshotResult snapRes
    = ec2.createSnapshot(new CreateSnapshotRequest(oldVolumeID, "Test snapshot"));
Snapshot snap = snapRes.getSnapshot();
System.out.println("Snapshot request sent.");
System.out.println("Waiting for snapshot to be created");
String snapState = snap.getState();
System.out.println("snapState is " + snapState);
// Wait for the snapshot to be created
while (snapState.equals("pending"))
{
    Thread.sleep(1000);
    System.out.print(".");
    snapState = snapRes.getSnapshot().getState();
}
System.out.println("Done.");
When I run this, I get the following output:
Snapshot request sent.
Waiting for snapshot to be created
snapState is pending
.............................................
Where the dots continue to be printed until I kill the program. In the AWS Web Console, I can see that the snapshot has been created (it now has a green circle marking it as "completed"), but somehow my program has not gotten the message.
When I replace the while loop with a simple wait for a second (insert the line Thread.sleep(1000) after Snapshot snap = snapRes.getSnapshot(); in the first code snippet), the program will often create a new EBS volume without complaint (it then dies when I try to attach the volume to the new instance). Sometimes, however, I will get the IncorrectState error even after waiting for a second. I assume this means that there is some variance in the amount of time it takes to create a snapshot (even of the same EBS volume), and that one second is enough to account for some but not all of the possible delay times.
I could just increase the hard-coded delay to something sure to be longer than the expected time, but that approach has many faults (it waits unnecessarily for most of the times I will use it, it is still not guaranteed to be long enough, and it won't translate well into a solution for the second step, mounting the EBS volume onto the instance).
I would really like to be able to check in with AWS at regular intervals, check to see if the state of the snapshot has changed, and then proceed once it has. What am I doing wrong and how should I fix my code to allow my program to dynamically determine when the snapshot has been fully created?
EDIT: I've tried using getProgress() rather than getState() as per the suggestion. My changed code looks like this:
String snapProgress = snap.getProgress();
System.out.println("snapProgress is " + snapProgress);
// Wait for the snapshot to be created
while (!snapProgress.equals("100%"))
{
    Thread.sleep(1000);
    System.out.print(".");
    snapProgress = snapRes.getSnapshot().getProgress();
}
System.out.println("Done.");
I get the same output as I did when using getState(). I think my problem is that the snapshot object that my code references is not being updated correctly. Is there a better way to refresh/update that object than simply calling its methods repeatedly? My suspicion is that I'm running up against some sort of issue with the way that the API handles requests.
Solved it. I think the problem was that the Snapshot.getState() call doesn't actually make a new call to AWS, but keeps returning the state of the object at the time it was created (which would always be pending).
I fixed the problem by using the describeSnapshots() method:
String snapState = snap.getState();
System.out.println("snapState is " + snapState);
System.out.print("Waiting for snapshot to be created");
// Wait for the snapshot to be created
while (snapState.equals("pending"))
{
    Thread.sleep(500);
    System.out.print(".");
    // Ask AWS for the current state instead of re-reading the cached Snapshot object
    DescribeSnapshotsResult describeSnapRes
        = ec2.describeSnapshots(new DescribeSnapshotsRequest().withSnapshotIds(snap.getSnapshotId()));
    snapState = describeSnapRes.getSnapshots().get(0).getState();
}
System.out.println("\nDone.");
This makes a proper call to AWS every time, and it works.
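The loop above waits forever if the snapshot never leaves the "pending" state. A minimal sketch of the same polling idea with a deadline is shown below. It uses only the standard library: the class name StatePoller and its method are hypothetical, and the remote call is passed in as a Supplier so that the sketch runs without AWS.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper: polls a state supplier until the state is no longer "pending". */
final class StatePoller {

    /**
     * Calls fetchState repeatedly until it returns something other than pendingState,
     * or throws once the timeout has elapsed. fetchState must perform a fresh lookup
     * on every call; re-reading a cached object would never observe a change.
     */
    static String waitWhilePending(Supplier<String> fetchState, String pendingState,
                                   Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        String state = fetchState.get();
        while (pendingState.equals(state)) {
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Still '" + pendingState + "' after " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
            state = fetchState.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Stand-in for the remote call: reports "pending" twice, then "completed".
        int[] calls = {0};
        String result = waitWhilePending(
                () -> ++calls[0] < 3 ? "pending" : "completed",
                "pending", Duration.ofMillis(10), Duration.ofSeconds(5));
        System.out.println(result + " after " + calls[0] + " calls"); // completed after 3 calls
    }
}
```

With the AWS client from the answer, the supplier would wrap the describeSnapshots(...) call and return getSnapshots().get(0).getState(). The caller should also check whether the returned state is "completed" or "error" before creating the volume.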
Instead of getState(), try using the getProgress() method. If it returns a blank value, your EBS snapshot is not ready. It returns the progress as a percentage string (100% when your snapshot is ready). Hopefully that does the trick. Let me know if it works.

Checking if multiple instances of a Java application are running (without preventing them)

I've seen plenty of examples of how to prevent multiple instances of a Java application from running, such as locking a file or creating a socket.
How do I allow multiple instances to run, but check if another one is running?
The reason I need to do this is that I want to clean a temporary folder on exit, but not if there is another process running.
You can use the Attach API and check for running virtual machines, as explained in this blog entry:
The other manner of acquiring a VirtualMachine is to ask for the list of virtual machines known to the system, and then pick the specific one you are interested in, typically by name:
String name = ...
// A typed list is needed; the raw List in the original would not compile with this loop
List<VirtualMachineDescriptor> vms = VirtualMachine.list();
for (VirtualMachineDescriptor vmd : vms) {
    if (vmd.displayName().equals(name)) {
        VirtualMachine vm = VirtualMachine.attach(vmd.id());
        String agent = ...
        vm.loadAgent(agent);
        // ...
    }
}
You can use any IPC mechanism you like. For example, create /tmp/you_app_name_flag.$$ files ($$ means the PID) and check for the existence of /tmp/you_app_name_flag.* when your app starts.
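The flag-file idea can be sketched in Java roughly as follows. This is an illustration under some assumptions: the class name InstanceMarker and the file prefix are made up, ProcessHandle requires Java 9 or later, and the prefix is assumed to contain no glob special characters. Markers left behind by crashed processes are ignored by checking whether the PID is still alive, although a recycled PID could still produce a false positive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: one marker file per running instance, named <prefix>.<pid>. */
final class InstanceMarker {

    /** Creates <dir>/<prefix>.<pid> for the current JVM and returns its path. */
    static Path register(Path dir, String prefix) {
        Path marker = dir.resolve(prefix + "." + ProcessHandle.current().pid());
        try {
            Files.deleteIfExists(marker); // tolerate a stale file left by an earlier process
            return Files.createFile(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a marker other than ownMarker names a process that is still alive. */
    static boolean otherInstanceRunning(Path dir, String prefix, Path ownMarker) {
        try (DirectoryStream<Path> markers = Files.newDirectoryStream(dir, prefix + ".*")) {
            for (Path marker : markers) {
                if (marker.equals(ownMarker)) {
                    continue;
                }
                String suffix = marker.getFileName().toString().substring(prefix.length() + 1);
                try {
                    long pid = Long.parseLong(suffix);
                    if (ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false)) {
                        return true;
                    }
                } catch (NumberFormatException ignored) {
                    // Not one of our markers; skip it.
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        Path own = register(dir, "my_app_flag");
        System.out.println("Other instance running: " + otherInstanceRunning(dir, "my_app_flag", own));
        Files.deleteIfExists(own); // on exit: remove our marker, then clean up only if alone
    }
}
```

For the cleanup-on-exit case from the question, the check would run in the shutdown path: if otherInstanceRunning(...) returns false, the temporary folder can be removed. Two instances exiting at the same moment can still race, so this is a best-effort check.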

Is there a way to get/hook/attach an already running process using java?

I want to be able to do something like that:
Process p = getRunningProcess(pid)
If there's a way, does it matter how the process was created (using java, using python, from the shell, etc...)?
It is possible to attach to another JVM process from a Java app (e.g. to monitor what's going on and potentially detect problems before they happen). You can do this by using the Attach API. I don't know much about attaching to non-JVM processes.
String name = ...
// A typed list is needed; the raw List in the original would not compile with this loop
List<VirtualMachineDescriptor> vms = VirtualMachine.list();
for (VirtualMachineDescriptor vmd : vms) {
    if (vmd.displayName().equals(name)) {
        VirtualMachine vm = VirtualMachine.attach(vmd.id());
        String agent = ...
        vm.loadAgent(agent);
        // ...
    }
}
Yes, there is a way to get hold of any running process, including non-JVM ones, with ProcessHandle.
Here is a code example that starts the calculator and closes it using its PID:
Process calc = Runtime.getRuntime().exec("gnome-calculator");
Thread.sleep(2000);
long pid = calc.pid();
Optional<ProcessHandle> optionalProcessHandle = ProcessHandle.of(pid);
optionalProcessHandle.ifPresent(ProcessHandle::destroy);
But make sure to run Java 9 or higher (ProcessHandle and Process.pid() were added in Java 9) and to import java.util.Optional.
See the documentation to see further methods that can be used with ProcessHandle:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ProcessHandle.html
Credits to java.lang.ProcessHandle - compilation error for being a template for this.
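To get closer to the getRunningProcess(pid) call from the question, a process that was started elsewhere can be looked up by PID as in the sketch below. The class name ProcessLookup and its describe method are hypothetical; the ProcessHandle calls are standard Java 9+ API. Note that a ProcessHandle lets you inspect, wait for, or destroy the process, but unlike a Process object it gives no access to the process's input and output streams.

```java
import java.util.Optional;

/** Hypothetical helper: looks up an existing process by PID, however it was started. */
final class ProcessLookup {

    /** Describes the process with the given PID, or reports that none exists. */
    static String describe(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle
                .map(h -> "pid=" + h.pid()
                        + " alive=" + h.isAlive()
                        + " command=" + h.info().command().orElse("<unknown>"))
                .orElse("no process with pid " + pid);
    }

    public static void main(String[] args) {
        // Uses the PID given on the command line, or this JVM's own PID as a demo.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        System.out.println(describe(pid));
    }
}
```

Whether the command line and other details are visible depends on the operating system and on the permissions of the calling user, which is why info().command() returns an Optional.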
