Distributed Program Execution Manager - java

Given information about the machines in a cluster (IP address/machine name) and a program (in Java) to run, is there software (a manager) available that would execute this program and return the output along with the runtime on each of the machines?
Currently, I am using a shell script to do this, but I couldn't get back the time taken (in seconds) to run the Java program. It would be good if there were a distributed program execution manager like the one I described above.

Instead of writing your own script, you could simply use something like tentakel or shmux to run your application in parallel on multiple nodes. You can run tentakel as
tentakel 'time <your application name>'
to get the output and the time it takes for the application to run.
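If you prefer to stay in Java, the timing part can also be done from the launcher itself. A minimal sketch using ProcessBuilder and System.nanoTime(); the command here (`java -version`) is just a stand-in for the program you actually want to run and time:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.concurrent.TimeUnit;

public class TimedRun {
    public static void main(String[] args) throws Exception {
        // Stand-in command; replace with the program you want to time.
        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        pb.redirectErrorStream(true); // merge stderr into stdout

        long start = System.nanoTime();
        Process p = pb.start();

        // Capture the program's output while it runs.
        StringBuilder output = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = p.waitFor();
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.print(output);
        System.out.printf("exit=%d elapsed=%d ms%n", exitCode, elapsedMillis);
    }
}
```

Run the same launcher on each node (via ssh, tentakel, etc.) and you get both the output and the wall-clock runtime back.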

I like to use Hudson for stuff like that. It was originally written for performing software builds and tests, but it is more generic than that: basically a controller for managing jobs and executions, along with a client to deploy on nodes. Hadoop is another option if you have the flexibility to rewrite your app for a specific distributed computing framework.

I don't understand your question very well. What "runtime" do you want to get back? What clustering solution are you using? For distributed communication in Java I would recommend JGroups. For a distributed JVM, check Terracotta.

Related

Java application monitoring over longer time periods?

Using tools like JConsole I can monitor a Java application in real time. How can I analyze the performance over a longer time period? Let's say over a day? Or a week?
Are there simple tools like JConsole I can use?
There are options for the generic "monitor as many parameters as we can" approach:
Command line: jcmd <PID|main class> PerfCounter.print (ref) – you will then need to wrap your head around the names of the properties it outputs, schedule running it periodically, store the data somewhere, and visualize it yourself.
A lot (all?) of this information is also exposed by JMX beans. You can find them out (you can see them and what they export in JConsole, for example), and using a command-line tool like jmxterm you can record the values and visualize them. Same procedure: schedule yourself, record, visualize yourself. It's not too user-friendly, and the reason I mention this approach is that...
...people usually use a specialized monitoring system (think Graphite, Zabbix, Logstash/Kibana etc. — I am throwing these in just as keys for search together with "Java/JVM/JMX/JFR") that can collect information from Java processes through JMX and present it nicely. Periodic running, storing the timeseries data, and visualization are solved by these systems.
JFR ("Java Flight Recorder") is a mechanism built into the JVM that provides continuous recording of many JVM and system metrics, dumping them to a file periodically; you can then visualize these with JMC ("Java Mission Control"). It is "cheaper" in the sense that you do not need to install/support a separate monitoring system, but it is less accessible (unless paired with a monitoring system): you need to collect, download, and process files.
In addition to these, there is jstat, which is basically the same as jcmd ... PerfCounter.print but mostly for memory-related metrics; it has the "run periodically" functionality built in and presents results slightly differently (one recording per line).
I would say: if you need to do this once or occasionally, even over a longer period, and you need just a few parameters (memory, number of threads, ...), then aim for jstat or jcmd PerfCounter.print; if you need more parameters, then JFR/JMC. If you need something that runs alongside your system, always collecting and presenting data, available to people without admin rights on the machine where the JVM resides, then look into the monitoring systems and their integration with Java applications.
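As a minimal sketch of the JMX-bean route from option 2: the same metrics the external tools poll are available in-process via the platform MXBeans, so a throwaway collector can look like this (the sampling loop and output format here are illustrative, not a real collector):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;

public class MetricsSampler {
    public static void main(String[] args) throws Exception {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();

        // Take a few samples; a real collector would run on a schedule
        // and write the values to a timeseries store.
        long heapUsed = 0;
        int threadCount = 0;
        for (int i = 0; i < 3; i++) {
            heapUsed = memory.getHeapMemoryUsage().getUsed();
            threadCount = threads.getThreadCount();
            System.out.printf("uptime=%dms heapUsed=%d threads=%d%n",
                    runtime.getUptime(), heapUsed, threadCount);
            Thread.sleep(100);
        }
    }
}
```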

Is there any performance advantage to compiling projects in parallel with a multithreaded program instead of a single-threaded one?

I have to create, in Java, a program capable of performing n builds for a certain Android project. Currently, I'm using the ProcessBuilder and Process classes to execute the command "./gradlew assembleRelease", which invokes the Gradle task "assembleRelease" to build the project and output a signed APK.
As I've already said, the program needs to execute n builds (1,000-4,000) for a certain Android project. So, my question is:
Is there any advantage (in execution time) to creating a multithreaded program that distributes these builds among n threads (managed by an ExecutorService, for instance), rather than doing all the builds in a single thread? And how can I determine the optimal number of threads to run concurrently?
Thanks
Instead of doing this yourself in Java, I would suggest looking at existing solutions.
If you do not have a build server, a CI server such as Jenkins would be a great tool to handle this, as it knows all about running builds in parallel!
For thousands of jobs, I would recommend a scripting solution to create and maintain the jobs, as well as a version control repository (like GitHub) for the jobs to fetch the sources from when they change.
I have found Jenkins with the Job DSL plugin to work quite nice for this. See https://wiki.jenkins-ci.org/display/JENKINS/Job+DSL+Plugin
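For completeness, if you do decide to roll your own in Java rather than use a build server, a bounded ExecutorService is the usual shape. This is a sketch with a simulated task standing in for the real ./gradlew invocation (shown as a comment); the pool size is only a starting point to tune empirically, since each Gradle build spawns its own processes and uses significant memory:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBuilds {
    public static void main(String[] args) throws Exception {
        int totalBuilds = 20; // stand-in for the 1,000-4,000 real builds
        // Reasonable starting point for process-bound build tasks; tune it.
        int poolSize = Runtime.getRuntime().availableProcessors();

        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < totalBuilds; i++) {
            final int buildId = i;
            results.add(pool.submit(() -> {
                // Placeholder for the real build:
                // new ProcessBuilder("./gradlew", "assembleRelease")
                //         .inheritIO().start().waitFor();
                Thread.sleep(10); // simulate build work
                completed.incrementAndGet();
                return buildId;
            }));
        }
        for (Future<Integer> f : results) {
            f.get(); // propagate any build failure
        }
        pool.shutdown();
        System.out.println("completed=" + completed.get());
    }
}
```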

Should I be using a distributed system like Mesos?

I have a project which briefly is as follows: Create an application that can accept tasks written in Java that perform some kind of computation and run the tasks on multiple machines* (the tasks are separate and have no dependency on one another).
*the machines could be running different OSs (mainly Windows and Ubuntu)
My question is, should I be using a distributed system like Apache Mesos for this?
The first thing I looked into was Java P2P libraries/frameworks, and the only one I could find was JXTA (https://jxta.kenai.com/), which has been abandoned by Oracle.
Then I looked into Apache Mesos (http://mesos.apache.org/), which seems to me like a good fit: an underlying system that can run on multiple machines and share resources while processing tasks. I have spent a little while trying to get it running locally as an example; however, it seems slightly complicated and takes forever to get working.
If I should use Mesos, would I then have to develop a Framework for my project that takes all of my java tasks or are there existing solutions out there?
To test it on a small scale locally, would you install it on your machine, set that as a master, create a VM, install it on that, and make that a slave, somehow routing your slave to that master? The documentation and examples don't show exactly how to hook up a slave on the network to a master.
Thanks in advance, any help or suggestions would be appreciated.
You can definitely use Mesos for the task that you have described. You do not need to develop a framework from scratch, instead you can use a scheduler like Marathon in case you have long running tasks, or Chronos for one-off or recurring tasks.
For a real-life setup you would definitely want more than one machine, but you might as well run everything (Mesos master, Mesos slave, and the frameworks) off a single machine if you're only interested in experimenting. The Examples section of the Mesos Getting Started guide demonstrates how to do that.

Execute Commands / Run Programs

We've been working on a quite specific coding project recently. What we want to do is:
Use Java applications to do tasks impossible (or at least very difficult) to accomplish in PHP
Control those Java programs with Joomla 3.0
We've found out that there is support for PHP Scripts in Joomla by using this extension or we could create our own module by using this.
My question is: Is there a way to call programs / execute commands in a more practical manner than using the PHP functions shell_exec() or exec() or using popen()?
Especially since these Java programs will run under a different user (on a Windows Server ...).
Thanks in advance!
Do not use such components. This is dangerous no matter what the creator says. I'm a Joomla extension developer, and believe me, it can ruin your application and cause more problems than benefits. Depending on what you want to achieve and how big your project will be, you have a few possibilities:
1. Create a component that will execute commands
Something similar to what you did, but based on a custom-built component. It's the fastest and cheapest way. Problems start when your Java application uses more resources than the website (interface). So it's more of a good solution to start with.
2. Create a component that will contact an application written in Java via an API
This is a good solution if your Java application uses a lot of resources. You can run it on several servers, manage server load so clients get results faster, etc. This gives you many possibilities and flexibility, but it is harder to implement and will cost more.
3. Just use an applet running on the client's computer (if your application allows it)
Simple, effective, and costs less, but it can also be impossible depending on what tasks the application has to run.
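A minimal sketch of option 2, using the JDK's built-in com.sun.net.httpserver so the PHP/Joomla side can call the Java application over HTTP (with curl or file_get_contents) instead of shell_exec(). The endpoint name and response are hypothetical; the block self-tests by calling its own endpoint:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TaskApi {
    public static void main(String[] args) throws Exception {
        // Bind to an ephemeral port; a real deployment would use a fixed one.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        // Hypothetical endpoint the PHP side would call over HTTP.
        server.createContext("/run-task", exchange -> {
            byte[] body = "task accepted".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // Self-test: call our own endpoint the way the PHP side would.
        int port = server.getAddress().getPort();
        URL url = new URL("http://localhost:" + port + "/run-task");
        String response;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            response = r.readLine();
        }
        server.stop(0);
        System.out.println(response);
    }
}
```

This also solves the different-user problem: the Java process runs under whatever account you start it with, and PHP only ever talks to it over HTTP.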

How to profile a distributed app in java?

I've got an app running on a grid of uniform Java processes (potentially on different physical machines). I'd like to collect CPU usage statistics from a single run of this app. I've gone over profiling tools looking for an option for automatic collection of data, but failed to find any in NetBeans, TPTP, jvisualvm, YourKit, etc.
Maybe I'm looking at this the wrong way?
What I was thinking is:
run the processes on the grid with some special setup that allows them to dump profiling info
run my app as usual - it will push tasks to the grid, the processes will execute the tasks and publish profiling info
use some tool to collect and analyze the profiling results
but I can't find anything even remotely similar to this.
Any thoughts, experience, suggestions?
Thank you!
If you have allowed remote JMX access and you are using Sun JDK 1.6, then try jvisualvm. It has the option of a remote JMX connection, though I haven't used it for profiling CPU in a distributed environment.
Note: For CPU profiling, your application should be running on Sun JDK 1.6 or above.
Have a look at these links:
JVisualVM
JVisualVM - Working with Remote Applications
Get heap dump from a remote application in Java using JVisualVM
Unable to profile JBoss 5 using jvisualvm
http://www.taranfx.com/java-visualvm
I have used CA Introscope for this type of monitoring. It uses instrumentation to collect metrics over time. As an example, it can be configured to provide a view of all nodes and their performance over time. From that node view, you can drill down to the method level to help you figure out where your bottlenecks are.
Yes, it will provide CPU utilization.
It's a commercial $$$ tool, but it's a great tool for collecting, monitoring, and interrogating performance data.
If you look at something like Zabbix (though there are tons of other monitoring tools), it allows gathering data via JMX from a Java app. If you enable JMX in your app and allow it to be queried externally (via TCP/IP), you will have access to a lot of the HotSpot internals (free memory, etc.) as well as thread stacks. You could then have these values graphed as well. It does need configuration, but I don't think what you're looking for can be done with one line of a script.
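As a sketch of what such a JMX poller does under the hood: it looks up MBeans by ObjectName and reads their attributes. The example below uses the local platform MBean server so it is runnable as-is; a real monitoring system would obtain the connection remotely (see the comment), with the target JVM started with the com.sun.management.jmxremote.* system properties:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class JmxPoll {
    public static void main(String[] args) throws Exception {
        // Local sketch: the platform MBean server. A remote poller would use
        // JMXConnectorFactory.connect(new JMXServiceURL(
        //     "service:jmx:rmi:///jndi/rmi://host:9010/jmxrmi"))
        //     .getMBeanServerConnection();
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();

        // Read heap usage from the standard Memory MBean.
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData heap = (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
        long used = (Long) heap.get("used");

        // Read the live thread count from the Threading MBean.
        ObjectName threading = new ObjectName("java.lang:type=Threading");
        int threadCount = (Integer) conn.getAttribute(threading, "ThreadCount");

        System.out.printf("heapUsed=%d threads=%d%n", used, threadCount);
    }
}
```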
Just to add that profiling information on each node usually contains timestamps.
To match these timestamps, all machines should have exactly the same time (10 ms delta maximum), so
cluster nodes should synchronize with a single network time server (NTP).
You can use some JMX library, e.g. jmxterm, and wrap it in some code to connect to multiple hosts and poll them for changes. If you are a bit familiar with Python, look at my simple script here for some inspiration: http://rostislav-matl.blogspot.com/2011/02/monitoring-tomcat-with-jmxterm.html
http://www.hyperic.com/products/open-source-systems-monitoring
I never tried other tools mentioned in other answers. I was more than satisfied with hyperic.
It exposes webservices API as well which you can use to write your own analysis tools.
If you know the critical paths you want to analyse I would suggest time stamping your process in key places and combining the logs yourself. This is likely to be a useful addition to your profiling, can be used in production and may be even more useful as a result. (It is for my project)
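A minimal sketch of that timestamping approach, assuming a fixed log-line format (the field names here are made up) so lines from all nodes can be merged and sorted afterwards:

```java
import java.time.Instant;

public class PathTiming {
    // Emit a timestamped duration line in a fixed format so logs from all
    // nodes can be merged and sorted later (assumes clocks are NTP-synced).
    static String logSpan(String node, String step, long startNanos) {
        long micros = (System.nanoTime() - startNanos) / 1_000;
        return String.format("%s node=%s step=%s took_us=%d",
                Instant.now(), node, step, micros);
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        // ... critical section under measurement ...
        String line = logSpan("worker-1", "parseInput", t0);
        System.out.println(line);
    }
}
```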
I have used YourKit to monitor a number of processes at once. It can show you what is happening in each in real time and collect the results when all is finished.
I don't know if it provides a combined view of what is happening.
I was looking for something similar and found Hyperic.
The claim is that the tool can monitor most common applications and systems, gather all the information, and present it in a convenient fashion.
To be honest, this is on my todo list, so I can't say whether it will do the job or not. Anyway, it seems impressive.
