I know this is an absolute shot in the dark, but we're absolutely perplexed.
A Perl (5.8.6) script run by Java (1.5) is taking more than an hour to complete. The same script, when run manually from the command line, takes 12 minutes to complete. This is on a Linux host.
Logging is the same in both cases and the script is run with the same parameters in both cases.
The script does some complex stuff like Oracle DB access, some scp's, etc, but again, it does the exact same actions in both cases.
We're stumped. Has anyone ever run into a similar situation? If not and if you were faced with the same situation, how would you consider debugging it?
Subprocesses that produce console output can block (and deadlock) if their stdout/stderr streams are not drained by the parent. @gustafc, the code posted will eventually block the subprocess when it tries to write to stdout/stderr and there is no room left in the stream (because the stream is not being serviced by Java).
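Process p = startProcess();
final InputStream stdout = p.getInputStream();
final InputStream stderr = p.getErrorStream();
// Drain stdout on its own thread so the child never blocks on a full pipe
new Thread() {
    public void run() {
        try {
            int c;
            while ((c = stdout.read()) != -1) {
                System.out.print((char) c);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();
// Drain stderr the same way
new Thread() {
    public void run() {
        try {
            int c;
            while ((c = stderr.read()) != -1) {
                System.out.print((char) c);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();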
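A minimal sketch of the same idea as a reusable helper, in case you also want to keep what the child printed. StreamDrainer is a hypothetical name, not something from the JDK, and the main method feeds it an in-memory stream as a stand-in for p.getInputStream() so that it runs without a child process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: drains an InputStream on its own thread and keeps what it read. */
final class StreamDrainer extends Thread {
    private final InputStream in;
    private final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    StreamDrainer(InputStream in) {
        this.in = in;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int n;
            // read() blocks until data arrives and returns -1 at end of stream
            while ((n = in.read(buffer)) != -1) {
                synchronized (captured) {
                    captured.write(buffer, 0, n);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /** Returns everything read so far, decoded with the platform default charset. */
    String output() {
        synchronized (captured) {
            return captured.toString();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for p.getInputStream(); a real caller would pass the process streams
        InputStream fake = new ByteArrayInputStream("hello from child\n".getBytes());
        StreamDrainer drainer = new StreamDrainer(fake);
        drainer.start();
        drainer.join();
        System.out.print(drainer.output());
    }
}
```

With a real process you would start one drainer for p.getInputStream() and one for p.getErrorStream() before calling p.waitFor().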
I assume you've discarded the possibility that the Java wrapper happens to run simultaneously as something else which causes huge contention over some scarce resource? Good.
If you have a simple class like this:
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
public class Exec {
public static void main(String[] args) throws Throwable{
class Transfer implements Runnable {
private final InputStream in;
private final OutputStream out;
public Transfer(InputStream i, OutputStream o){
in = i;
out = o;
}
public void run(){
try {
for (int i; (i = in.read()) != -1;) out.write(i);
out.close();
in.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Process proc = new ProcessBuilder(args).start();
new Thread(new Transfer(System.in, proc.getOutputStream())).start();
new Thread(new Transfer(proc.getInputStream(), System.out)).start();
new Thread(new Transfer(proc.getErrorStream(), System.err)).start();
System.exit(proc.waitFor());
}
}
... and you compare time perl script.pl insert args here and time java Exec perl script.pl insert args here, what happens? If the world is sane, they take about the same time (except that the second one needs a few seconds extra for Java to start), and if that's the case, gradually start adapting the Exec class to look more and more like your deployment environment, and see when it starts taking a really long time.
If Exec above really does take longer time, start logging like crazy in the Perl script, so you see which actions take longer time. And btw, log in the Java wrapper, too, so you see if the Perl startup takes a really long time or something.
One possibility is that you are making the system thrash by trying to run a large Java app and a large Perl app on a system that doesn't have enough memory.
It would be a good idea to use monitoring utilities such as top, vmstat 5 and iostat 5 to figure out whether the slowness corresponds to some OS-level pathology.
To bring this thread to a close, the eventual cause was rogue processes consuming too much CPU. When launched from the command-line, the script had normal priority. When launched from Java, the script had low priority and thus took forever to execute. What threw us off was that the Java code was not just executing the script, it was issuing the same commands via SSH that we issued interactively. Thus, we didn't expect the difference in priority.
I have been researching this issue pretty extensively and cannot seem to find an answer.
I know that the Only part of a ReadProcessMemory or WriteProcessMemory request was completed exception is thrown when a 32-bit process tries to access a 64-bit process and the same for a 64-bit modifying a 32-bit process.
The solution to that issue is to change the Platform Target to 'Any CPU'. I have tried this and unfortunately this does not solve my issue.
The next block of code is what keeps throwing the exception. The program that runs this code is used to open up applications on remote computers and keeps a list of all the processes that the program itself opened so that I don't have to loop through all the processes.
Process processToRemove = null;
lock (_runningProcesses)
{
foreach (Process p in _runningProcesses)
{
foreach (ProcessModule module in p.Modules)
{
string[] strs = text.Split('\\');
if (module.ModuleName.Equals(strs[strs.Length - 1]))
{
processToRemove = p;
break;
}
}
if (processToRemove != null)
{
break;
}
}
if (processToRemove != null)
{
processToRemove.Kill();
_runningProcesses.Remove(processToRemove);
}
}
These processes can and most likely will be 32-bit and 64-bit, mixed together.
Is there anything I am doing that I shouldn't be doing, or is there just a better way to do all of this?
As detailed in the comments of the MSDN page for Process.Modules and this thread, there is a known issue in Process.Modules when enumerating 32-bit processes from a 64-bit process and vice versa:
Internally .NET's Process.Modules is using function EnumProcessModules
from PSAPI.dll. This function has a known issue that it cannot work
across 32/64 bit process boundary. Therefore enumerating another
64-bit process from 32-bit process or vice versa doesn't work
correctly.
The solution seems to be to use the EnumProcessModulesEx function, (which must be called via P/Invoke), however this function is only available on later versions of Windows.
We fixed this issue by adding
a new function called EnumProcessModulesEx to PSAPI.dll
(http://msdn2.microsoft.com/en-us/library/ms682633.aspx), but we
currently cannot use it in this case:
it only works on Windows Vista or Windows Server 2008
currently .NET 2.0 Framework don't have a service pack or hotfix to make Process.Modules use this new API
There are only some issues regarding the handling of the processes and the locking that I would change:
object lockObject = new object();
List<Process> processesToRemove = new List<Process>();
foreach (Process p in _runningProcesses)
{
foreach (ProcessModule module in p.Modules)
{
string[] strs = text.Split('\\');
if (module.ModuleName.Equals(strs[strs.Length - 1]))
{
processesToRemove.Add(p);
break;
}
}
}
lock (lockObject)
{
foreach (Process p in processesToRemove)
{
p.Kill();
_runningProcesses.Remove(p);
}
}
I'm not answering for the bounty, just wanted to give some ideas. This code isn't tested because I don't exactly know what you are trying to do there.
Just consider not to lock the process-list and to keep the lock as short as possible.
I agree with @sprinter252 that _runningProcesses should not be used as your sync object here.
//Somewhere that is accessible to both the thread getting the process list and the thread the
//code below will be running, declare your sync, lock while adjusting _runningProcesses
public static readonly object Sync = new object();
IList<Process> runningProcesses;
lock(Sync)
{
runningProcesses = _runningProcesses.ToList();
}
Process processToRemove = null;
foreach (Process p in runningProcesses) // iterate over the snapshot taken under the lock, not the shared list
{
foreach (ProcessModule module in p.Modules)
{
string[] strs = text.Split('\\');
if (module.ModuleName.Equals(strs[strs.Length - 1]))
{
processToRemove = p;
break;
}
}
if (processToRemove != null)
{
break;
}
}
if (processToRemove != null)
{
//If we've got a process that needs killing, re-lock on Sync so that we may
//safely modify the shared collection
lock(Sync)
{
processToRemove.Kill();
_runningProcesses.Remove(processToRemove);
}
}
If this code is wrapped in a loop that keeps checking _runningProcesses for the process you wish to kill, consider changing processToRemove to processesToRemove and changing its type to a collection. Iterate over that list in the bottom block after checking for a non-zero count, and take the lock outside of that loop to avoid the overhead of obtaining and releasing a lock for each process you kill.
It seems that the same java jar might consume vastly different amount of processor resources when started manually by java -jar and when started via Runtime.getRuntime().exec(...).
When started by java -jar it consumes basically nothing, but when started by another program via Runtime the processor consumption goes into double digits and I don't understand the difference in behavior.
To reproduce the problem, I have a class with a for-loop that does nothing but wait until Enter is pressed.
public class Cycle {
public static void main(String[] args) throws IOException {
InputStreamReader isr = new InputStreamReader(System.in);
BufferedReader in = new BufferedReader(isr);
System.out.println("Entering cycle...");
for (char c = (char) in.read(); ; c = (char) in.read()) {
if (c == '\n') {
System.out.println("Enter was pressed. Exiting...");
break;
}
}
}
}
I pack it into a jar and use another java program to start it.
public class StartApp {
public static void main(String[] args) throws IOException {
Runtime.getRuntime().exec("java -jar <path to jar>");
}
}
And that's when the magic starts happening. Processor consumption skyrockets, as if there were an infinite loop like while(true) inside the Cycle class. I can fix that by adding Thread.sleep(100) inside the for-loop, after which processor consumption goes back to normal, but the question is not about that.
Can someone advise me what is conceptually different between running the jar manually and getting expected behavior in terms of consumption and running it via Runtime?
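One possible explanation, offered as an assumption because nothing in this thread confirms the cause: StartApp exits right after exec(), which closes the write end of the child's stdin pipe. From then on in.read() returns -1 immediately instead of blocking, and (char) -1 is '\uffff', which never equals '\n', so the loop spins. A minimal sketch of a loop that keeps the int and stops at end of stream (EofAwareCycle and waitForEnter are hypothetical names):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class EofAwareCycle {
    /**
     * Reads until a newline or end of stream.
     * Returns true if Enter was seen, false if the stream ended first.
     */
    static boolean waitForEnter(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int c; // keep the int so that -1 (end of stream) stays distinguishable
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for System.in so the sketch runs without a console
        System.out.println(waitForEnter(new StringReader("abc\n")));
        // An empty reader behaves like a closed stdin pipe: the method returns instead of spinning
        System.out.println(waitForEnter(new StringReader("")));
    }
}
```

If the closed-pipe assumption holds, keeping the parent alive (or calling waitFor() on the child) should also make the CPU usage go away, since the pipe then stays open.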
Below is a diagram that shows what I'm trying to do: it is just 2 programs. One is a simple Child program that writes out integers every 2 seconds, line by line.
The other is a Parent program that monitors the log file ( just a very basic text file). If the log file doesn't get modified within 5 seconds, then it should restart the Child program (via a batch file ); then continue normally.
My code for the child class is here:
package fileiotestapplication;
import java.io.*;
import java.io.IOException;
import java.util.*;
public class WriterClass {
@SuppressWarnings("oracle.jdeveloper.java.insufficient-catch-block")
public WriterClass() {
super();
int[] content = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,};
String[] friends = {"bob",};
File file = new File("/C:/Java_Scratch/someFile.txt");
// if file does not exists, then create it
try {
if (!file.exists()) {
file.createNewFile();
}
for (int i = 0 ; i < content.length; i++)
{
PrintStream bw = new PrintStream( new FileOutputStream(file, true) );
System.out.println("testing " + i);
bw.println( String.valueOf(content[i]) );
bw.close();
Thread.sleep(2500);
}
System.out.println("Done");
} catch (IOException ioe) {
// TODO: Add catch code
ioe.printStackTrace();
}
catch (InterruptedException ioe) {
// TODO: Add catch code
ioe.printStackTrace();
}
//someIS.println(i);
System.out.println("This is OK");
}
public static void main(String[] args) {
WriterClass writerClass = new WriterClass();
}
}
And I linked here my current code for the Parent class.
What I'm now trying to do is add in some logic that catches when the child class stops writing output. What I'd like to do is count all the lines in the log file; and then compare them every 5 seconds, is this a good way (the alternative would be - to keep checking to see if the file got modified at all)?
EDIT: The suggestion below to use waitFor() indeed helps, though I'm still working out details : it is generally like :
try {
/* StackOverflow code */
for ( ; ; ) {
ProcessBuilder pb = new ProcessBuilder("TheBatchFile.bat");
pb.directory(new File("C://Java_Scratch_//Autonomic_Using_Batch//"));
Process p = pb.start();
p.waitFor();
}
/* end - StackOverflow code */
}
catch (IOException i) {
i.printStackTrace();
}
catch (InterruptedException i) {
i.printStackTrace();
}
Counting the lines every five seconds will get very slow as the file keeps growing in size. A simpler way would be to check the last modification time of the file. Assuming that the reason the child program might stop writing to the file is that the program terminates (rather than, e.g., hanging in an infinite loop), it is probably better to monitor the child process directly rather than relying on observing the effects of the process. This is particularly convenient if the parent process can be responsible for starting the program in the first place.
This can be done with the ProcessBuilder and Process classes in Java 8. Copying from the documentation, you can start the process like this (if you only want to monitor whether it's running or not):
ProcessBuilder pb = new ProcessBuilder("TheBatchFile.bat", "Argument1", "Argument2");
pb.directory(new File("/path/to/working/dir"));
Process p = pb.start();
Then, you can simply call p.waitFor(); to wait for the process to terminate. Do this in a loop, and you have your automatic-restarting-of-child behavior.
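If you do want to keep the file-based check for the hanging case, the last-modification-time idea mentioned above could look roughly like this. It is only a sketch: LogStaleness and isStale are hypothetical names, and the 5000 ms threshold is taken from the question.

```java
import java.io.File;
import java.io.IOException;

final class LogStaleness {
    /** True if the file is missing or was last modified more than maxAgeMillis before nowMillis. */
    static boolean isStale(File log, long maxAgeMillis, long nowMillis) {
        long modified = log.lastModified(); // returns 0 if the file does not exist
        return modified == 0L || nowMillis - modified > maxAgeMillis;
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the child's log file
        File log = File.createTempFile("child", ".log");
        log.deleteOnExit();
        long now = System.currentTimeMillis();
        System.out.println("stale right after creation: " + isStale(log, 5000, now));
        System.out.println("stale a minute later:       " + isStale(log, 5000, now + 60000));
    }
}
```

Unlike counting lines, this costs the same no matter how large the file gets.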
You can use the directory watch service:
https://docs.oracle.com/javase/tutorial/essential/io/notification.html
You can configure a path or a file and register a watcher.
The watcher gets a notification every time a file is changed. You can store this timestamp of a notification for later use.
For details see my link above.
You may then use a Timer or a Thread to check last modification.
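A rough sketch of the watch-service approach, assuming Java 7 or later. LogDirWatcher and changedWithin are hypothetical names, and the file name someFile.txt is borrowed from the question; the method waits for a modify event on one file in a directory and gives up after a timeout.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

final class LogDirWatcher {
    /**
     * Waits up to timeoutMillis for a modify event on fileName inside dir.
     * Returns true if one arrived, false on timeout.
     */
    static boolean changedWithin(Path dir, String fileName, long timeoutMillis)
            throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
                if (key == null) {
                    return false; // timed out without any event
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    Object context = event.context(); // file name relative to dir, or null on overflow
                    if (context instanceof Path && context.toString().equals(fileName)) {
                        return true;
                    }
                }
                if (!key.reset()) {
                    return false; // the directory is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("logs");
        boolean changed = changedWithin(dir, "someFile.txt", 5000);
        System.out.println(changed ? "log was modified" : "no change in 5 seconds, restart the child");
    }
}
```

Note that on some platforms the watch service is implemented by polling, so events can arrive with a delay of several seconds; a 5-second timeout may be too tight there.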
While your method of creating a text file, and using a batch script is feasible, there is a better way to approach it. This is a standard problem to approach with multitasking, and by creating a couple threads, it is not too difficult at all.
Using threads has several advantages over going externally "around" the system with batch files and multiple programs. For starters, these may include:
Keeping everything together makes the project much tidier, cleaner,
and marginally easier to distribute.
It is easier to implement. Sure, threads may seem confusing if you have never used them, but in my opinion they are the lesser evil than all the steps involved in going around them. As I hope to show below, implementing this problem with threads is not hard.
Improved performance, as the very expensive operations of file IO, and spawning the batch file are avoided. Threads also have improved performance over processes in most cases because they are easier to spawn, and multithreading sees performance improvements on a wider range of processors than multiprocessing by being less reliant on having several cores.
No sketchy overlap between when one program is reading the file, while the other is writing to it simultaneously. These kind of situations are best avoided when possible.
Maintains Java's impressive cross platform abilities, because you are not using batch which is not cross platform. This might not be important to you for this project, but you may come across something in the future with a similar problem, where this is more important, and so you will have practice implementing it.
You learn better by using threads the "right way" instead of
developing bad habits by using a more hacky approach. If this is a
learning project, you might as well learn it right.
I went ahead and coded up the approach that I would most likely use to solve the problem. My code has a child thread that counts every two seconds, and a parent thread that monitors the child and restarts it if the child goes five seconds without counting. Let's examine my program to give you a good idea of how it works.
First, here is the class for the parent:
public class Parent {
private Child child;
public Parent(){
child = new Child(this);
child.start();
}
public void report(int count){ //Starts a new watchdog timer
Watchdog restartTimer = new Watchdog(this, count);
restartTimer.start();
}
public void restartChild(int currentCount){
if (currentCount == child.getCount()){ //Check if the count has not changed
//If it hasn't
child.kill();
child.start();
}
}
public static void main(String[] args){
//Start up the parent function, it spawns the child
new Parent();
}
}
The main method can live somewhere else if you want, but to start everything up, just instantiate a Parent. The Parent class holds an instance of the Child class and starts the child thread. The child reports its count to the parent through the report method, which spawns a watchdog timer (more on that in a second) that calls restartChild after five seconds with the count it was given. restartChild restarts the child thread if the count is still the same as the one provided.
Here is the class for the watchdog timer:
class Watchdog implements Runnable { //A timer that will run after five seconds
private Thread t;
private Parent parent;
private int initialCount;
public Watchdog(Parent parent, int count){ //make a timer with a count, and access to the parent
initialCount = count;
this.parent = parent;
}
public void run() { //Timers logic
try {
Thread.sleep(5000); // If you want to change the time requirement, modify it here
parent.restartChild(initialCount);
} catch (InterruptedException e) {
System.out.println("Error in watchdog thread");
}
}
public void start () // start the timer
{
if (t == null)
{
t = new Thread (this);
t.start ();
}
}
}
This watchdog timer is a thread that the parent runs with the start method. The parent passes itself as a parameter so that the watchdog can call the parent's restartChild method. The watchdog stores the count because, when it wakes up after five seconds, restartChild checks whether the count has changed.
And finally, here is the child class
public class Child implements Runnable{
private Thread t;
public int counter = 0;
private volatile boolean running; // volatile so that kill() called from another thread is seen by run()
private Parent parent; // Record the parent function
public Child(Parent parent){
this.parent = parent;
}
private void initializeAll(){
counter = 0;
running = true;
}
public int getCount(){
return counter;
}
@Override
public void run() {
while((counter <= 100)&&(running)){
//The main logic for child
counter +=1;
System.out.println(counter);
parent.report(counter); // Report a new count every two seconds
try {
Thread.sleep(2000); // Wait two seconds
} catch (InterruptedException e) {
System.out.println("Thread Failed");
}
}
}
public void start(){ //Start the thread
initializeAll();
t = new Thread(this);
t.start();
}
public void kill(){ //Kill the thread
running = false;
}
}
This is also a thread, so it implements Runnable, and in that regard it acts a lot like the watchdog. run() is the main method of the child thread; this is where the logic goes that executes when you start it. Starting the child with start() sets all the variables to their defaults and then begins the run() logic. The loop in run() checks the running flag on every iteration, which lets us stop the thread cooperatively by setting running to false.
Currently, all the child does is increment its counter, print it to the console, and report the activity to the parent, 100 times, every two seconds. You will likely want to remove the condition that stops it once the count passes 100, but I included it so that the parent would eventually have cause to restart the child. To change the behavior, look at the child's run method; that is where all the main action is.
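As a variation on the design above (not the code from this answer), the child could record a single "last progress" timestamp that the parent polls, instead of spawning a new watchdog thread for every report. A minimal sketch, where Heartbeat, beat and isStalled are hypothetical names and the timings in main are shortened for the demo:

```java
import java.util.concurrent.atomic.AtomicLong;

final class Heartbeat {
    private final AtomicLong lastBeatMillis;

    Heartbeat(long startMillis) {
        lastBeatMillis = new AtomicLong(startMillis);
    }

    /** Called by the worker each time it makes progress. */
    void beat(long nowMillis) {
        lastBeatMillis.set(nowMillis);
    }

    /** Called by the monitor; true if no beat arrived within timeoutMillis. */
    boolean isStalled(long nowMillis, long timeoutMillis) {
        return nowMillis - lastBeatMillis.get() > timeoutMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        final Heartbeat heartbeat = new Heartbeat(System.currentTimeMillis());
        Thread worker = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                System.out.println(i);
                heartbeat.beat(System.currentTimeMillis());
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    return; // stop counting when interrupted
                }
            }
        });
        worker.start();
        worker.join();
        Thread.sleep(600); // the worker has stopped, so no more beats arrive
        System.out.println("stalled: " + heartbeat.isStalled(System.currentTimeMillis(), 500));
    }
}
```

Passing the current time in as a parameter keeps the class easy to test; the monitor would simply call isStalled on a timer and restart the worker when it returns true.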
I want to do a task that I've already completed except this time using multithreading. I have to read a lot of data from a file (line by line), grab some information from each line, and then add it to a Map. The file is over a million lines long so I thought it may benefit from multithreading.
I'm not sure about my approach here since I have never used multithreading in Java before.
I want to have the main method do the reading, and then giving the line that has been read to another thread which will format a String, and then give it to another thread to put into a map.
public static void main(String[] args)
{
//Some information read from file
BufferedReader br = null;
String line = "";
try {
br = new BufferedReader(new FileReader("somefile.txt"));
while((line = br.readLine()) != null) {
// Pass line to another task
}
// Here I want to get a total from B, but I'm not sure how to go about doing that
} catch (IOException e) {
e.printStackTrace();
}
}
public class Parser extends Thread
{
private Mapper m1;
// Some reference to B
public Parser(Mapper m) {
m1 = m;
}
public void parse(String s, int i) {
// Do some work on S
String key = DoSomethingWithString(s);
m1.add(key, i);
}
}
public class Mapper extends Thread
{
private SortedMap<String, Integer> sm;
private String key;
private int value;
boolean hasNewItem;
public Mapper() {
sm = new TreeMap<String, Integer>();
hasNewItem = false;
}
public void add(String s, int i) {
hasNewItem = true;
key = s;
value = i;
}
public void run() {
while (!Thread.currentThread().isInterrupted()) {
try {
if (hasNewItem) {
// Find if street name exists in map
sm.put(key, value);
hasNewItem = false;
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
// I'm not sure how to give the Map back to main.
}
}
I'm not sure if I am taking the right approach. I also do not know how to terminate the Mapper thread and retrieve the map in the main. I will have multiple Mapper threads but I have only instantiated one in the code above.
I also just realized that my Parse class is not a thread, but only another class if it does not override the run() method so I am thinking that the Parse class should be some sort of queue.
Any ideas? Thanks.
EDIT:
Thanks for all of the replies. It seems that since I/O will be the major bottleneck there would be little efficiency benefit from parallelizing this. However, for demonstration purpose, am I going on the right track? I'm still a bit bothered by not knowing how to use multithreading.
Why do you need multiple threads? You only have one disk and it can only go so fast. Multithreading almost certainly won't help in this case, and if it does, the gain will be minimal from a user's perspective. Multithreading isn't your problem. Reading from a huge file is your bottleneck.
Frequently I/O will take much longer than the in-memory tasks. We refer to such work as I/O-bound. Parallelism may have a marginal improvement at best, and can actually make things worse.
You certainly don't need a different thread to put something into a map. Unless your parsing is unusually expensive, you don't need a different thread for it either.
If you had other threads for these tasks, they might spend most of their time sitting around waiting for the next line to be read.
Even parallelizing the I/O won't necessarily help, and may hurt. Even if your CPUs support parallel threads, your hard drive might not support parallel reads.
EDIT:
All of us who commented on this assumed the task was probably I/O-bound -- because that's frequently true. However, from the comments below, this case turned out to be an exception. A better answer would have included the fourth comment below:
Measure the time it takes to read all the lines in the file without processing them. Compare to the time it takes to both read and process them. That will give you a loose upper bound on how much time you could save. This may be decreased by a new cost for thread synchronization.
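A minimal sketch of that measurement. ReadTiming and readAll are hypothetical names, the three-line string is a stand-in for the real file, and line.split(",") is only a placeholder for whatever processing you actually do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

final class ReadTiming {
    /** Reads every line, hands each one to work, and returns the number of lines read. */
    static int readAll(Reader source, Consumer<String> work) throws IOException {
        int lines = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                work.accept(line);
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; a real run would pass a FileReader on the million-line file
        String data = "a,1\nb,2\nc,3\n";
        long t0 = System.nanoTime();
        readAll(new StringReader(data), line -> { });              // read only
        long t1 = System.nanoTime();
        readAll(new StringReader(data), line -> line.split(","));  // read plus placeholder processing
        long t2 = System.nanoTime();
        System.out.println("read only:      " + (t1 - t0) + " ns");
        System.out.println("read + process: " + (t2 - t1) + " ns");
    }
}
```

On a real file, run each variant several times and ignore the first run, since the operating system's file cache and JIT warm-up will otherwise distort the comparison.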
You may wish to read Amdahl's Law. Since the majority of your work is strictly serial (the IO) you will get negligible improvements by multi-threading the remainder. Certainly not worth the cost of creating watertight multi-threaded code.
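To put a number on that, Amdahl's Law says the speedup with n workers is 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelised. A small sketch (the class name and the 20% figure in main are only illustrative assumptions, not measurements from this thread):

```java
final class Amdahl {
    /** Speedup with n workers when parallelFraction of the runtime can be parallelised. */
    static double speedup(double parallelFraction, int n) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / n);
    }

    public static void main(String[] args) {
        // Assume, purely for illustration, that 20% of the time is parseable in parallel
        System.out.println("4 threads:    " + speedup(0.2, 4));
        System.out.println("1000 threads: " + speedup(0.2, 1000));
    }
}
```

As n grows, the result approaches 1 / (1 - p), so with 80% serial I/O the speedup can never reach 1.25 no matter how many threads you add.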
Perhaps you should look for a new toy-example to parallelise.
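That said, if you still want to see how the pieces fit together for demonstration purposes, here is one way the reader-to-mapper hand-off could be sketched with a BlockingQueue and a sentinel ("poison pill") value. This is not the code from the question: LinePipeline and process are hypothetical names, and the sketch assumes every line has the form "key,value".

```java
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class LinePipeline {
    // Sentinel compared by identity, so no real line can be mistaken for it
    private static final String POISON = new String("<<end>>");

    /** Parses "key,value" lines on a worker thread and returns the resulting map. */
    static SortedMap<String, Integer> process(Iterable<String> lines) throws InterruptedException {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        final SortedMap<String, Integer> result = new TreeMap<>();

        Thread mapper = new Thread(() -> {
            try {
                String line;
                while ((line = queue.take()) != POISON) {
                    String[] parts = line.split(",");
                    result.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        mapper.start();

        for (String line : lines) {
            queue.put(line); // blocks if the mapper falls behind
        }
        queue.put(POISON);   // tell the mapper that no more lines are coming
        mapper.join();       // after join(), the mapper's writes are visible to this thread
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(Arrays.asList("Main St,3", "Elm St,7")));
    }
}
```

The sentinel answers "how do I terminate the Mapper thread", and returning the map after join() answers "how do I get the map back to main": only the mapper thread touches the map while it runs, so no extra locking is needed.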
Is there any way to reboot the JVM? As in don't actually exit, but close and reload all classes, and run main from the top?
Your best bet is probably to run the java interpreter within a loop, and just exit. For example:
#!/bin/sh
while true
do
java MainClass
done
If you want the ability to reboot or shutdown entirely, you could test the exit status:
#!/bin/sh
STATUS=0
while [ $STATUS -eq 0 ]
do
java MainClass
STATUS=$?
done
Within the java program, you can use System.exit(0) to indicate that you want to "reboot," and System.exit(1) to indicate that you want to stop and stay stopped.
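If you would rather not depend on a shell, the same exit-status convention can be sketched as a small Java launcher. RestartLoop, runUntilNonZero and launch are hypothetical names; the launcher assumes the child can be started with the java binary under java.home and the current class path.

```java
import java.io.File;
import java.io.IOException;
import java.util.function.IntSupplier;

final class RestartLoop {
    /** Runs the launcher repeatedly while it returns 0; returns how many times it ran. */
    static int runUntilNonZero(IntSupplier launcher) {
        int runs = 0;
        int status;
        do {
            status = launcher.getAsInt();
            runs++;
        } while (status == 0);
        return runs;
    }

    /** Starts a child JVM for mainClass, waits for it, and returns its exit status. */
    static int launch(String mainClass) {
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        try {
            Process p = new ProcessBuilder(java, "-cp", System.getProperty("java.class.path"), mainClass)
                    .inheritIO() // let the child use this process's console directly
                    .start();
            return p.waitFor();
        } catch (IOException e) {
            e.printStackTrace();
            return 1; // treat a failed launch as "stop"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java RestartLoop <MainClass>");
            return;
        }
        int runs = runUntilNonZero(() -> launch(args[0]));
        System.out.println("child ran " + runs + " time(s)");
    }
}
```

Separating the loop from the launching code keeps the restart policy easy to change, for example to add a delay or a maximum number of restarts.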
IBM's JVM has a feature called "resettable" which allows you to effectively do what you are asking.
http://publib.boulder.ibm.com/infocenter/cicsts/v3r1/index.jsp?topic=/com.ibm.cics.ts31.doc/dfhpj/topics/dfhpje9.htm
Other than the IBM JVM, I don't think it is possible.
Not a real "reboot" but:
You can build your own class loader and load all your classes (except a bootstrap) with it. Then, when you want to "reboot", make sure you do the following:
End any threads that you've opened and are using your classes.
Dispose any Window / Dialog / Applet you've created (UI application).
Close / dispose any other GC root / OS resources hungry peered resource (database connections, etc).
Throw away your customized class loader, create another instance of it and reload all the classes. You can probably optimize this step by pre-processing the classes from files so you won't have to access the codebase again.
Call your main point of entry.
This procedure is used (to some extent) while "hot-swapping" webapps in web servers.
Note, though, that static class members and JVM "global" objects (ones reachable from a GC root that isn't under your control) will stay. For example, Locale.setDefault() affects a static member of Locale. Since the Locale class is loaded by the JVM's own class loader rather than by your custom one, it will not be "restarted". That means the Locale object passed to Locale.setDefault() will still be in effect afterward unless it is explicitly reset.
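A bare-bones sketch of the "throw away the class loader and create another" step, using URLClassLoader. ReloadSketch, freshLoader and runMain are hypothetical names, and the empty codebase in main is a placeholder: a real application would list its own jars or class directories there, and those classes must not also be on the system class path, otherwise the parent loader would supply them and nothing would be reloaded.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ReloadSketch {
    /** Creates a fresh loader over the given code locations. */
    static URLClassLoader freshLoader(URL[] codebase) {
        return new URLClassLoader(codebase, ClassLoader.getSystemClassLoader());
    }

    /** Loads mainClass through the given loader and calls its static main(String[]). */
    static void runMain(ClassLoader loader, String mainClass, String[] args) throws Exception {
        Class<?> entryPoint = Class.forName(mainClass, true, loader);
        entryPoint.getMethod("main", String[].class).invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        URL[] codebase = new URL[0]; // placeholder: the application's own jars would go here
        URLClassLoader first = freshLoader(codebase);
        URLClassLoader second = freshLoader(codebase); // the "rebooted" loader
        // Core classes are delegated to the parent, so both loaders hand back the same Class object
        System.out.println(first.loadClass("java.util.Locale") == second.loadClass("java.util.Locale"));
        first.close();
        second.close();
    }
}
```

The comparison in main illustrates the caveat about Locale: JDK classes are never reloaded this way, so any static state they hold survives the "reboot".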
Yet another route to take is instrumentation of classes. However, since I know little of it, I'm hesitant to offer advice.
Explanation about hot deploy with some examples
If you're working in an application server, they typically come with built-in hot deployment mechanisms that'll reload all classes in your application (web app, enterprise app) when you redeploy it.
Otherwise, you'll have to look into commercial solutions. Java Rebel (http://www.zeroturnaround.com/javarebel/) is one such option.
AFAIK there is no such way.
Notice that if there were a way to do that, it would highly depend on the current loaded code to properly release all held resources in order to provide a graceful restart (think about files, socket/tcp/http/database connections, threads, etc).
Some applications, like Jboss AS, capture Ctrl+C on the console and provide a graceful shutdown, closing all resources, but this is application-specific code and not a JVM feature.
I do something similar using JMX, I will 'unload' a module using JMX and then 'reload' it. Behind the scenes I am sure they are using a different class loader.
Well, I currently have this, it works perfectly, and completely OS-independent. The only thing that must work: executing the java process without any path/etc, but I think this can also be fixed.
The little code pieces are all from stackoverflow except RunnableWithObject and restartMinecraft() :)
You need to call it like this:
restartMinecraft(getCommandLineArgs());
So what it basically does, is:
Spawns a new Process and stores it in the p variable
Creates two RunnableWithObject instances, stores the process object in their data field, and starts two threads; these just print whatever is available on the process's input stream and error stream until the process has exited
Waits for the process to exit
prints debug message about process exit
Terminates with the exit value of the process(not necessary)
And yes it is directly pulled from my minecraft project:)
The code:
Tools.isProcessExited() method:
public static boolean isProcessExited(Process p) {
try {
p.exitValue();
} catch (IllegalThreadStateException e) {
return false;
}
return true;
}
Tools.restartMinecraft() method:
public static void restartMinecraft(String args) throws IOException, InterruptedException {
//Here you can do shutdown code etc
Process p = Runtime.getRuntime().exec(args);
RunnableWithObject<Process> inputStreamPrinter = new RunnableWithObject<Process>() {
@Override
public void run() {
// TODO Auto-generated method stub
while (!Tools.isProcessExited(data)) {
try {
while (data.getInputStream().available() > 0) {
System.out.print((char) data.getInputStream().read());
}
} catch (IOException e) {
}
}
}
};
RunnableWithObject<Process> errorStreamPrinter = new RunnableWithObject<Process>() {
@Override
public void run() {
// TODO Auto-generated method stub
while (!Tools.isProcessExited(data)) {
try {
while (data.getErrorStream().available() > 0) {
System.err.print((char) data.getErrorStream().read());
}
} catch (IOException e) {
}
}
}
};
inputStreamPrinter.data = p;
errorStreamPrinter.data = p;
new Thread(inputStreamPrinter).start();
new Thread(errorStreamPrinter).start();
p.waitFor();
System.out.println("Minecraft exited. (" + p.exitValue() + ")");
System.exit(p.exitValue());
}
Tools.getCommandLineArgs() method:
public static String getCommandLineArgs() {
String cmdline = "";
List<String> l = ManagementFactory.getRuntimeMXBean().getInputArguments();
cmdline += "java ";
for (int i = 0; i < l.size(); i++) {
cmdline += l.get(i) + " ";
}
cmdline += "-cp " + System.getProperty("java.class.path") + " " + System.getProperty("sun.java.command");
return cmdline;
}
Aaaaand finally the RunnableWithObject class:
package generic.minecraft.infinityclient;
public abstract class RunnableWithObject<T> implements Runnable {
public T data;
}
Good luck :)
It's easy in JavaX: You can use the standard functions nohupJavax() or restart().