Can a Java program run without its file? - java

I'm fairly new at this stuff, but essentially: there are programs and there are processes. A program is a file that, when executed, spawns a process.
You can't delete a program while a process is still associated with it. The process needs to be killed first.
This seems to be the case for Java programs too. However, I'm curious why: isn't the entire thing loaded into the JVM anyway?

Deleting a file involves some OS semantics. Under Unix/Linux a file may be deleted while it is open, and all open file handles stay valid. When the last open file handle is closed, the space occupied by the deleted file is returned to the pool of free space.
Under Windows, a file that is open in another process usually cannot be deleted unless it was opened with a sharing mode that permits deletion.
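A minimal sketch of the Unix behaviour described above (the class name DeleteWhileOpen is made up for this example): it opens a file, tries to delete it, and then keeps reading through the handle that is still open. On Linux or macOS the delete succeeds and the read still returns the data; on Windows the outcome depends on the sharing mode the file was opened with, so the code handles both cases instead of assuming one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DeleteWhileOpen {
    // Opens a file, tries to delete it, then reads through the handle that is still open.
    static String readAfterDelete() throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.writeString(tmp, "still readable");
        String outcome;
        try (InputStream in = Files.newInputStream(tmp)) {
            boolean deleted;
            try {
                Files.delete(tmp);   // on Unix/Linux this only removes the directory entry
                deleted = true;
            } catch (IOException e) {
                deleted = false;     // the OS refused, e.g. because the open handle forbids deletion
            }
            String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            outcome = (deleted ? "deleted" : "not deleted") + ", read: " + content;
        }
        Files.deleteIfExists(tmp);   // clean up in case the delete above was refused
        return outcome;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterDelete());
    }
}
```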

The JVM loads classes on demand. As a Java program runs, it encounters parts of the program that it needs, and these pieces are read from .class files (for example, inside the JAR). A .class file contains bytecode, an intermediate form that is no longer Java source but not yet machine code either. The JVM interprets this bytecode and uses a Just-In-Time (JIT) compiler to turn frequently executed parts into machine code; there are many sources of information on JIT compilation. Loading and compiling take resources (CPU cycles) and thus time, so the JVM only loads the pieces of the program that it actually needs. As a result, the program's files may still be read long after startup, which is why the JVM keeps them open.
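The on-demand behaviour can be observed with a static initializer, which runs only when a class is first actively used. In this sketch (class names made up), the nested class is never initialized unless the program is started with an argument. Strictly, this shows initialization; HotSpot also loads classes lazily in practice, but the JVM specification leaves implementations some freedom about exactly when loading happens, while the timing of initialization is fixed.

```java
import java.util.ArrayList;
import java.util.List;

class LazyInitDemo {
    // Records the order of events so we can see when the nested class gets initialized.
    static final List<String> events = new ArrayList<>();

    static class RarelyUsed {
        static {
            events.add("Rarely initialized"); // runs on first active use, not at program start
        }

        static void use() {
            events.add("Rarely used");
        }
    }

    public static void main(String[] args) {
        events.add("main started");
        if (args.length > 0) {
            RarelyUsed.use(); // only this call triggers initialization of RarelyUsed
        }
        events.add("main finished");
        System.out.println(events);
    }
}
```

Running it without arguments prints only the two main events; running it with any argument shows the initializer firing just before the first call.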
But yes, your understanding of processes and programs is correct. To sum up: a process is a running instance of a program. This running program can in turn spawn more processes or threads to perform work.

Related

How is JVM instance created per application?

I understand that each Java process runs in its own JVM. For example, when I run jcmd on my machine, I see:
21730 sun.tools.jcmd.JCmd
77558 /usr/local/opt/jenkins-lts/libexec/jenkins.war --httpListenAddress=127.0.0.1 --httpPort=8080
99974
99983 org.jetbrains.jps.cmdline.Launcher /Applications/IntelliJ IDEA.app/Contents/lib/asm-all-7.0.1.jar:/Applications/IntelliJ IDEA.app/Contents/lib/lz4-java-1.6.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/java/lib/aether-connector-basic-1.1.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/java/lib/plexus-utils-3.0.22.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/java/lib/aether-api-1.1.0.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/java/lib/javac2.jar:/Applications/IntelliJ IDEA.app/Contents/lib/util.jar:/Applications/IntelliJ IDEA.app/Contents/lib/platform-api.jar:/Applications/IntelliJ IDEA.app/Contents/lib/qdox-2.0-M10.jar:/Applications/IntelliJ IDEA.app/Contents/lib/jna.jar:/Applications/IntelliJ IDEA.app/Contents/lib/trove4j.jar:/Applications/IntelliJ IDEA.app/Contents/lib/nanoxml-2.2.3.jar:/Applications/IntelliJ IDEA.app/Contents/lib/jdom.jar:/Applications/IntelliJ IDEA.app/Contents/lib/netty-common-4.1.41.Final.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/java/lib/aet
How is a JVM created per app? For example, what happens when I start Jenkins with java -jar jenkins.war? Does some process copy JVM files over from the JRE folder and initialize an instance of the JVM?
When you start a program like java, the operating system creates a "process". A process is the representation of a live, running program. The process concept is what allows you to run several copies of a program at the same time. Each process has its own private memory space and system resources like open files or network connections. Each process can load a different set of dynamically linked libraries. With Java, much of the JVM is implemented in shared libraries, which the launcher program "java" loads at run time. Nothing is copied out of the JRE folder: each process maps the same library files into its own address space.
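To see that every launch is a separate OS process with its own JVM, a small sketch (class name made up) can start a second JVM from the running one, using the launcher found under the java.home system property, and compare process IDs.

```java
import java.io.File;
import java.io.IOException;

class SeparateJvmDemo {
    // The 'java' launcher belonging to the JVM that runs this code.
    static String javaLauncher() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    // Starts a second JVM (running just "java -version") and returns its process ID.
    static long launchChildJvm() throws IOException, InterruptedException {
        Process child = new ProcessBuilder(javaLauncher(), "-version")
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // we only care about the PID here
                .start();
        long pid = child.pid();
        child.waitFor();
        return pid;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("this JVM runs as process " + ProcessHandle.current().pid());
        System.out.println("the child JVM ran as process " + launchChildJvm());
    }
}
```

The two PIDs differ, just like the separate entries in the jcmd listing.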
The details are OS dependent and become complicated fast.
One of the things that happen when the process is started is that the executable file is mapped into memory. The CPU cannot execute instructions that are on disk or other external storage, so the program "text" has to be copied from disk into main memory first. Mapping the file into memory simplifies this and makes it more efficient: if the CPU needs to access a memory location that's not actually in RAM, the memory management unit (MMU) raises a "page fault", and the operating system handles it by loading the data into RAM. This is more efficient than simply copying the whole program text into RAM (not all of it may be needed all the time), and it also simplifies the overall system (the virtual memory system is already needed for other OS features).
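Java code can use the same OS mechanism through FileChannel.map. The sketch below (class name made up) maps a small temporary file read-only and reads one byte; with a mapping, the OS brings pages into RAM on first access rather than reading the whole file up front. This only illustrates file mapping in general; the launcher and the JVM's shared libraries are mapped by the OS loader, not through this API.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    // Maps the whole file read-only and returns the byte at 'offset'.
    // The page holding that byte is loaded by the OS on first access (a page fault).
    static byte byteAt(Path file, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".txt");
        tmp.toFile().deleteOnExit(); // a live mapping can block immediate deletion on some OSes
        Files.write(tmp, "hello, mapped world".getBytes(StandardCharsets.US_ASCII));
        System.out.println((char) byteAt(tmp, 7)); // prints m
    }
}
```

The mapping stays valid after the channel is closed; on Windows a file with a live mapping typically cannot be deleted, which is why the sketch uses deleteOnExit instead of deleting the file right away.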

stop java execution if condition is met

The idea is a kind of virtual classroom (a website) where students upload uncompiled .java files. Our server compiles and executes them through C# or PHP (the language doesn't matter) by creating a .bat file, and reads the console output to see whether the program compiled and whether it ran correctly against some predefined tests. So far our tests work, but we have no control over what's inside the .java file, so we want to stop execution if certain things happen, e.g. user input, an infinite loop, socket instances, etc. I've been searching the internet for a way to configure the Java environment to prevent this but haven't found anything so far, and we don't want our backend language to parse the file to check for these things, because that would be a complete mess.
Thanks for the help
You could configure a security manager, but it doesn't have a very good track record of stopping a determined attacker, and it doesn't do resource limiting anyway. (It has also been deprecated for removal since Java 17.)
You could load the untrusted code with a dedicated class loader that only sees white-listed classes.
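A minimal sketch of the class-loader idea, with made-up names (AllowListClassLoader, the student class directory). Platform classes are delegated to the parent loader only if their name is on an allow-list; everything else must come from the student's own .class files, so a reference to, say, java.net.Socket fails when the student code runs. The allow-list must include at least java.lang.Object, plus whatever the assignments legitimately need (for example java.lang.String, java.lang.System and java.io.PrintStream for printing); in practice the list grows quickly. It does nothing against infinite loops or memory exhaustion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;

class AllowListClassLoader extends ClassLoader {
    private final Set<String> allowedPlatformClasses; // hypothetical allow-list of JDK class names
    private final Path studentClassDir;               // directory with the compiled student .class files

    AllowListClassLoader(Set<String> allowed, Path studentClassDir, ClassLoader parent) {
        super(parent);
        this.allowedPlatformClasses = allowed;
        this.studentClassDir = studentClassDir;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                c = allowedPlatformClasses.contains(name)
                        ? getParent().loadClass(name) // delegate only allow-listed names
                        : findClass(name);            // otherwise only the student's own classes exist
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = studentClassDir.resolve(name.replace('.', '/') + ".class");
        if (!Files.isRegularFile(file)) {
            throw new ClassNotFoundException(name + " is not on the allow-list");
        }
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length); // java.* names are rejected by the JVM
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = new AllowListClassLoader(
                Set.of("java.lang.Object", "java.lang.String", "java.lang.System", "java.io.PrintStream"),
                Path.of(args.length > 0 ? args[0] : "submissions/42"), // hypothetical directory
                ClassLoader.getSystemClassLoader());
        try {
            loader.loadClass("java.net.Socket");
        } catch (ClassNotFoundException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

The student's entry point would then be loaded with loader.loadClass("Main") (a hypothetical class name) and invoked reflectively.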
Or you could use something like docker to isolate the process at the operating system level. This could also limit its cpu and memory consumption.
I'd probably combine these approaches, but some risk will remain in either case.
(Yes, I realize that is complex, but safely sandboxing arbitrary java code is a hard problem.)
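The infinite-loop and user-input cases can be partly handled by a common complement to the approaches above, sketched here with made-up names: run each submission in its own JVM process with a time limit. The child gets no stdin (so a program waiting for input sees end-of-file instead of hanging), a small heap via -Xmx, and is killed with destroyForcibly() if it exceeds the limit. The command in main assumes a hypothetical layout (classes in submissions/42, entry point Main); the same run() method could just as well start a docker run command for OS-level isolation.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

class TimedRunner {
    // Outcome of one run; exitCode is only meaningful when timedOut is false.
    record Result(boolean timedOut, int exitCode) {}

    static Result run(List<String> command, long timeoutSeconds) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // a real grader would capture this to a file
                .start();
        p.getOutputStream().close(); // no stdin: a program that waits for user input gets end-of-file
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();     // probably an infinite loop, or simply too slow
            p.waitFor();
            return new Result(true, -1);
        }
        return new Result(false, p.exitValue());
    }

    public static void main(String[] args) throws Exception {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        // Hypothetical layout: the student's compiled classes in submissions/42, entry point Main.
        List<String> cmd = List.of(javaBin, "-Xmx64m", "-cp", "submissions/42", "Main");
        System.out.println(run(cmd, 10));
    }
}
```

Note that destroyForcibly() only kills that one process; anything the submission itself spawned needs separate handling, which is another reason to add OS-level isolation.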

JVM running on IDE takes more processing power than running on command line

I ran into a very frustrating issue. I used an IDE to develop my application and monitored performance through the execution times in the IDE, but when I exported my classes to JAR files, the application ran 7x slower on the command line. I checked my JVM and made sure the same JVM is used for both executions. I also allocated more heap memory for the command-line run.
Later, I used a process performance monitoring tool and ran both applications (in CMD and in the IDE) simultaneously. I noticed that both processes are identical in everything except CPU usage. The process running in the IDE uses around 19%-23% CPU, while the one in CMD uses about 6%-11%. This may explain why the run in the IDE takes less time.
To make the application clearer, here is the piece of code that takes most of the time:
// Classifies each call and prints its category (one line of console output per call)
for (Call call : calls) {
    CallXCD callXCD = call.getCallXCD();
    if (callXCD.isOnNet()) {
        System.out.println("OnNet");
    } else if (callXCD.isXNet()) {
        System.out.println("XNet");
    } else if (callXCD.isOthers()) {
        System.out.println("Others");
    } else if (callXCD.isIntra()) {
        System.out.println("Intra");
    } else {
        System.out.println("Not Known");
    }
}
CallXCD is an object containing several string variables and several methods like isOnNet and isXNet. These methods apply the compareTo() method on the strings of the object and return true or false according to this comparison.
I profiled this piece of code by printing the time each iteration takes. In the IDE each iteration took about 0.007 ms, while on the command line running the JAR file it took about 0.2 ms. Because my code runs around 4 million iterations of this loop, the impact on performance is very significant (about 28 s versus 800 s in total).
Why does this happen? Is there any way to allocate more processing power to the JVM, similar to the memory arguments?
(After a conversation in comments.)
It seems this is down to a difference in how console output is handled in your command line and in your IDE. These can behave very significantly differently in terms of auto-scrolling, size of buffer etc.
Generally, when benchmarking code it's good to isolate the code you're actually interested in away from any diagnostic output which can interfere with the results. If you absolutely have to include diagnostic output, writing it to a file (possibly on a ram disk) can help to reduce the differences between console implementations.
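A minimal sketch of that advice, assuming a hypothetical classify() method standing in for the CallXCD checks (the class name is made up too): the timed loop only tallies categories, and the details are written to a file after timing stops, so console behaviour cannot skew the measurement.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class IsolatedBenchmark {
    // Hypothetical stand-in for the isOnNet()/isXNet()/isOthers()/isIntra() checks on CallXCD.
    static String classify(int callId) {
        return switch (callId % 5) {
            case 0 -> "OnNet";
            case 1 -> "XNet";
            case 2 -> "Others";
            case 3 -> "Intra";
            default -> "Not Known";
        };
    }

    // The timed work: classify and count, without printing anything per iteration.
    static Map<String, Integer> tally(int iterations) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < iterations; i++) {
            counts.merge(classify(i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        Map<String, Integer> counts = tally(4_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Diagnostic output is written to a file after the timed region has ended.
        Path log = Files.createTempFile("calls", ".log");
        try (BufferedWriter out = Files.newBufferedWriter(log)) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                out.write(e.getKey() + ": " + e.getValue());
                out.newLine();
            }
        }
        System.out.println("classified in " + elapsedMs + " ms; counts written to " + log);
    }
}
```

If per-call output is really required, writing it through a buffered writer to a file (rather than System.out to the console) keeps the IDE and command-line runs comparable.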
Maybe your IDE uses different JVM options for running your process? A larger heap size (-Xmx) could reduce garbage collection costs. If your application is memory-intensive and choking on an insufficient heap size, this could easily be the problem.
The IDE typically also pipes your process's standard input/output. I don't know if that would be more efficient than standard output to the console, but if you're printing or logging a lot, it might be a factor to consider.
As Jon Skeet says, there's a big apparent difference between the I/O traces on left & right.
The third main thing an IDE does, in Debug mode, is attach to your process as a debugger. But this normally makes the process slower, not faster.
Summary: best check your IDE's JVM options & launch configuration. Maybe it's run with different parameters, in a different working directory, or with too small a heap size.
I have seen such effects when the IDE (Eclipse) used a newer Java version than the one used when starting from the command line. It could also be that when running from within Eclipse you are using the server VM, while on the command line the client VM is used. This can be set either in the run configuration or in the workspace JVM settings ("Default VM arguments" when you click on "Edit JRE" in the "Installed JREs" preference page).
So, first make sure the same arguments are used: add the arguments that Eclipse uses to your command line.
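One way to check, sketched below (the class name JvmReport is made up): print the runtime facts from inside the program in both launches and compare the two outputs. ManagementFactory.getRuntimeMXBean().getInputArguments() returns the JVM options the process was started with, not the program's own arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmReport {
    // Collects the facts worth comparing between the IDE run and the command-line run.
    static String report() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return String.join(System.lineSeparator(),
                "java.version = " + System.getProperty("java.version"),
                "java.vm.name = " + System.getProperty("java.vm.name"),
                "java.home    = " + System.getProperty("java.home"),
                "max heap     = " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB",
                "cpus         = " + Runtime.getRuntime().availableProcessors(),
                "jvm args     = " + jvmArgs,
                "working dir  = " + System.getProperty("user.dir"));
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Any line that differs between the two runs (VM name, version, heap, options, working directory) is a candidate explanation for the speed difference.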

Fastest way to run Java jar file from Python?

Here's my issue. I have an existing .jar file that I must use in my program. The program, however, is written in Python.
Since my program is taking a long time to run (a named entity tagger on a large development corpus), I profiled it using cProfile and line-profiled it using line_profiler. It seems that 92% of the time is spent in the call to the .jar file shown below.
I am currently using the following code:
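import subprocess as sub
# JVM options such as -Xmx belong before -jar; everything after the jar name goes to the program.
# subprocess expects every argument as a string, so numeric values are converted with str().
sub.call(["java", "-Xmx512m", "-jar", "MyFile.jar",
          featuresFileName, str(numIterations), str(featureCutOff)])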
I read somewhere about subprocess vs Popen and other bits and pieces, but couldn't find a good solution that does not require subprocess or os calls (of course, there may not be any).
I'd really appreciate some advice on the fastest way to run a .jar file from within a Python script. Note, however, that I cannot modify the Java code nor do I have access to speak to the developer of that code.
Alternatively, and I don't know if this will help or if I'm simply grasping at straws here, perhaps there is a way to keep the process called in sub.call() above running in the background, somehow keeping the JVM alive (?) so that I can simply invoke the jar file. Maybe that could help reduce startup costs? BTW, I am a total Java newbie (mostly C++, C#, Python experience), so my question may make no sense whatsoever; I apologize in advance...
You could try porting your Python code to Jython and then run it all natively in the same JVM (that may or may not work). That way you have effectively zero startup time, and the JVM has enough time to leverage its JIT to ideally give you better performance overall.
That indicates that most of the time is spent in this process. It may not be the startup time which is the problem. It may be what it does once it has started.
The only way around this I can think of is to run the process in the background, multiple times concurrently if that is an option (concurrently rather than one after another).
Try the -client option. It should reduce JVM startup time. (Note that 64-bit HotSpot JVMs only include the server VM and ignore this flag.)
By analysing the jar file's manifest, you can find out the name of its main class (the Main-Class entry). You could then, in principle, write your own small Java daemon that listens for new arguments to arrive and calls the main() method of that class for each set. But this is only worth the effort if startup costs are actually the issue.
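A rough sketch of such a daemon, under several assumptions (the class name JarDaemon and the __DONE__ marker are made up): it reads Main-Class from the manifest once, then treats each line on stdin as one run, splitting arguments on whitespace (so arguments containing spaces are not supported). Caveats: if the tool's main() calls System.exit, the whole daemon ends; static state persists between runs; and the tool's own output is interleaved with the marker. From Python, it would be started once with subprocess.Popen and fed one line per run.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

class JarDaemon {
    // Reads the Main-Class entry from the jar's manifest.
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            String main = (mf == null) ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
            if (main == null) {
                throw new IOException("no Main-Class entry in " + jar);
            }
            return main;
        }
    }

    public static void main(String[] args) throws Exception {
        File jar = new File(args[0]); // e.g. MyFile.jar
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jar.toURI().toURL() })) {
            Method entry = loader.loadClass(mainClassOf(jar)).getMethod("main", String[].class);
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {          // one line on stdin = one run
                String trimmed = line.trim();
                String[] runArgs = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                entry.invoke(null, (Object) runArgs);         // same JVM, already warmed up by earlier runs
                System.out.println("__DONE__");               // hypothetical end-of-run marker for the caller
                System.out.flush();
            }
        }
    }
}
```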

run out of system resource (execute many programs in a shell script)

I'm running a shell script on the university's server. In this shell script, I execute Java, C, C++, Python and Perl programs, and every program is executed many, many times (I'm a teaching assistant and test the students' programs with many different inputs). The server keeps giving me the error "running out of system resource". I guess this is because I do not release the resources.
I heard that each time a program is run from a shell script, it starts a process. So I think there may be so many processes that the system resources allocated to me have run out.
Is there any way to solve this problem?
Here is part of my shell script:
# maxconnect4 is the compiled c code
for ((i = 1; i <= 21; i++))
do
maxconnect4 input1.txt
done
Thanks
Zhong
Since you are automatically running students' programs, it may be that their programs are badly written and use more RAM than similar programs written by more skilled programmers would. Even Java and Python programs can be written in a way that leaks memory (think of a stack that never has anything popped off it, only more things pushed on).
You should test your setup with known good implementations of the assignments you are about to grade as a sanity check.
You should also look at the source code of the students' work, especially if you get the error on a particular student's assignment.
You may also just have an overloaded system and may need to run these tests on another machine. Using a machine without other users is a good idea for this kind of thing, since processes other than yours and the program under test are then less likely to interfere with your tests.
You may also want to keep top running on that machine in another terminal while you run the tests, to monitor resource usage.
You seem to be running maxconnect4 and waiting for it to finish before starting the next run, so I don't think your shell script itself is the issue. The big question is what maxconnect4 is doing. It could be very hungry for resources, or it could itself start child processes and return to your script while they keep running.
I would try a few experiments, such as starting maxconnect4 by hand a few times: do you see the resource error?
I would also use system tools to investigate. For example, use ps to see whether there are lots of processes running, and use vmstat to look at CPU and memory usage.
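If ps output is hard to sift through, a rough Java equivalent (class name made up) can list live processes whose executable path contains a given name, together with their parent PID. Many leftover maxconnect4 processes whose parent is PID 1 or some other unexpected process would point at the child-process scenario described above. ProcessHandle.Info fields may be empty for processes the JVM is not allowed to inspect, so this is a diagnostic aid, not a complete listing.

```java
import java.util.List;
import java.util.stream.Collectors;

class LeftoverProcesses {
    // Lists live processes whose executable path contains 'name' (similar to: ps -ef | grep name).
    static List<ProcessHandle> findByCommand(String name) {
        return ProcessHandle.allProcesses()
                .filter(ProcessHandle::isAlive)
                .filter(ph -> ph.info().command().map(c -> c.contains(name)).orElse(false))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "maxconnect4";
        List<ProcessHandle> found = findByCommand(name);
        System.out.println(found.size() + " live process(es) matching " + name);
        for (ProcessHandle ph : found) {
            System.out.println("  pid " + ph.pid()
                    + " parent " + ph.parent().map(ProcessHandle::pid).orElse(-1L)
                    + " started " + ph.info().startInstant().orElse(null));
        }
    }
}
```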
