Here's my issue. I have an existing .jar file that I must use in my program. The program, however, is written in Python.
Since my program takes a long time to run (a named-entity tagger on a large development corpus), I profiled it with cProfile and then line by line with line_profiler. It seems that 92% of the time is spent on this task.
I am currently using the following code:
import subprocess as sub

# JVM options conventionally go before -jar; every argument must be a string.
sub.call(["java", "-Xmx512m", "-jar", "MyFile.jar",
          featuresFileName, str(numIterations), str(featureCutOff)])
I read somewhere about subprocess vs Popen and other bits and pieces, but couldn't find a good solution that does not require subprocess or os calls (of course, there may not be any).
I'd really appreciate some advice on the fastest way to run a .jar file from within a Python script. Note, however, that I cannot modify the Java code nor do I have access to speak to the developer of that code.
Alternatively, and I don't know if this will help or if I'm simply grasping at straws here, but perhaps there is a way to keep the process called in sub.call() above in the background, somehow keeping the JVM running (?) so that I can simply invoke the jar file. Maybe that can help reduce startup costs? BTW I am a total Java newbie (mostly C++,C#,Python experience) so my question could make no sense whatsoever - I apologize in advance...
You could try porting your Python to Jython, and then run it all natively in the same JVM (that may or may not work). That way you have effectively zero start up time, and the JVM has enough time to leverage its JIT over time to ideally give you better performance overall.
That indicates that most of the time is spent in this process. It may not be the startup time which is the problem. It may be what it does once it has started.
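One way to tell the two apart is to time a trivial JVM launch and compare it with the full run. This is only a rough sketch: time_command is a made-up helper name, and it measures wall-clock time, so other load on the machine will show up in the numbers.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Return the average wall-clock seconds needed to run cmd to completion."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output so that printing does not distort the measurement.
        subprocess.call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        total += time.perf_counter() - start
    return total / repeats
```

Assuming java is on the PATH, time_command(["java", "-version"]) gives an approximate figure for JVM startup alone, since that command does almost no work. If the full jar invocation takes much longer than that, startup is not where the time goes.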
The only way around this I can think of is to run the process in the background, several times concurrently if that is an option (concurrently rather than one run after another).
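If the work can be split into independent pieces, running them concurrently could look roughly like this. It is a sketch, not a drop-in solution: run_all is a hypothetical helper, and whether the jar can safely run several times at once is something you would have to check.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=4):
    """Run each command as its own process, at most max_workers at a time.

    Returns the exit codes in the same order as commands.
    """
    # Threads are enough here: each one only blocks while its child process runs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(subprocess.call, commands))
```

Each entry in commands would be a full argument list such as ["java", "-Xmx512m", "-jar", "MyFile.jar", ...] with a different input file per entry. Keep in mind that every JVM gets its own heap, so max_workers times the -Xmx value has to fit in memory.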
Try with "-client" option. It should reduce JVM startup time.
By inspecting the manifest file of the jar you can find the name of the main class it runs (the Main-Class entry). Then you could in principle write your own small Java daemon that listens for new arguments and calls the main() method of that class. But this is only worth the effort if startup costs really are the issue.
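Reading the manifest does not need any Java tooling, because a jar is a zip archive. A minimal sketch in Python (main_class_of is a made-up name; it raises KeyError if the jar has no manifest at all):

```python
import zipfile


def main_class_of(jar_path):
    """Return the Main-Class entry from a jar's manifest, or None if absent."""
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    # Normalise line endings, then undo manifest line folding: a long line
    # continues on the next line, which then starts with a single space.
    text = manifest.replace("\r\n", "\n").replace("\r", "\n")
    unfolded = text.replace("\n ", "")
    for line in unfolded.split("\n"):
        name, sep, value = line.partition(":")
        # Attribute names in a manifest are case-insensitive.
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None
```

Whether a wrapper can then call that class's main() repeatedly depends on the code inside the jar. If main() calls System.exit(), for example, it would take the whole daemon down with it.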
The idea is a kind of virtual classroom (a website) where students upload uncompiled .java files. Our server compiles and executes them from C# or PHP (the language doesn't matter) by creating a .bat file, then reads the console output to see whether the program compiled and whether it ran correctly against some pre-made tests. So far our tests work, but we have no control at all over what's inside the .java file, so we want to stop the execution if certain things happen, e.g. waiting for user input, an infinite loop, opening sockets, etc. I've been digging around the internet for a way to configure the Java environment to prevent this, but so far I can't find anything, and we don't want our backend language to go through the file checking for these things because that would be a complete mess.
Thanks for the help
You could configure a security manager, but it doesn't have a very good track record of stopping a determined attacker, and doesn't do resource limiting anyways.
You could load the untrusted code with a dedicated class loader that only sees white-listed classes.
Or you could use something like docker to isolate the process at the operating system level. This could also limit its cpu and memory consumption.
I'd probably combine these approaches, but some risk will remain in either case.
(Yes, I realize that is complex, but safely sandboxing arbitrary java code is a hard problem.)
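To make the container option a bit more concrete, here is a rough sketch of how a backend could launch one submission. Everything specific in it is an assumption: the function names, the image name eclipse-temurin:17, the limit values, and the idea that the compiled classes sit in a directory on the host (given as an absolute path). It is written in Python only for illustration; the same docker command line can be built from C# or PHP.

```python
import subprocess
import uuid


def docker_command(class_dir, main_class, name, image="eclipse-temurin:17"):
    """Build a docker run command that executes main_class under tight limits."""
    return [
        "docker", "run", "--rm",
        "--name", name,                 # lets us kill the container by name later
        "--network", "none",            # no network, so sockets cannot reach anything
        "--memory", "256m",             # hard memory cap (assumed value)
        "--cpus", "1",                  # CPU cap (assumed value)
        "--pids-limit", "64",           # limits how many processes/threads can be created
        "-v", class_dir + ":/app:ro",   # compiled classes, mounted read-only
        image,                          # assumed image that provides a java binary
        "java", "-cp", "/app", main_class,
    ]


def run_submission(class_dir, main_class, timeout_seconds=10):
    """Run one submission; return (exit_code, output), or (None, "") on timeout."""
    name = "submission-" + uuid.uuid4().hex
    try:
        done = subprocess.run(
            docker_command(class_dir, main_class, name),
            stdin=subprocess.DEVNULL,   # a program waiting for input just sees EOF
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            universal_newlines=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The timeout only kills the docker client; stop the container explicitly.
        subprocess.call(["docker", "kill", name],
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return None, ""
    return done.returncode, done.stdout
```

The timeout covers infinite loops, the closed stdin covers programs that wait for user input, and the network setting covers sockets. It does not make the setup bulletproof; it only narrows what a submission can do.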
I'm making an application in Java using Eclipse Indigo. When I run it from Eclipse, the Task Manager shows javaw.exe using 50 MB of memory. When I export the application as a runnable .jar and execute the .jar, the Task Manager shows javaw.exe using 500 MB.
Why is this? How could I fix this?
Edit: I'm using 64-bit Windows 7, and my system says I have Java 1.7 installed. Apparently the memory problem is caused by a while loop. I'll study what inside the while loop is causing the problem.
Edit: Problem found. At one point in the while loop new BufferedImage instances were created, instead of replacing the same BufferedImage.
Without any additional details about your code, I would suggest using a profiler to analyze the problem. I know YourKit and the one that is available for NetBeans are very good.
Once you run your app from the profiler, you should initially look at the objects and listeners created by your application's packages. If the issue is not there, you can expand your search to other packages until you identify things that are growing out of control, and then look at the code that handles those entities.
When you run certain parts of the code multiple times and still see memory utilization after that code stopped running, then you might have a leak and may consider nulling or emptying variables/listeners on exit.
It should be a good starting point, but please report your results back, so we know how it goes. By the way, what operating system are you using and what version of java?
--Luiz
You need to profile your code to get the exact answer, but when I see similar things I usually put it down to garbage collection. For example, I ran the same job twice, giving one run 10 GB and the other 2 GB. Both completed, but the 10 GB run used more memory (and finished faster), while the 2 GB run, I believe, garbage-collected more often, so it took a bit longer but used less memory. I'm a bit new to Java, so I may be wrong about the garbage collection, but I have seen what you are describing.
You need to profile your code (check out jconsole, which is included with Java, or VisualVM).
That sounds most peculiar.
I can think of two possible explanations:
1. You looked at the wrong javaw.exe instance. Perhaps you looked at the instance that is running Eclipse ... which is likely to be that big, or bigger.
2. You have (somehow) managed to configure Java to run with a large heap by default. On Linux you could do this with a wrapper script, a shell function or a shell alias. You can do at least the first of those on Windows.
I don't think it is the JAR file itself. AFAIK, you can't set JVM parameters in a JAR file. It is possible that you've somehow included a different version of something in the JAR file, but that's a bit of a stretch ...
If none of these ideas help, try profiling.
"Problem found. At one point in the while loop new BufferedImage instances were created, instead of replacing the same BufferedImage."
Ah yes. BufferedImage instances can hold a lot of memory (the pixel data alone is roughly width x height x bytes per pixel), so they need to be managed carefully.
But this doesn't explain why your application used more memory when run from the JAR than when launched from Eclipse ... unless you were telling the application to do different things.
I was just wondering if it's possible to dump a running Java program into a file, and later on restart it (same machine)
It sounds a bit weird, but who knows.
--- update -------
Yes, this is the hibernate feature for a process instead of a full system. But google 'hibernate jvm process' and you'll understand my pain.
There is a question on this subject for Linux. In short, it is possible to hibernate a process with CryoPID, though it is far from 100% reliable.
A similar question was raised in stackoverflow some years ago.
With a JVM, my educated guess is that hibernating should be a lot easier, though still not always possible and not 100% reliable (e.g. with UIs and open files).
Serializing a persistent state of the application is an option but it is not an answer to the question.
This may be a bit overkill, but one thing you can do is run something like VirtualBox and halt/save the machine.
There is also:
- JavaFlow from Apache, which should do just that, even though I haven't personally tried it.
- Brakes that may be exactly what you're looking for
There are a lot of restrictions that any solution to your problem will have: external connections might or might not survive your attempt to freeze and wake them. Think of timeouts on the other side, or even communication partners that have stopped - anything from a web server to a database or even local files.
You are asking for a generic solution that hibernates your program without any internal knowledge of it. What you can always do is serialize the part of your program's state that you need in order to restart it. It is, or at least was, common wisdom to implement restart points in long-running computations (think of days or weeks). That way, when you hit a bug after the program has run for a week, you can fix the bug and save several days of computation.
The state of a program could be surprisingly small, compared to the complete memory size used.
You asked "if it's possible to dump a running Java program into a file, and later on restart it." Yes, it is, but I would not suggest a generic, automatic solution that has to handle your program as a black box. Instead, I suggest that you externalize the important part of your program's state and add restart points to the program.
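A restart point can be quite small. The sketch below is in Python rather than Java, purely to show the shape of the idea; run_with_checkpoints and the layout of the checkpoint file are made up for illustration, and it assumes the results can be stored as JSON.

```python
import json
import os


def run_with_checkpoints(items, process, checkpoint_path):
    """Process items in order, saving progress so that a restart can resume.

    The saved state is just the index of the next item and the results so far.
    """
    state = {"next_index": 0, "results": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        state["results"].append(process(items[i]))
        state["next_index"] = i + 1
        # Write to a temporary file and then rename it, so that a crash in the
        # middle of a write never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["results"]
```

If the program dies on item 3, running it again with the same checkpoint file picks up at item 3 instead of starting over. The state written to disk is tiny compared with the memory the process was using.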
Hope that helps - even if it's more complicated than what you might have hoped for.
I believe what the OP is asking is what the Smalltalk guys have been doing for decades - store the whole programming/execution environment in an image file, and work on it.
AFAIK there is no way to do the same thing in Java.
There has been some research on "persisting" the execution state of a JVM, moving it to another JVM and starting it again. I saw something demonstrated once but don't remember which system it was. I don't think it has been standardized in the JVM specs, though...
Found the presentation/demo I was thinking of: it was at OOPSLA 2005, where they were talking about Squawk.
Good luck!
Other links of interest:
Merpati
Aglets
M-JavaMPI
How about using the Spring Batch framework?
As far as I understand from your question, you need a reliable and resumable Java task. If so, I believe Spring Batch will do the magic: you can split your task (job) into several steps, and each step (and also the entire job) has its own execution context persisted to a storage of your choice.
In case of a crash you can recover by analyzing the previous run of a specific job and resuming it from the exact point where the failure occurred.
You can also pause and restart your job programmatically if the job was configured as restartable and the ExecutionContext for this job already exists.
Good luck!
I believe:
1- the only generic way is to implement serialization.
2- a good way to restore a running system is OS virtualization.
3- what you are asking for now is something like single-process serialization.
The problem is I/O.
Say your process uses a temporary file which gets deleted by the system after 'hibernation', but your program does not know it. You will get an IOException somewhere.
So the bottom line is: if the program is not designed to be interrupted at random, it won't work.
That's a risky and unmaintainable solution, so I believe only 1 and 2 make sense.
I guess IDEs support something like this for debugging. It is not impossible, though I don't know how it is done. Maybe you will get details if you contact an Eclipse or NetBeans contributor.
First off you need to design your app to use the Memento pattern or any other pattern that allows you to save state of your application. Observer pattern may also be a possibility. Once your code is structured in a way that saving state is possible, you can use Java serialization to actually write out all the objects etc to a file rather than putting it in a DB.
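As a rough illustration of the Memento idea (shown in Python rather than Java; Counter, save and restore are made-up names for a toy object), the object hands out an opaque snapshot of its own state and can later be reset from it:

```python
import pickle


class Counter:
    """Toy application object; it stands in for the real program state."""

    def __init__(self):
        self.count = 0
        self.history = []

    def step(self):
        self.count += 1
        self.history.append(self.count)

    def save(self):
        """Return a memento: an opaque snapshot of the state worth keeping."""
        return pickle.dumps({"count": self.count, "history": self.history})

    def restore(self, memento):
        """Replace the current state with the one stored in the memento."""
        state = pickle.loads(memento)
        self.count = state["count"]
        self.history = state["history"]
```

The memento is just bytes, so it can be written to a file and read back after a restart. In Java the same role would be played by Serializable objects and ObjectOutputStream. As with any deserialization, only load snapshots you wrote yourself.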
Just my 2 cents.
What you want is impossible by the very nature of computer architecture.
Every Java program gets compiled into Java bytecode, and this code is then translated into native platform code when it runs. The native code is quite different from what you see in the Java files, because it depends on the underlying platform and JVM version. Every platform has a different instruction set, memory management, driver system, etc. So imagine that you hibernated your program on Windows and then ran it on Linux, a Mac or any other device with a JRE, such as a mobile phone, car, card reader, etc. All hell would break loose.
Your solution is to serialize every important object into files and then close the program gracefully. When "unhibernating", you deserialize the instances from these files and your program can continue. The number of "important" instances can be quite small: you only need to save the "business data", and everything else can be reconstructed from it. You can use Hibernate or any other ORM framework to automate this serialization on top of a SQL database.
Terracotta can probably do this: http://www.terracotta.org
I am not sure, but they support recovery from server failures. If all servers stop, I think the process should be saved to disk and wait.
Otherwise you should refactor your application to hold its state explicitly. For example, if you implement something like Runnable and make it Serializable, you will be able to save it.
I'm running a shell script on the university's server. This shell script executes Java, C, C++, Python and Perl programs, and every program is executed many, many times (I'm a teaching assistant and test the students' programs with many different inputs). The server always gives me an error: "running out of system resource". I guess this is because I do not release the resources.
I heard that each time a shell script runs a program, it starts one process. So I think maybe there are so many processes that the system resources allocated to me have run out.
Is there any way to sort this problem out?
I post part of my shell code below:
# maxconnect4 is the compiled C code
for ((i = 1; i <= 21; i++))
do
    maxconnect4 input1.txt
done
Thanks
Zhong
Since you are automatically running students' programs then it may be that their programs are badly written and using more RAM than similar programs written by more skilled programmers would require. Even Java and Python programs can be written in such a way as to leak memory (think about a stack that never gets anything popped off of it, only more things pushed on).
You should test your setup with known good implementations of the assignments you are about to grade as a sanity check.
You should also look at the source code for the students' work. Especially if you get the error on their assignment.
You may also just have an overloaded system, and may need to run these tests on another machine. Using a machine that does not have other users is a good idea for this type of thing, since things outside of your script and the program you are testing are then unlikely to interfere with your tests.
You may also want to keep top running on that machine on another terminal while you run the test to monitor resource usage.
You seem to be running maxconnect4, then waiting for it to finish before starting the next run, so I don't think your shell script itself is the issue. The big question is what maxconnect4 is doing. It could be very hungry for resources, or it could itself start child processes and return to your script.
I would try a few experiments, such as starting maxconnect4 by hand a few times. Do you see the resource error?
I would also use system tools to investigate. For example, use ps to see whether there are lots of processes running. Use vmstat to look at CPU and memory usage.
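If you would rather collect numbers from a script than watch ps and vmstat by hand, something along these lines could replace the shell loop. It is a sketch under assumptions: run_repeatedly and peak_child_memory are made-up names, the timeout value is arbitrary, and the resource module only exists on Unix-like systems.

```python
import resource
import subprocess
import time


def run_repeatedly(cmd, runs, timeout_seconds=10):
    """Run cmd the given number of times, one after another.

    Returns one record per run with its exit code and wall-clock seconds.
    """
    records = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            code = subprocess.call(cmd, stdin=subprocess.DEVNULL,
                                   timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            code = None  # the child was killed after exceeding the timeout
        records.append({"exit_code": code,
                        "seconds": time.perf_counter() - start})
    return records


def peak_child_memory():
    """Peak resident memory of any finished child so far (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

For the loop in the question that would be run_repeatedly(["maxconnect4", "input1.txt"], 21). The timeout only kills the program that was started directly; if a student's program spawns children of its own and exits, those are not tracked here and ps is still the way to spot them.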
I assume the latest update version of java would provide better performance.
I am looking for a way to isolate software components from each other's endless loops or memory leaks. Android isolates each app in its own process, and Google Chrome isolates each tab in its own process.
My main obstacle is that Java takes so long to start; I would also like to reduce memory consumption.
Is there any alternate build or more controlled startup that will accomplish this?
If quick startup is your goal, Java on a PC may not be your best bet. It's going to take a few seconds because that's how long it takes to load the VM from disk.
If you want your app to start more quickly it's easy to get a splash screen up, just create a module that only loads your splash screen, waits for it to fully display then uses reflection to link to your "Real" main module.
(Use reflection because otherwise it will pull in your entire program through references before it starts the main one--at least that's how it used to work).
If you're talking about run-time performance, you won't get quicker by changing languages, Java's about as fast as you can get. You MIGHT be able to get a boost by converting to C/C++ and rewriting it to suit those platforms (Less OO, stack allocations instead of heap, etc), but otherwise none of the other languages in general usage are close to Java in speed.
If you really need the quick startup, depending on what you are doing there may be some tricks. I've seen projects that try to keep a Java VM running in your toolbar and allow you to make requests (tell it to start an app). This was faster but made additional requirements of the user (Loading this additional tool)
Another possibility--if you are constantly starting up/shutting down small tasks and that's the reason the startup bothers you then you can definitely speed it up by keeping it running invisibly. Just have your Java app open a socket and listen for commands then create a little .EXE or shell script that can start your program if it's not running or send commands to that socket if it is. This would completely eliminate startups after the first run.
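The shape of that listen-for-commands arrangement is sketched below in Python, only to show how little protocol is needed. In the real setup the listening side would be the long-running Java app; the names, the one-line-per-command protocol and the placeholder reply are all assumptions.

```python
import socket
import socketserver
import threading


class CommandHandler(socketserver.StreamRequestHandler):
    """Reads one command line per connection and writes back one reply line."""

    def handle(self):
        command = self.rfile.readline().decode("utf-8").strip()
        # Placeholder work: a real server would run the requested task here.
        reply = "done: " + command
        self.wfile.write((reply + "\n").encode("utf-8"))


def start_server(port=0):
    """Start the listener in a background thread and return the server object.

    Port 0 asks the operating system for any free port.
    """
    server = socketserver.TCPServer(("127.0.0.1", port), CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(port, command):
    """Connect, send one command line, and return the one-line reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall((command + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()
```

The small launcher then only has to try send_command first and start the long-running process if the connection is refused. Binding to 127.0.0.1 keeps the listener off the network, which matters because anything that can connect can issue commands.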
In general, Java has a much longer startup time than other languages. If you are sticking with Java on a desktop app, a lot of stuff like startup time is determined by the JRE installed on the client's computer, which you can't control.
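As to "endless loops or memory leaks"... Java's garbage collector reclaims any object that is no longer reachable, so if your program's memory keeps growing, it is still holding references somewhere. Fix it.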
This is a second answer because it's completely different and my other got too long :)
Try compiling it to native code. I think GCC can do this (through its GCJ front end). This could almost completely eliminate your startup time. I believe Jikes used to be a Windows Java compiler by IBM, but I don't know if it's still maintained.
Note that compiled code will probably run slower than JVM code for long-running apps.