I have some issues with my PC and don't know what else to check or look for.
If the tags or description are off, feel free to edit/comment.
Basically the question is: Do you know of any tool that I could run during reproduction of the test, which logs frequently and might provide a hint on what's going on?
If anyone already has a clue of what could cause the problem, that would be great as well.
So here's the problem:
I have a running JBoss 4.2.3.GA server application which provides some EJBs with remote interfaces. Those EJBs write or read stuff to/from the database but that doesn't seem to matter since I also had methods that just did a System.out.println(...) and nothing more.
Now I run a test client from the console which basically just "remotely" calls one of those EJB methods in a loop (to take some timings etc.).
So far nothing too unorthodox should be done, it's basically just a bunch of remote EJB calls.
However, during the execution of the loop the computer freezes completely (keyboard doesn't respond as well, e.g. num-lock key) - the only thing that changes is the blinking cursor. :)
Unfortunately I didn't manage to find a reason for this and since I often do my tests from eclipse I'd like to not have that happen too often (workspace crashes etc.)
Here's what I tried so far:
Numerous hardware tests including Lenovo PC Doctor (it's a Lenovo PC) - all succeeded, so i seems like there's no hardware problem
Use different JDK versions: 1.5.0, 1.6.0, 1.7.0 - all crashed
JRockit JVM (Java 6) - crashed as well
make Java cause 100% CPU load (10 thread running constantly on 4 cores) - succeeded/no error
allocate as much memory as the JVM would allow me - succeeded
run the tests on other computers - succeeded, except one that has the same hardware and similar software setup
Windows logs don't provide a hint (except "system was not shut down correctly" ... well that helps :) )
After all these tests I assume it might be a problem with the system configuration (drivers etc.) but I don't know how to track that (and I can't just use brute force due to the massive time requirements).
So, did anyone experience similar problems?
Do you know of any tool that I might use to log what the system does and preferably get a log right before the crash?
Thanks in advance,
Thomas
Related
I'm attempting to detect a bottleneck in our servers, and I'm having a hard time deciding where to start. The symptoms from the machine before it crashes are dropped connections (time outs, this is what happens when a client sees when a response takes too long, possibly indicating that a processor couldn't be allocated by the server side code, and the request couldn't be handled) and out of memory issues. The latter error code is actually given by the JVM in an error log, but I have a hard time believing that both the RAM and the lack of available CPUs are turning out to be the bottlenecks at the same time. (In the days of the previous crashes, it has consistently crashed in the manner described above.) We have our own in house server code, and I wouldn't classify it as analogous to Apache or any other code I've seen. (Sorry if this makes giving advice that much more difficult.)
I'd like to take some time to create a somewhat controlled test locally. I'm running the server, and I'm creating a program that will request different things from the local server. What's a good way to monitor RAM/CPU? I'm currently using Java's VisualVM, but monitors stop responding when I hammer it with some of the tests.
Any ideas out there would be greatly appreciated. Like I mentioned, I'm trying to grab as much useful data, to help me further troubleshoot. In general, when a bottleneck issue like this arises, what are some general strategies to take? The live servers are all running on Windows Server 2008. The version of Java is 7.03. My local box is running Windows 7, with Java 7.03 as well. I don't want to make too many assumptions, but I think it's reasonable to assume that Server 2008 and Windows 7 are pretty similar. (The OS architecture is the same.) Aside from that, my local box has identical hardware to our servers.
For Windows, you need to:
1) Establish a "performance baseline"
2) Try to reproduce the bottleneck, and compare the behavior under stress with the baseline
3) Identify the cause of the bottleneck, and correct it.
These links will help:
http://technet.microsoft.com/en-us/library/cc781394%28v=ws.10%29.aspx
http://msdn.microsoft.com/en-us/library/windows/hardware/gg463394.aspx
You should also look at these TCP/IP registry settings:
Increase the range of ephemeral ports
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, MaxUserPort
Consider disabling auto-tuning:
http://geekswithblogs.net/cajunmcse/archive/2010/08/20/windows-2008-exchange-and-tcpip-auto-tuning.aspx
I was just wondering if it's possible to dump a running Java program into a file, and later on restart it (same machine)
It's sounds a bit weird, but who knows
--- update -------
Yes, this is the hibernate feature for a process instead of a full system. But google 'hibernate jvm process' and you'll understand my pain.
There is a question for linux on this subject (here). Quickly, it's possible to hibernate a process (far from 100% reliable) with CryoPID.
A similar question was raised in stackoverflow some years ago.
With a JVM my educated guess is that hibernating should be a lot easier, not always possible and not reliable at 100% (e.g. UI and files).
Serializing a persistent state of the application is an option but it is not an answer to the question.
This may me a bit overkill but one thing you can do is run something like VirtualBox and halt/save the machine.
There is also:
- JavaFlow from Apache that should do just that even though I haven't personally tried
it.
- Brakes that may be exactly what you're looking for
There are a lot restrictions any solution to your problem will have: all external connections might or might not survive your attempt to freeze and awake them. Think of timeouts on the other side, or even stopped communication partners - anything from a web server to a database or even local files.
You are asking for a generic solution, without any internal knowledge of your program, that you would like to hibernate. What you can always do, is serialize that part of the state of your program, that you need to restart your program. It is, or at least was common wisdom to implement restart point in long running computations (think of days or weeks). So, when you hit a bug in your program after it run for a week, you could fix the bug and save some computation days.
The state of a program could be surprisingly small, compared to the complete memory size used.
You asked "if it's possible to dump a running Java program into a file, and later on restart it." - Yes it is, but I would not suggest a generic and automatic solution that has to handle your program as a black box, but I suggest that you externalize the important part of your programs state and program restart points.
Hope that helps - even if it's more complicated than what you might have hoped for.
I believe what the OP is asking is what the Smalltalk guys have been doing for decades - store the whole programming/execution environment in an image file, and work on it.
AFAIK there is no way to do the same thing in Java.
There has been some research in "persisting" the execution state of the JVM and then move it to another JVM and start it again. Saw something demonstrated once but don't remember which one. Don't think it has been standardized in the JVM specs though...
Found the presentation/demo I was thinking about, it was at OOPSLA 2005 that they were talking about squawk
Good luck!
Other links of interest:
Merpati
Aglets
M-JavaMPI
How about using SpringBatch framework?
As far as I understood from your question you need some reliable and resumable java task, if so, I believe that Spring Batch will do the magic, because you can split your task (job) to several steps while each step (and also the entire job) has its own execution context persisted to a storage you choose to work with.
In case of crash you can recover by analyzing previous run of specific job and resume it from exact point where the failure occurred.
You can also pause and restart your job programmatically if the job was configured as restartable and the ExecutionContext for this job already exists.
Good luck!
I believe :
1- the only generic way is to implement serialization.
2- a good way to restore a running system is OS virtualization
3- now you are asking something like single process serialization.
The problem are IOs.
Says your process uses a temporary file which gets deleted by the system after
'hybernation', but your program does not know it. You will have an IOException
somewhere.
So word is , if the program is not designed to be interrupted at random , it won't work.
Thats a risky and unmaintable solution so i believe only 1,2 make sense.
I guess IDE supports debugging in such a way. It is not impossible, though i don't know how. May be you will get details if you contact some eclipse or netbeans contributer.
First off you need to design your app to use the Memento pattern or any other pattern that allows you to save state of your application. Observer pattern may also be a possibility. Once your code is structured in a way that saving state is possible, you can use Java serialization to actually write out all the objects etc to a file rather than putting it in a DB.
Just by 2 cents.
What you want is impossible from the very nature of computer architecture.
Every Java program gets compiled into Java intermediate code and this code is then interpreted into into native platform code (when run). The native code is quite different from what you see in Java files, because it depends on underlining platform and JVM version. Every platform has different instruction set, memory management, driver system, etc... So imagine that you hibernated your program on Windows and then run it on Linux, Mac or any other device with JRE, such as mobile phone, car, card reader, etc... All hell would break loose.
You solution is to serialize every important object into files and then close the program gracefully. When "unhibernating", you deserialize these instances from these files and your program can continue. The number of "important" instances can be quite small, you only need to save the "business data", everything else can be reconstructed from these data. You can use Hibernate or any other ORM framework to automatize this serialization on top of a SQL database.
Probably Terracotta can this: http://www.terracotta.org
I am not sure but they are supporting server failures. If all servers stop, the process should saved to disk and wait I think.
Otherwise you should refactor your application to hold state explicitly. For example, if you implement something like runnable and make it Serializable, you will be able to save it.
I have a J2EE java application which processes SOAP requests. In our production environment (HPUX,OC4J,Java 5) we have about 20 threads running for this process, and we sometimes see 1 thread pausing for ~15 seconds. Until now, I haven't succeeded replicating the problem in our preproduction environment, and I'm scared of breaking stuff and violating SLA's if I use jconsole and associated tools on our production server.
Who has any inspiration? I know about http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf but I miss the experience to dare using it straight in production (plus, the HPUX guys threw some of these tools out of the toolbox, replacing them with HPJMeter)
Also, although this suggests a GC problem to me, I don't yet know enough to prove or disprove this theory and I am open to other suggestions.
We connect jconsole (and other tools) straight to production regularly. There is no significant overhead for us, the instrumentation is already going on within the JVM so you'd just be connecting a remote process to read published values. I say go for it!
Either way, you really need to see what's going on on the box. Thread dumps might or do some internal instrumentation. By internal instrumentation, I mean recording key measures within the code and exposing those somehow. It's essentially what the JVM does (exposing them via JMX) but rolling your own gives you more specificity. For example, I'm frequently recording request/response or other critical path performance timings internally.
oh, and one more thing. You can setup your app to using an agent to provide even more information. Typically this would be to plug a profiler in (like jprofiler or yourkit) but this does usually add more overhead and isn't recommended for production.
It's also worth thinking about the cost of not getting the information you need out of the VM. For example, is the cost of not fixing the issue more or less than the cost of a small % drop of performance when monitoring?
More scientifically, this article has some comments. It's suggesting up to 7% overhead (contradicting my previous point), a previous article from 2006 suggests 3-4% but both are highly contextual results. For example, CPU intensive applications may or may not be affected more than IO bound ones.
So a more appropriate answer from me (rather than just "go for it") would be to understand the impact it would have for your application in your environment through measurement. Run representative tests on a similar environment to production with jconsole connected and disconnected and see for your self.
Also see this stackoverflow question.
There are a few things that you can do on HP-UX to get additional information from a running Java process. If you send the PROF signal to the JVM, it will toggle the generation of a GC log (as if you had used the -Xverbosegc command line option). Generating the GC log is very inexpensive, so you should be able to turn this on in production without affecting the performance.
If you send the USR2 signal to the JVM, it starts profiling (same as -Xeprof). If you send the signal a second time, it turns off the profiling. This will have a noticeable performance impact, though it is smaller that what you would see from an external, third party profiler.
You can analyze the resulting data files using HPjmeter. HPjmeter can also connect to a running JVM for real-time monitoring. With Java 5, you need to start the JVM with the -agentlib option. If you were using Java 6, you could attach to the running JVM without needing any extra command line options.
I've developed a web application using the following tech stack:
Java
Mysql
Scala
Play Framework
DavMail integration (for calender and exchange server)
Javamail
Akka actors
On the first days, the application runs smoothly and without lags. But after 5 days or so, the application gets really slow! And now I have no clue how to profile this, since I have huge dependencies and it's hard to reproduce this kind of thing. I have looked into the memory and it seems that everything its okay.
Any pointers on the matter?
Try using VisualVM - you can monitor gc behaviour, memory usage, heap, threads, cpu usage etc. You can use it to connect to a remote VM.
`visualvm˙ is also a great tool for such purposes, you can connect to a remote JVM as well and see what's inside.
I suggest you doing this:
take a snapshot of the application running since few hours and since 5 days
compare thread counts
compare object counts, search for increasing numbers
see if your program spends more time in particular methods on the 5th day than on the 1str one
check for disk space, maybe you are running out of it
jconsole comes with the JDK and is an easy tool to spot bottlenecks. Connect it to your server, look into memory usage, GC times, take a look at how many threads are alive because it could be that the server creates many threads and they never exit.
I agree with tulskiy. On top of that you could also use JMeter if the investigations you will have made with jconsole are unconclusive.
The probable causes of the performances degradation are threads (that are created but never exit) and also memory leaks: if you allocate more and more memory, before having the OutOfMemoryError, you may encounter some performances degradation (happened to me a few weeks ago).
To eliminate your database you can monitor slow queries (and/or queries that are not using an index) using the slow query log
see: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
I would hazard a guess that you have a missing index, and it has only become apparent as your data volumes have increased.
Yet another profiler is Yourkit.
It is commercial, but with trial period (two weeks).
Actually, I've firstly tried VisualVM as #axel22 suggested, but our remote server was ssh'ed and we had problems with connecting via VisualVM (not saying that it is impossible, I've just surrendered after a few hours).
You might just want to try the 'play status' command, which will list web app state (threads, jobs, etc). This might give you a hint on what's going on.
So guys, in this specific case, I was running play in Developer mode, which makes the compiler works every now and then.
After changing to production mode, everything was lightning fast and no more problems anymore. But thanks for all the help.
We have a curious problem with our java processes dying.
The application doesn't stacktrace, or write anything to the logs, the process just randomly dies. It's a heavily used application, but the problem only appears about once a month.
We're currently looking into using Process Monitor but any other suggestions would be welcome.
Edit:
It's a distributed Java application, running on Weblogic with an in-house web framework (Yes, this is a terrible idea, but it's been running for eight years), connecting to Oracle.
-
Out of Memory?
Our logs would catch java.lang.OutOfMemoryException, according to Brian Agnew.
Write crashes to a log? I don't think Java ever gets the chance, the death is happening at a process level, rather than Java exiting.
Can you wrap it in some shell script that captures the log files (stdout/stderr) and the exit code (which should give some indication as to how it died) ? On JVM exit you can also capture machine level stats using WMI
IF the VM itself is crashing it'll leave behind an hs_err_pid... file that contains stacktraces, machine-level debug info. You can then use that to diagnose the VM issue. See this blog entry for further information.
If the problem is related to the app's behaviour, it may be worth looking at JConsole, although from your description of the issue, this sounds much more like a low level VM issue.
(I assume you're on the latest VM for your Java version number etc.)
You can use a Linux NAGIOS Server to monitor the health of your Windows machines and services! Have a look at: nagios-monitoring-windows.
If you have such problems with your java app! You should test it and debug it! Applications shouldn't die without a trace! Look for logfiles! From which vendor is the app? Or is it self written? Try to enforce another Log4J/Logger/Debug Level. Monitor your System with cacti etc. to reduce the possibilities for such a crash. Talk to the software vendor.
Is enogh memory available? Maybe the app runs out of memory? Is it a standalone java process or a java process from a tomcat/jboss server?
Have you written down the crash times to a log? Appear they in different time-slices? Or appear they nearly time-circular?
VisualVM is a new tool which makes monitoring Java applications easier:
https://visualvm.dev.java.net/description.html
"VisualVM is a tool that provides detailed information about Java applications while they are running. It provides an intuitive graphical user interface that allows you to easily see information about multiple Java applications."