How to reliably detect disk is down on Linux via Java

Is there a good way to detect, from Java, that a particular disk on a Linux server has gone offline?
I have an application that, for performance reasons, writes to all disks directly (with no RAID in between).
I need to detect at run time when Linux unmounts a disk after a disk crash, so that I can stop using it. The problem is that each mount point is a directory on the root partition, so without proper detection the application will just fill up the root partition.
I would appreciate any advice on this.

In Linux, most system state is exposed through text files. I don't fully understand exactly what information you need, but check /proc/diskstats, /proc/mounts, /proc/mdstat (for RAID), etc.
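As a rough illustration of the /proc/mounts idea, the sketch below checks whether a directory is still listed as a mount point before the application writes to it. The class and method names are hypothetical, and it assumes the standard /proc/mounts layout in which the mount point is the second whitespace-separated field. Note that on a hung disk even this read may block, so it may be worth running the check on a separate thread with a timeout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: class and method names are illustrative only.
class MountCheck {

    // Extracts the mount points (second whitespace-separated field)
    // from lines in /proc/mounts format.
    static Set<String> parseMountPoints(List<String> lines) {
        Set<String> mountPoints = new HashSet<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 2) {
                // /proc/mounts escapes spaces in paths as \040.
                mountPoints.add(fields[1].replace("\\040", " "));
            }
        }
        return mountPoints;
    }

    // Returns true if the given directory is currently listed as a mount point.
    static boolean isMounted(String dir) throws IOException {
        List<String> lines =
                Files.readAllLines(Paths.get("/proc/mounts"), StandardCharsets.UTF_8);
        return parseMountPoints(lines).contains(dir);
    }

    public static void main(String[] args) throws IOException {
        String dir = args.length > 0 ? args[0] : "/";
        System.out.println(dir + " mounted: " + isMounted(dir));
    }
}
```

If isMounted returns false for one of the data directories, the application can skip that directory instead of silently writing to the root partition.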

As anyone with sysadmin experience could tell you, disks crashing or otherwise going away has a nasty habit of making any process that touches anything under the mountpoint wait in uninterruptible sleep. Additionally, in my experience, this can include things like trying to read /proc/mounts, or running the 'df' command.
My recommendation would be to use RAID, and if necessary, invest your way out of the problem. Say, if performance is limited by small random writes, a RAID card with a battery backed write cache can do wonders.

Related

Dumping a Java program into a file and restarting it

I was just wondering if it's possible to dump a running Java program into a file, and later on restart it (same machine)
It sounds a bit weird, but who knows.
--- update -------
Yes, this is the hibernate feature for a process instead of a full system. But google 'hibernate jvm process' and you'll understand my pain.
There is a question for linux on this subject (here). Quickly, it's possible to hibernate a process (far from 100% reliable) with CryoPID.
A similar question was raised in stackoverflow some years ago.
With a JVM, my educated guess is that hibernating should be a lot easier, though not always possible and not 100% reliable (e.g. UI and files).
Serializing a persistent state of the application is an option but it is not an answer to the question.

This may be a bit overkill, but one thing you can do is run something like VirtualBox and halt/save the machine.
There is also:
- JavaFlow from Apache, which should do just that, even though I haven't personally tried it.
- Brakes, which may be exactly what you're looking for.

There are a lot of restrictions that any solution to your problem will have: external connections might or might not survive your attempt to freeze and wake them. Think of timeouts on the other side, or even communication partners that have stopped: anything from a web server to a database or even local files.
You are asking for a generic solution that hibernates your program without any knowledge of its internals. What you can always do is serialize the part of your program's state that is needed to restart it. It is, or at least was, common wisdom to implement restart points in long-running computations (think of days or weeks). That way, when you hit a bug after the program has run for a week, you can fix the bug and still save several days of computation.
The state of a program can be surprisingly small compared to the total memory it uses.
You asked "if it's possible to dump a running Java program into a file, and later on restart it." Yes, it is, but I would not suggest a generic and automatic solution that has to handle your program as a black box. Instead, I suggest that you externalize the important part of your program's state and program restart points.
Hope that helps - even if it's more complicated than what you might have hoped for.
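A minimal sketch of the restart-point idea, assuming the resumable state fits in a small Serializable object. The class, the field names and the toy computation (a running sum) are all hypothetical; a real program would checkpoint whatever state it needs to resume.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical example: a long-running sum whose progress is checkpointed.
class CheckpointDemo {

    // The small part of the program state needed to resume the computation.
    static class State implements Serializable {
        private static final long serialVersionUID = 1L;
        long nextIndex = 0;
        long partialSum = 0;
    }

    // Writes the state to a temporary file and then moves it over the
    // checkpoint, to reduce the chance that a crash mid-write destroys
    // the previous checkpoint.
    static void save(State state, Path file) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(state);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Loads the last checkpoint, or returns a fresh state if none exists.
    static State load(Path file) throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new State();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (State) in.readObject();
        }
    }

    // Sums 0..limit-1, saving a checkpoint every `interval` steps.
    static long run(Path checkpoint, long limit, long interval)
            throws IOException, ClassNotFoundException {
        State s = load(checkpoint);
        while (s.nextIndex < limit) {
            s.partialSum += s.nextIndex;
            s.nextIndex++;
            if (s.nextIndex % interval == 0) {
                save(s, checkpoint);
            }
        }
        save(s, checkpoint);
        return s.partialSum;
    }
}
```

If the process dies partway through, calling run again with the same checkpoint file resumes from the last saved index rather than from zero.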

I believe what the OP is asking is what the Smalltalk guys have been doing for decades - store the whole programming/execution environment in an image file, and work on it.
AFAIK there is no way to do the same thing in Java.

There has been some research in "persisting" the execution state of the JVM and then move it to another JVM and start it again. Saw something demonstrated once but don't remember which one. Don't think it has been standardized in the JVM specs though...
Found the presentation/demo I was thinking about, it was at OOPSLA 2005 that they were talking about squawk
Good luck!
Other links of interest:
Merpati
Aglets
M-JavaMPI

How about using the Spring Batch framework?
As far as I understand from your question, you need a reliable and resumable Java task. If so, I believe that Spring Batch will do the magic, because you can split your task (job) into several steps, where each step (and also the entire job) has its own execution context persisted to a storage of your choice.
In case of a crash, you can recover by analyzing the previous run of a specific job and resuming it from the exact point where the failure occurred.
You can also pause and restart your job programmatically if the job was configured as restartable and the ExecutionContext for this job already exists.
Good luck!

I believe:
1- the only generic way is to implement serialization.
2- a good way to restore a running system is OS virtualization.
3- what you are asking for now is something like single-process serialization.
The problem is I/O.
Say your process uses a temporary file that gets deleted by the system after 'hibernation', but your program does not know it. You will get an IOException somewhere.
So the bottom line is: if the program is not designed to be interrupted at a random point, it won't work.
That's a risky and unmaintainable solution, so I believe only 1 and 2 make sense.

I guess IDEs support debugging in such a way. It is not impossible, though I don't know how. Maybe you will get details if you contact some Eclipse or NetBeans contributor.

First off you need to design your app to use the Memento pattern or any other pattern that allows you to save state of your application. Observer pattern may also be a possibility. Once your code is structured in a way that saving state is possible, you can use Java serialization to actually write out all the objects etc to a file rather than putting it in a DB.
Just my 2 cents.

What you want is impossible from the very nature of computer architecture.
Every Java program gets compiled into Java intermediate code (bytecode), and this code is then translated into native platform code when run. The native code is quite different from what you see in Java files, because it depends on the underlying platform and JVM version. Every platform has a different instruction set, memory management, driver system, etc. So imagine that you hibernated your program on Windows and then ran it on Linux, Mac or any other device with a JRE, such as a mobile phone, car, card reader, etc. All hell would break loose.
Your solution is to serialize every important object into files and then close the program gracefully. When "unhibernating", you deserialize these instances from the files and your program can continue. The number of "important" instances can be quite small: you only need to save the "business data", and everything else can be reconstructed from it. You can use Hibernate or any other ORM framework to automate this serialization on top of a SQL database.

Terracotta can probably do this: http://www.terracotta.org
I am not sure, but they do support server failures. If all servers stop, I think the process should be saved to disk and wait.
Otherwise you should refactor your application to hold state explicitly. For example, if you implement something like Runnable and make it Serializable, you will be able to save it.

How to prohibit Java VM from creating any dump upon crash / writing sensitive data to disk

I'm writing a Java program that stores sensitive data (password and private keys) in memory. It will be deployed freely to any OS. I know that a user can create a memory dump manually on almost any system, but I am worried about a dump being created by the OS or JVM implementation (including, but not limited to some segfault of the JVM itself) that would compromise the privacy of the sensitive data.
Are there any steps that could be taken to reduce these risks? This question is POSIX-specific but gives me an answer for those platforms. One platform-independent idea I had was to set an UncaughtExceptionHandler that overwrites the sensitive data. But what if memory is swapped out? What if the JVM crashes (e.g. with a segmentation fault) due to a JVM/JNI bug? I know Linux can prevent data from being swapped to disk, but is there Java code to do this cross-platform? Mostly I'm worried about the potential for recovering data from magnetic storage devices, so any help is appreciated.

If you do not have control of the operating system, you basically cannot prohibit the user from accessing what you have in memory.
Hence, you need to keep the amount of sensitive data you hold onto to an absolute minimum. Just imagine that the knowledgeable user attaches a debugger, halts your program and snoops in your datastructures cherry-picking whatever they need to know.
So, when done using passwords, set all references to null. When done using keys, set their references to null. Note that this will not help for the determined knowledgeable user, but it will at least minimize the chance for accidental discovery.
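One complementary step, not part of the answer above but a common Java practice, is to hold secrets in a char[] rather than a String and to overwrite the array before dropping the reference. The sketch below uses hypothetical names. It only shortens the window in which the secret sits in the heap; it does not prevent swapping, core dumps, or copies the JVM may have made while moving objects.

```java
import java.util.Arrays;

// Hypothetical sketch: class and method names are illustrative only.
class SecretHolder implements AutoCloseable {
    private final char[] secret;

    SecretHolder(char[] secret) {
        // Keep our own copy so the caller can wipe theirs independently.
        this.secret = Arrays.copyOf(secret, secret.length);
    }

    // Example use of the secret without converting it to an immutable String.
    boolean matches(char[] candidate) {
        return Arrays.equals(secret, candidate);
    }

    // Overwrites the secret in place; call as soon as it is no longer needed.
    @Override
    public void close() {
        Arrays.fill(secret, '\0');
    }
}
```

Because the class implements AutoCloseable, it can be used in a try-with-resources statement so that the wipe happens even when an exception is thrown.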

If you are trying to stop casual users, then you don't really need to do anything. If you are trying to stop knowledgeable/determined users, and you are running the code on their computer, then there is nothing you can do.
If the only thing you are worried about is the program writing stuff to disk "accidentally", then, again, there isn't much you can do. Java programs don't generate heap dumps unless you specifically tell them to. Any stack traces output from an uncaught exception are highly unlikely to include anything sensitive. The file that gets written when a JVM segfaults in a controlled manner is also unlikely to have anything sensitive in it. The only potentially problematic thing would be a core dump on some variant of a Unix system. Unfortunately, I don't believe you can control that at the program level, only at the system/user configuration level (which was mentioned in the first question you linked to).

Reduce Memory Usage of Java Process

I wrote a wrapper application in C# .NET that runs while the .jar file is running, closes when the .jar file closes, etc. This was basically to allow our web panel to query the executable to find out whether it was actually running or not.
I have seen some other panels specifically intended for this software that have an option to reduce its memory usage when no one is connected. The Java application (Minecraft) basically scales its RAM usage based on the size of the player world rather than on how many players are connected. When no one is connected, it should be perfectly fine to reduce the usage.
So is there any way to reduce the RAM usage of a Java application programmatically from C# .NET?

AFAIK, there is no way to tell a JVM to give regular heap memory back to the operating system ... apart from telling it to exit completely.

No.
Why not? Because you can't control the Java program in that way, for two reasons:
You can't control what the JRE does with its memory or how the GC works.
If minecraft.jar requests 512 MiB of RAM, it gets 512 MiB of RAM. You can't just tell an application "Hey, there's no one connected, so I won't let you allocate memory". I mean, you could... but I don't think you want that (it would trigger exceptions and odd side effects).
Edit: The only rather easy way to achieve this behavior would be to change the program. Since Minecraft is not free/open-source software, the only thing you could do is file a bug/feature request. Maybe even with extended information and a layout concept on how to achieve better memory usage.
I mean, I'm pretty sure that this could also be achieved with heavy usage of reflection via a Java program...but things go pretty fast downhill from there.

Solaris: virtual slices/disks for use with ZFS

This is somewhat related to my previous question, "Solaris: Mounting a file system on an application's handlers", except that this question has a different purpose and is simpler: there is no open/close/lock, just a fixed-length block of bytes with read/write operations.
Is there any way I can create a virtual slice, kind of like a RAM disk or an SVM slice, but with the reads and writes going through my app?
I am planning to use ZFS to take multiple of these virtual slices/disks and make them into one larger one for distributed backup storage with snapshots. I really like the compression and stacking that ZFS offers. If necessary I can guarantee that there is only one instance of ZFS accessing these virtual disks at a time (to prevent cache conflicts and such). If the one instance goes down, we can make sure it won't start back up and then we can start another instance of that ZFS.
I am planning to have those disks in chunks of about 4GB or so. Then I can move each chunk around and decide where to store it (mirrored multiple times, of course), and then have ZFS access the chunks and put them together into larger chunks for actual use. ZFS would also permit adding more of these small chunks if necessary to increase the size of the larger chunk.
I am aware there would be extra latency / network traffic if we used my own app in Java, but this is just for backup storage. The production storage is entirely different configuration that does not relate.
Edit: We have a system that uses all the space available and basically when there is not enough space it will remove old snapshots and increase the gaps between old snapshots. The purpose of my proposal is to allow the unused space from production equipment to be put to use at no extra cost. At different times different units of our production equipment will have free space. Also the system I am describing should eliminate any single point of failure when attempting to access data. I am hoping to not have to buy two large units and keep them synchronized. I would prefer just to have two access points and then we can mix large/small units in any way we want and move data around seamlessly.
This is a cross-post, because this is more software-related than sysadmin-related. The original question is here: https://serverfault.com/questions/212072. It may be a good idea for the original to be closed.

One way would be to write a Solaris device driver, specifically a block device driver that emulates a real disk but communicates back to your application instead.
Start with reading the Device Driver Tutorial, then have a look at OpenSolaris source code for real drivers code.
Alternatively, you might investigate modifying Solaris iSCSI target to be the interface with your application. Again, looking at OpenSolaris COMSTAR will be a good start.

It seems that any fixed-length file on any file system will do as a block device for use with ZFS. I'm not sure how reboots work, but I am sure we can write some boot-up commands to work that out.
Edit: The fixed length file would be on a network file system such as NFS.

How to determine why is Java app slow

We have a Java ERP type of application. Communication between server and client is via RMI. In peak hours there can be up to 250 users logged in, and about 20 of them are working at the same time. This means that about 20 threads are live at any given time in peak hours.
The server can run for hours without any problems, but all of a sudden response times get higher and higher. Response times can be in minutes.
We are running on Windows 2008 R2 with Sun's JDK 1.6.0_16. We have been using perfmon and Process Explorer to see what is going on. The only thing that we find odd is that when the server starts to slow down, the number of handles the java.exe process has open is around 3500. I'm not saying that this is the actual problem.
I'm just curious if there are some guidelines I should follow to be able to pinpoint the problem. What tools should I use?

Can you access the log configuration of this application?
If you can, you should change the log level to "DEBUG". Tracing the DEBUG logs of a request could give you useful information about the point of contention.
If you can't, profiling tools can help you:
VisualVM (free, and a good product)
Eclipse TPTP (free, but more complicated than VisualVM)
JProbe (not free but very powerful. It is my favorite Java profiler, but it is expensive)
If the application has been developed with JMX control points, you can plug in a JMX viewer to get information...
If you want to stress the application to trigger the problem (to verify whether it is a load problem), you can use stress tools like JMeter.

Sounds like the garbage collector cannot keep up and starts "stop-the-world" collections for some reason.
Attach with jvisualvm from the JDK when starting, and have a look at the collected data when the performance drops.
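If attaching jvisualvm is not convenient, one rough way to test the GC hypothesis from inside the application is to sample the standard GarbageCollectorMXBean counters. This is only a sketch with a hypothetical class name; it reports how many milliseconds the collectors say they spent collecting in each one-second interval.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: samples cumulative GC time via the platform MXBeans.
class GcStats {

    // Total time in milliseconds spent in GC so far, summed over all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long previous = totalGcTimeMillis();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long now = totalGcTimeMillis();
            System.out.println("GC time in last second: " + (now - previous) + " ms");
            previous = now;
        }
    }
}
```

If the reported GC time approaches the length of the sampling interval while response times climb, that supports the GC explanation; if it stays low, the cause is probably elsewhere.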

The problem you're describing is quite typical but also very general. Causes can range from memory leaks and resource contention to bad GC policies and heap/PermGen space allocation. To pinpoint the exact problems with your application, you need to profile it (I am aware of tools like YourKit and JProfiler). If you profile your application wisely, a few application cycles should be enough to reveal the problems; otherwise profiling isn't very easy itself.

In a similar situation, I have coded a simple profiling code myself. Basically I used a ThreadLocal that has a "StopWatch" (based on a LinkedHashMap) in it, and I then insert code like this into various points of the application: watch.time("OperationX");
then after the thread finishes a task, I'd call watch.logTime(); and the class would write a log that looks like this: [DEBUG] StopWatch time:Stuff=0, AnotherEvent=102, OperationX=150
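The original class is not shown, so the following is only a guess at what such a StopWatch might look like. It assumes that time(name) records the milliseconds elapsed since the previous mark on the same thread, and that logTime() prints the entries in insertion order and then resets. The get() and format() methods are hypothetical additions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconstruction of the per-thread StopWatch described above.
class StopWatch {
    // One StopWatch per thread, created lazily on first access.
    private static final ThreadLocal<StopWatch> CURRENT =
            ThreadLocal.withInitial(StopWatch::new);

    // LinkedHashMap keeps events in the order they were first recorded.
    private final Map<String, Long> times = new LinkedHashMap<>();
    private long last = System.nanoTime();

    static StopWatch get() {
        return CURRENT.get();
    }

    // Assumption: records the milliseconds elapsed since the previous mark
    // and accumulates them if the same event name is marked more than once.
    void time(String event) {
        long now = System.nanoTime();
        long elapsedMs = (now - last) / 1_000_000;
        Long previous = times.get(event);
        times.put(event, (previous == null ? 0L : previous) + elapsedMs);
        last = now;
    }

    // Builds a log line in the format quoted above.
    String format() {
        StringBuilder sb = new StringBuilder("[DEBUG] StopWatch time:");
        boolean first = true;
        for (Map.Entry<String, Long> e : times.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
            first = false;
        }
        return sb.toString();
    }

    // Prints the collected timings and resets the watch for the next task.
    void logTime() {
        System.out.println(format());
        times.clear();
        last = System.nanoTime();
    }
}
```

Usage would be StopWatch.get().time("OperationX") at the points of interest, followed by StopWatch.get().logTime() when the thread finishes its task.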
After this I wrote a simple parser that generates CSV from this log (per code path). The best thing you can do is create a histogram (which is easy to do in Excel). Averages, medians and even modes can fool you. I highly recommend creating a histogram.
Together with this histogram, you can create line graphs using the average/median/mode (whichever represents the data best; you can determine this from the histogram).
This way, you can be 100% sure exactly what operation is taking time. If you can't determine the culprit, binary search is your friend (fine grain the events).
Might sound really primitive, but works. Also, if you make a library out of it, you can use it in any project. It's also cool because you can easily turn it on in production as well..

Aside from the GC that others have mentioned, try taking thread dumps every 5-10 seconds for about 30 seconds during your slowdown. There could be a case where DB calls, a web service, or some other dependency becomes slow. If you take a look at the thread dumps you will be able to see threads which don't appear to move, and you could narrow down your culprit that way.
From the GC stand point, do you monitor your CPU usage during these times? If the GC is running frequently you will see a jump in your overall CPU usage.
If only this was a Solaris box, prstat would be your friend.
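Thread dumps are usually taken from outside the process, but if adding code to the server is an option, the periodic dumps described above can also be produced in-process with the standard ThreadMXBean. The class name and the dump format here are hypothetical, and the 5-second interval with 6 rounds simply mirrors the suggestion above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: produces a simple text dump of all live threads.
class ThreadDumper {

    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Six dumps, five seconds apart, covering roughly 30 seconds.
        for (int i = 0; i < 6; i++) {
            System.out.println("=== dump " + i + " ===");
            System.out.println(dump());
            Thread.sleep(5000);
        }
    }
}
```

Threads that show the same stack in several consecutive dumps are the ones worth looking at first.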

For acute issues like this a quick jstack <pid> should quickly point out the problem area. Probably no need to get all fancy on it.
If I had to guess, I'd say HotSpot jumped in and tightly optimised some badly written code. NetBeans grinds to a halt where it uses a WeakHashMap with newly created objects to cache file data. When optimised, the entries can be removed from the map straight after being added. Obviously, if the cache is being relied upon, much file activity follows. You probably won't see the drive light up, because it'll all be cached by the OS.
