MongoDB performance in SAFE mode - java

How does setting the WriteConcern flag to SAFE in the Java driver affect MongoDB performance?

Slows it down significantly (as you would expect).
Without the "SAFE" flag, MongoDB drivers operate in "Fire & Forget" mode. So the update command is sent to the server and then the driver continues on. If there is an error with the write or the server dies before the change happens, the client knows nothing about it.
With the "SAFE" flag, the drivers do both the update command and the getLastError() command. That second command will not complete until the update actually happens on the DB. At the very least, you are sending two commands instead of one (so it's 50% slower).
In my experience it's actually 5x-20x slower. Of course, this makes sense because actually writing the data is the slow part of this whole piece.
Note that without the SAFE flag, certain exceptions will never happen. For example, you will never get a duplicate key exception.
If you plan to use MongoDB as a typical database (analogous to, say, MySQL), you need to use at least "SAFE" mode and replica sets. Otherwise, you need "JOURNAL" mode with a single box. If you use JOURNAL mode, you will notice that performance starts to look a lot like regular SQL.
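For reference, here is a minimal sketch of how the flag is set with the legacy 2.x Java driver; the "test" database and "events" collection are placeholders, and the exact WriteConcern constants vary a little between driver versions:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.WriteConcern;

public class WriteConcernDemo {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("test");                      // placeholder database
        DBCollection coll = db.getCollection("events");   // placeholder collection

        // Fire & forget: the driver sends the write and moves on, so
        // write errors such as duplicate keys are never reported here.
        coll.setWriteConcern(WriteConcern.NORMAL);
        coll.insert(new BasicDBObject("msg", "unacknowledged"));

        // SAFE: the driver also issues getLastError() and blocks until the
        // server has applied the write; errors now surface as exceptions.
        coll.setWriteConcern(WriteConcern.SAFE);
        coll.insert(new BasicDBObject("msg", "acknowledged"));

        // A write concern can also be passed per operation; JOURNAL_SAFE
        // additionally waits for the write to reach the journal.
        coll.insert(new BasicDBObject("msg", "journaled"), WriteConcern.JOURNAL_SAFE);

        mongo.close();
    }
}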

Related

Dumping a Java program into a file and restarting it

I was just wondering if it's possible to dump a running Java program into a file, and later on restart it (on the same machine).
It sounds a bit weird, but who knows.
--- update -------
Yes, this is the hibernate feature, but for a single process instead of the full system. Google 'hibernate jvm process', though, and you'll understand my pain.
There is a Linux question on this subject (here). In short, it's possible to hibernate a process (far from 100% reliable) with CryoPID.
A similar question was raised in stackoverflow some years ago.
With a JVM, my educated guess is that hibernating should be a lot easier, though not always possible and not 100% reliable (e.g. UI and open files).
Serializing a persistent state of the application is an option but it is not an answer to the question.
This may be a bit overkill, but one thing you can do is run something like VirtualBox and halt/save the machine.
There is also:
- JavaFlow from Apache, which should do just that, even though I haven't personally tried it.
- Brakes, which may be exactly what you're looking for.
There are a lot of restrictions on any solution to your problem: external connections might or might not survive your attempt to freeze and wake them. Think of timeouts on the other side, or even stopped communication partners - anything from a web server to a database or even local files.
You are asking for a generic solution that hibernates your program without any internal knowledge of it. What you can always do is serialize the part of your program's state that you need in order to restart it. It is, or at least was, common wisdom to implement restart points in long-running computations (think days or weeks). That way, when you hit a bug after the program has run for a week, you can fix the bug and save several days of computation.
The state of a program could be surprisingly small, compared to the complete memory size used.
You asked "if it's possible to dump a running Java program into a file, and later on restart it." - Yes it is, but I would not suggest a generic and automatic solution that has to handle your program as a black box, but I suggest that you externalize the important part of your programs state and program restart points.
Hope that helps - even if it's more complicated than what you might have hoped for.
I believe what the OP is asking is what the Smalltalk guys have been doing for decades - store the whole programming/execution environment in an image file, and work on it.
AFAIK there is no way to do the same thing in Java.
There has been some research into "persisting" the execution state of the JVM and then moving it to another JVM and starting it again. I saw something demonstrated once but don't remember which project. I don't think it has been standardized in the JVM specs, though...
Found the presentation/demo I was thinking about; it was at OOPSLA 2005, where they were talking about Squawk.
Good luck!
Other links of interest:
Merpati
Aglets
M-JavaMPI
How about using SpringBatch framework?
As far as I understood from your question, you need a reliable and resumable Java task. If so, I believe Spring Batch will do the magic, because you can split your task (a job) into several steps, and each step (as well as the entire job) has its own execution context persisted to the storage you choose to work with.
In case of a crash, you can recover by analyzing the previous run of the specific job and resume it from the exact point where the failure occurred.
You can also pause and restart your job programmatically if the job was configured as restartable and the ExecutionContext for this job already exists.
Good luck!
I believe:
1- the only generic way is to implement serialization,
2- a good way to restore a running system is OS virtualization,
3- what you are asking for is something like single-process serialization.
The problem is I/O. Say your process uses a temporary file that gets deleted by the system during 'hibernation', but your program does not know it. You will get an IOException somewhere.
So the word is: if the program is not designed to be interrupted at random, it won't work.
That's a risky and unmaintainable solution, so I believe only 1 and 2 make sense.
I guess IDEs support debugging in a somewhat similar way. It is not impossible, though I don't know exactly how. Maybe you will get details if you contact some Eclipse or NetBeans contributor.
First off, you need to design your app to use the Memento pattern or any other pattern that allows you to save the state of your application. The Observer pattern may also be a possibility. Once your code is structured in a way that saving state is possible, you can use Java serialization to actually write out all the objects, etc. to a file rather than putting it in a DB.
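As a rough illustration (the AppState fields are made-up placeholders for whatever your memento actually holds), plain Java serialization to a file could look like this:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Made-up memento class: everything it references must itself be Serializable.
class AppState implements Serializable {
    private static final long serialVersionUID = 1L;
    long itemsProcessed;
    String lastProcessedId;
}

public class StateStore {

    // Write a snapshot of the application state to disk.
    static void save(AppState state, File file) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file));
        try {
            out.writeObject(state);
        } finally {
            out.close();
        }
    }

    // Read the snapshot back; the program then resumes from this state.
    static AppState load(File file) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new FileInputStream(file));
        try {
            return (AppState) in.readObject();
        } finally {
            in.close();
        }
    }
}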
Just my 2 cents.
What you want is impossible from the very nature of computer architecture.
Every Java program gets compiled into Java intermediate code, and this code is then interpreted into native platform code when run. The native code is quite different from what you see in Java files, because it depends on the underlying platform and JVM version. Every platform has a different instruction set, memory management, driver system, etc. So imagine that you hibernated your program on Windows and then ran it on Linux, Mac, or any other device with a JRE, such as a mobile phone, car, or card reader. All hell would break loose.
Your solution is to serialize every important object to files and then close the program gracefully. When "unhibernating", you deserialize these instances from the files and your program can continue. The number of "important" instances can be quite small; you only need to save the "business data", since everything else can be reconstructed from it. You can use Hibernate or any other ORM framework to automate this serialization on top of a SQL database.
Terracotta can probably do this: http://www.terracotta.org
I am not sure, but they do support server failures. If all servers stop, the process state should be saved to disk and wait, I think.
Otherwise you should refactor your application to hold its state explicitly. For example, if you implement something like Runnable and make it Serializable, you will be able to save it.

Java RDBMS performance

If I have a method in a Java application for inserting data into an RDBMS, does the method move forward once it has passed the query to the database? I.e., will the Java application finish the method (connect, run query, close connection) before the RDBMS has processed the query? I want to run some tests but was not sure whether the application will finish before the RDBMS and, as a result, give very little insight into how quickly the database has processed the data.
I guess I could finish each test with a drop table and a close connection, to ensure that the application has had to wait for the RDBMS to catch up.
Also, will using the Eclipse IDE to test the RDBMS on different operating systems (Windows 7, Solaris, Ubuntu) drastically affect the performance of the OS it is running on?
Thanks to anyone who makes an attempt to answer.
Using an IDE won't affect performance - it's just a regular JVM that is started.
As for the other question - unless you are using an asynchronous driver (I don't know if such drivers exist) or starting a new thread for each operation, the calls are synchronous - i.e. the program waits for the database to return a result (or to time out, which should be configurable).
Generally, all such calls are synchronous; they block until the operation completes. Unless you are doing something very different.
You may want to look into connection pooling so that you can reuse the connections to avoid creation/destruction costs which can be substantial.
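To see the blocking behaviour for yourself, a simple timing harness with plain JDBC is enough. The connection URL, credentials and table below are made up, so adapt them to your own setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertTiming {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - point them at your own database.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password");
        conn.setAutoCommit(false);

        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO events (payload) VALUES (?)");

        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            ps.setString(1, "row " + i);
            ps.executeUpdate();   // blocks until the server has processed this statement
        }
        conn.commit();            // blocks until the transaction is made durable
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println("1000 inserts took " + elapsedMs + " ms");

        ps.close();
        conn.close();
    }
}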

JDK 6: Is there a way to run a new java process that executes the main method of a specified class

I'm trying to develop an application that, just before quitting, has to run a new daemon process to execute the main method of a class.
I require that after the main application quits, the daemon process must still be executing.
It is a Java stored procedure running on an Oracle DB, so I can't use Runtime.exec: I can't locate the Java class from the operating system shell, because it's defined in database structures instead of file system files.
In particular, the desired behavior is that during a remote database session I should be able to:
- call a first Java method that starts the daemon process and returns, leaving the daemon process running, and then
- (with the daemon process up and session control back, because the last call has returned) call a method that communicates with the daemon process (which finally quits at the end of the communication).
Is this possible?
Thanks
Update
My exact need is to create and load a big text file into the database with the best possible performance, from a Java JDK 6 client application connecting to an Oracle 11gR1 DB using the JDBC 11g OCI driver, supposing that the host doesn't have file transfer services.
I have already developed a working solution by calling a procedure that stores the LOB (large object) given as input into a file, but that method uses too many intermediate structures, which I want to avoid.
So I thought about creating a ServerSocket on the DB with a first call, then connecting to it later and establishing the data transfer over a direct, fast connection.
The problem I encountered is that the Java procedure that creates the ServerSocket can't quit and leave an executing thread/process listening on that socket, and the client, to be sure that the ServerSocket has been created, can't run a separate thread to handle the rest of the job.
I hope this is clear.
I'd be surprised if this was possible. In effect you'd be able to saturate the DB Server machine with an indefinite number of daemon processes.
If such a thing is possible the technique is likely to be Oracle-specific.
Perhaps you could achieve your desired effect using database triggers, or other such event driven Database capabilities.
I'd recommend explaining the exact problem you are trying to solve: why do you need a daemon? My instinct is that trying to manage your daemon's life is going to get horribly complex. You may well need to deal with problems such as preventing two instances from being launched, unexpected termination of the daemon, and taking the daemon down when maintenance is needed. This sort of stuff can get really messy.
If, for example, you want to run some Java code every hour, then almost certainly there are simpler ways to achieve that effect. Operating systems and databases tend to have nice methods for initiating work at desired times. So having a stored procedure called when you need it is probably a capability already present in your environment. Hence all you need to do is put your desired code in the stored procedure; there is no need for you to hand-craft the scheduling, initiation and management. One quite significant advantage of this approach is that you end up using a technique that other folks in your environment already understand.
Writing the kind of code you're considering is very interesting and great fun, but in commercial environments it is often a waste of effort.
Make another jar for your other main class and, within your main application, call that jar using the Runtime.getRuntime().exec() method, which will run an external program (another JVM) executing your other main class.
The way you start subprocesses in Java is Runtime.exec() (or its more convenient wrapper, ProcessBuilder). If that doesn't work, you're SOL unless you can use native code to implement equivalent functionality (ask another question here to learn how to start subprocesses at the C++ level) but that would be at least as error-prone as using the standard methods.
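Outside of a constrained environment like Oracle's embedded JVM, launching a second JVM that runs another class's main method looks roughly like the sketch below; com.example.DaemonMain is a placeholder for your own class:

import java.io.File;
import java.io.IOException;

public class Launcher {
    public static void main(String[] args) throws IOException {
        // Reuse the current JVM binary and classpath.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        String classpath = System.getProperty("java.class.path");

        // com.example.DaemonMain stands in for the class whose main method
        // should run in the new JVM.
        ProcessBuilder pb = new ProcessBuilder(
                javaBin, "-cp", classpath, "com.example.DaemonMain");
        pb.redirectErrorStream(true);
        Process daemon = pb.start();

        // The parent may now exit; whether the child keeps running depends on
        // the OS, and inside Oracle's embedded JVM this is normally not allowed.
        System.out.println("Launched: " + daemon);
    }
}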
I'd be startled if an application server like Oracle allowed you access to either the functionality of starting subprocesses or of loading native code; both can cause tremendous mischief so untrusted code is barred from them. Looking over your edit, your best approach is going to be to rethink how you tackle your real problem, e.g., by using NIO to manage the sockets in a more efficient fashion (and try to not create extra files on disk; you'll just have to put in extra elaborate code to clean them up…)

MySQL performance

I have this LAMP application with about 900k rows in MySQL and I am having some performance issues.
Background - Apart from the LAMP stack, there's also a Java process (multi-threaded) that runs in its own JVM. Together with LAMP and Java, they form the complete solution. The Java process is responsible for inserts/updates and a few selects as well. These inserts/updates are usually in bulk/batch, anywhere between 5 and 150 rows. The PHP front-end code only does SELECTs.
Issue - the PHP SELECT queries become very slow when the Java process is running. When the Java process is stopped, SELECTs perform alright; the performance difference is huge. When the Java process is running, any action performed on the PHP front-end results in 80% or more CPU usage for the mysqld process.
Any help would be appreciated.
MySQL is running with default parameters & settings.
Software stack -
Apache - 2.2.x
MySQL -5.1.37-1ubuntu5
PHP - 5.2.10
Java - 1.6.0_15
OS - Ubuntu 9.10 (karmic)
What engine are you using for MySQL? The thing to note here is if you're using MyISAM, then you're going to have locking issues due to the table locking that engine uses.
From: MySQL Table Locking
Table locking is also disadvantageous under the following scenario:
* A session issues a SELECT that takes a long time to run.
* Another session then issues an UPDATE on the same table. This session waits until the SELECT is finished.
* Another session issues another SELECT statement on the same table. Because UPDATE has higher priority than SELECT, this SELECT waits for the UPDATE to finish, after waiting for the first SELECT to finish.
I won't repeat them here, but the page has some tips on increasing concurrency on a table within MySQL. Obviously, one option would be to change to an engine like InnoDB, which has a more sophisticated row-locking mechanism that can make a huge difference in performance for highly concurrent tables. For more info on InnoDB go here.
Prior to changing the engine though it would probably be worth looking at the other tips like making sure your table is indexed properly, etc. as this will increase select and update performance regardless of the storage engine.
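As a quick way to check what you are currently running, you can query each table's engine from the Java side; the 'mydb' schema and 'events' table below are placeholders, and the ALTER TABLE should only be run in a maintenance window:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EngineCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details and schema/table names.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
        Statement st = conn.createStatement();

        // Show which storage engine each table uses.
        ResultSet rs = st.executeQuery(
                "SELECT table_name, engine FROM information_schema.tables"
                + " WHERE table_schema = 'mydb'");
        while (rs.next()) {
            System.out.println(rs.getString(1) + " -> " + rs.getString(2));
        }
        rs.close();

        // If a heavily written table turns out to be MyISAM, converting it to
        // InnoDB swaps table locks for row locks.
        st.executeUpdate("ALTER TABLE events ENGINE=InnoDB");

        st.close();
        conn.close();
    }
}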
Edit based on user comment:
I would say it's one possible solution based on the symptoms you've described, but it may not be the one that gets you where you want to be. It's impossible to say without more information.
You could be doing full table scans due to a lack of indexes. This could be causing I/O contention on your disk, which further exacerbates the table locking used by MyISAM. If this is the case, then the root cause is the improper indexing, and rectifying that would be your best course of action before changing storage engines.
Also, make sure your tables are normalized. This can have profound implications for performance, especially on updates. Normalized tables can allow you to update a single row instead of hundreds or thousands in an un-normalized table, thanks to unduplicated values. It can also save huge amounts of I/O on selects, as the db can cache data blocks more efficiently. Without knowing the structure of the tables you're working with or the indexes you have present, it's difficult to provide a more detailed response.
Edit after user attempted using InnoDB:
You mentioned that your Java process is multi-threaded. Have you tried running the process with a single thread? I'm wondering whether you might be sending the same rows out to multiple threads for update, and/or whether the way you're updating across threads is causing locking issues.
Outside of that, I would check the following:
Have you checked your explain plans to verify you have reasonable costs and that the query is actually using the indexes you have?
Are your tables normalized? More specifically, are you updating 100 rows when you could update a single record if the tables were normalized?
Is it possible that you're running out of physical memory when the Java process is running and the machine is busy swapping stuff in and out?
Are you flooding your disk (a single disk?) with more IOPs than it can reasonably handle?
We'd need to know a lot more about the system to say if that's normal or how to solve the problem.
with about 900k rows in MySQL
I would say that makes it very small - so if it's performing badly then you're going seriously wrong somewhere.
Enable the query log to see exactly what queries are running, prioritize based on the product of frequency and duration. Have a look at the explain plans, create some indexes. Think about splitting the database across multiple disks.
HTH
C.

Are there any Java VMs which can save their state to a file and then reload that state?

Are there any Java VMs which can save their state to a file and then reload that state?
If so, which ones?
Another option, which may or may not be relevant in your case, is to run the JVM (any JVM) inside a virtual machine. Most virtual machines offer the option to store and resume state, so you should be able to restart your PC, fire up the VM when it comes back up and have the Java process pick up from where it was.
I use VMWare Player for testing on IE at work, and this works as noted above when I close and later reopen it. I don't generally do this when apps are doing anything of note in the VM, but as long as they aren't accessing any external resources (e.g. network sockets), I would expect it to work as if the VM was never shut down.
Continuations are probably what you are looking for:
[...] first class continuations, which are constructs that give a programming language the ability to save the execution state at any point and return to that point at a later point in the program.
There are at least two continuation libraries for Java: RIFE continuations and javaflow. I know that javaflow at least allows serializing state to disk:
A Continuation can be serialized if all objects it captured is also serializable. In other words, all the local variables (including all this objects) need to be marked as Serializable. In this example, you need to mark the MyRunnable class as Serializable. A serialized continuation can be sent over to another machine or used later. - Javaflow Tutorial
You should serialize relevant domain-specific objects which can be de-serialized by another JVM run-time.
I'm not aware of any tools persisting an entire JVM. The closest I got to doing this was creating a core dump from a running JVM process using gcore, then using jsadebugd, jmap or jstack to debug it.
For instance:
$ jps # get JVM process ID XXX
$ gcore -o core XXX
$ jsadebugd $JAVA_HOME/bin/java core.XXX
UPDATE
I don't think you're going to find a solution that's portable between architectures just yet.
It is worth noting that many objects cannot be serialized, as they have state outside the Java context - e.g. sockets, threads, open files, and database connections.
For this reason, it is difficult to save the state of a useful application in a generic way.
I'm not aware of JVMs that can store state. Depending on your exact needs, you could maybe consider using Terracotta. Terracotta is essentially able to share heap state between JVMs and store this state to disk.
This can be used to cluster applications and/or make the heap state persistent. In effect, you can use it to start the JVM up and pick up where you left off. For more information check out:
http://www.infoq.com/articles/open-terracotta-intro
Hope this helps.
I've worked on an embedded Java project which used this approach to start up quickly.
The JVM was from Wind River, running on top of VxWorks.
Sun has done some research on "orthogonal persistence", which provides "persistence for the full computational model that is defined by the Java Language Specification":
http://research.sun.com/forest/COM.Sun.Labs.Forest.doc.opjspec.abs.html
PJama is a prototype implementation:
http://research.sun.com/forest/opj.main.html
To my knowledge, there is nothing to capture JVM state and restore it, but people are trying to serialize/deserialize the Thread class to achieve something similar. The closest thing to a working implementation I found was brakes, but you may find more when you google for "thread serialization".
I take it you want to be able to resume from where the snapshot was stored, as if nothing thereafter had happened.
I wonder how many framework components and libraries such functionality would break. Suddenly, you are reviving a JVM state from storage; in the meantime, the clock has mysteriously skipped forward by 23 hours, network connections are no longer valid, GUI objects no longer have any underlying O/S handles... I'd say this is nontrivial, and impossible in the general case without modifying the framework extensively.
If you can get away with just storing the state of your in-memory objects, then something like Prevayler might work for you. It uses a combination of journalling changes to business objects and a serialized snapshot to record the state of your objects, which you can then reload later.
However, it doesn't store the full JVM state (call stack, GC status etc). If you really need that level of detail, then a specialized JVM might be needed.
The answer at this time is no, there are no JVMs that can 'hibernate' like your operating system can or like VMWare et al can.
You could get half-way there, depending on the complexity of your app, by just serializing state out when the program closes and serializing it back in on startup, but that won't do things like pausing business logic mid-execution when you close and continuing it when you open the app again.
