If I have a method in a Java application that inserts data into an RDBMS, does the method move forward once it has passed the query to the database? That is, will the Java application finish the method (connect, run query, close connection) before the RDBMS has processed the query? I want to run some tests, but I was not sure whether the application would finish before the RDBMS does and, as a result, give very little insight into how quickly the database has processed the data.
I guess I could finish each test with a DROP TABLE and a connection close, to ensure that the application has had to wait for the RDBMS to catch up.
Also, will using the Eclipse IDE to test the RDBMS on different operating systems (Windows 7, Solaris, Ubuntu) drastically affect performance on the OS it is running on?
Thanks to anyone who makes an attempt to answer.
Using an IDE won't affect performance - it's just a regular JVM that is started.
As for the other question: unless you are using an asynchronous driver (I don't know if such drivers exist) or starting a new thread for each operation, the calls are synchronous - i.e., the program waits for the database to return a result (or to time out, which should be configurable).
Generally, all such calls are synchronous; they block until the operation completes, unless you are doing something very different.
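To illustrate, here's a minimal sketch (driver URL, table, and credentials are placeholders for your own setup) showing that a plain JDBC executeUpdate() blocks until the server has processed the statement - so timing around it measures the database, not just the hand-off:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class InsertTimingSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URL/credentials - substitute your own RDBMS here.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:yourdb://localhost/test", "user", "password");
                 PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO t (val) VALUES (?)")) {
                ps.setString(1, "hello");
                long start = System.nanoTime();
                // Blocks until the RDBMS has applied (or rejected) the row.
                ps.executeUpdate();
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("insert took " + elapsedMs + " ms");
            }
        }
    }

Note that with auto-commit enabled (the JDBC default), each executeUpdate() also includes the commit, so the measured time covers the full round trip.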
You may want to look into connection pooling so that you can reuse connections and avoid creation/destruction costs, which can be substantial.
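As a concrete (hedged) example, here is what that looks like with HikariCP - my choice of pooling library, not one named above:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;

    public class PoolSketch {
        public static void main(String[] args) throws Exception {
            HikariConfig cfg = new HikariConfig();
            cfg.setJdbcUrl("jdbc:yourdb://localhost/test"); // placeholder URL
            cfg.setUsername("user");
            cfg.setPassword("password");
            cfg.setMaximumPoolSize(10);
            try (HikariDataSource ds = new HikariDataSource(cfg)) {
                try (Connection con = ds.getConnection()) {
                    // Use the connection; close() returns it to the pool
                    // instead of tearing down the physical connection.
                }
            }
        }
    }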
I have a multithreaded Java application with ~5 threads of my own (plus many threads from the Jetty web server), some of which read/write MongoDB from time to time. Some of the writes are intensive, where I read 200K MongoDB objects, but they don't happen continuously; they happen once every few minutes. For a few hours the application works perfectly, but later I see this situation:
Mongo is not doing any work, as far as I can tell.
Here is my jstack output:
https://gist.github.com/stiv-yakovenko/06b0d235fd2c32d839788edf56aaa6cd
You can see that all threads are waiting for one thread, which in turn is waiting for Mongo, while Mongo is doing nothing. Before the problem begins, the healthy situation is that no threads are waiting on one another, because the load is not high enough to block everything. Before Mongo I was using MapDB to store the same data, and I never had issues like that.
I've seen the same situation with multiple threads waiting for Mongo, so I decided to put all MongoDB invocations under the same ReentrantLock(true). I hoped the root cause was too many threads trying to access Mongo at once, but it doesn't help. I don't know what to do; I've tried to reproduce the problem with simple code, but I can't. Any ideas?
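For reference, my locking workaround looks roughly like this (class and method names here are simplified from my real code):

    import java.util.concurrent.locks.ReentrantLock;
    import java.util.function.Supplier;

    public class MongoGate {
        // Fair lock, so waiting threads are served in arrival order.
        private static final ReentrantLock LOCK = new ReentrantLock(true);

        public static <T> T withMongo(Supplier<T> mongoCall) {
            LOCK.lock();
            try {
                return mongoCall.get(); // the actual find/insert/update goes here
            } finally {
                LOCK.unlock();
            }
        }
    }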
UPD: here is the jstat output, as one of the commenters requested:
Well, it finally turned out to be garbage collection. I ended up using the G1 garbage collector. But that was not enough, because it couldn't deliver the required latency (though it came close). I had to split the application into two parts: one for intensive, garbage-producing calculations, another for low-latency web responses.
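For the record, enabling G1 is done with JVM launch flags along these lines (the pause target is just an example value, not what I actually tuned):

    java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar app.jar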
I'm writing automated scripts to load thousands of records into a web application, and the time frame in which the data has to be loaded is very short. So I thought of using Selenium Grid to run the scripts in parallel to reduce the time. My question is: will this affect the execution time of the automated scripts or the hub machine? There will be around 20 machines, or maybe even more, connected to the hub.
Also, is using Selenium Grid the best option for this, or could I use some other approach? Feeding data directly from the database or using web services is not possible.
Thanks in advance.
will this affect the execution time of the automated scripts or the hub machine
Maybe. It depends on the rest of your stack. Is the underlying server multi-threaded? Will each Selenium Grid instance be provisioned its own process on the server? What about your database - will transaction locking block the other processes? Will any of your servers hit some performance bottleneck? Is the database ACID compliant - i.e., will running multiple Selenium instances like this cause race-condition errors?
There's only one way to know for sure: Do a trial run, and benchmark it.
Is using selenium grid the best option for this or I could use some other approach as well
If you have direct access to the database, then you could seed the data directly (using SQL, presumably). This would be much faster, if it's a viable option.
Also, it depends what you're trying to achieve here. Are you simply seeding the application with a tonne of data? (In which case, SQL is a better option; a sketch follows below.) Or are you actually performance-testing the website? (In which case, intense parallel execution may be part of the spec.)
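If direct database access does turn out to be viable, a seeding sketch over JDBC might look like this (the table, columns, and connection details are hypothetical):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class SeedSketch {
        record Customer(String name, String email) {}

        static void seed(List<Customer> customers) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:yourdb://localhost/app", "user", "password")) {
                con.setAutoCommit(false); // one commit for the whole batch
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO customers (name, email) VALUES (?, ?)")) {
                    for (Customer c : customers) {
                        ps.setString(1, c.name());
                        ps.setString(2, c.email());
                        ps.addBatch();
                    }
                    ps.executeBatch(); // thousands of rows in one round trip
                }
                con.commit();
            }
        }
    }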
I developed a plugin of my own for Neo4j in order to speed up the process of inserting nodes, mainly because I needed to insert nodes and relationships only if they didn't exist before, which can be too slow using the REST API.
If I call my plugin 100 times, inserting roughly 100 nodes and 100 relationships each time, it takes approximately 350 ms per call. Each call inserts different nodes, in order to rule out locking as a cause.
However, if I parallelize my calls (2, 3, 4... at a time), the response time degrades in proportion to the degree of parallelism: it takes 750 ms to insert my 200 objects when I make 2 calls at a time, 1000 ms when I make 3, and so on.
I'm calling my plugin from a .NET MVC controller, using HttpWebRequest. I set maxConnection to 10000, and I can see all the TCP connections being opened.
I investigated this issue a little, and something seems very wrong. I must have done something wrong, either in my Neo4j configuration or in my plugin code. Using VisualVM I found out that the threads launched by Neo4j to handle my calls are working sequentially. See the linked picture.
http://i.imgur.com/vPWofTh.png
My conf:
Windows 8, 2 cores
8 GB of RAM
Neo4j 2.0M03 installed as a service with no conf tuning
I hope someone will be able to help me. As it is, I will be unable to use Neo4j in production, where there will be tens of concurrent calls that cannot be executed sequentially.
Neo4j is transactional. Every commit triggers an IO operation on the filesystem, which needs to run in a synchronized block - this explains the picture you've attached. Therefore it's best practice to run writes single-threaded; see the sketch below. Any pre-processing beforehand can of course benefit from parallelization.
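A minimal sketch of that pattern on the application side (the write job body is a placeholder):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class SingleWriterSketch {
        // One thread owns all writes, so commits never contend with each other.
        private static final ExecutorService WRITER = Executors.newSingleThreadExecutor();

        public static Future<?> submitWrite(Runnable writeJob) {
            return WRITER.submit(writeJob); // callers can block on the Future if needed
        }
    }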
In general, for maximum performance go with the stable version (1.9.2 as of today). Early milestone builds are not yet optimized, so you might get a wrong picture.
Another thing to consider is the transaction size used in your plugin. 10k to 50k operations in a single transaction should give you the best results. If your transactions are very small, transactional overhead is significant; in the case of huge transactions, you need lots of memory.
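As a sketch with the embedded API of that era (NodeSpec and createNode are placeholders for your own insert logic):

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Transaction;

    void insertBatched(GraphDatabaseService db, Iterable<NodeSpec> specs) {
        final int BATCH_SIZE = 20_000; // inside the suggested 10k-50k window
        Transaction tx = db.beginTx();
        try {
            int inTx = 0;
            for (NodeSpec spec : specs) {
                createNode(db, spec); // placeholder for your merge/insert logic
                if (++inTx % BATCH_SIZE == 0) {
                    tx.success();
                    tx.finish();       // commit this batch...
                    tx = db.beginTx(); // ...and start the next one
                }
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }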
Write performance is heavily driven by the performance of the underlying IO subsystem. If possible, use fast SSD drives - even better, stripe them.
How does setting the WriteConcern flag to SAFE in the Java driver affect MongoDB performance?
Slows it down significantly (as you would expect).
Without the "SAFE" flag, MongoDB drivers operate in "Fire & Forget" mode. So the update command is sent to the server and then the driver continues on. If there is an error with the write or the server dies before the change happens, the client knows nothing about it.
With the "SAFE" flag, the drivers do both the update command and the getLastError() command. That second command will not complete until the update actually happens on the DB. At the very least, you are sending two commands instead of one (so it's 50% slower).
In my experience it's actually 5x-20x slower. Of course, this makes sense because actually writing the data is the slow part of this whole piece.
Note that without the SAFE flag, certain exceptions will never happen. For example, you will never get a duplicate key exception.
If you plan to use MongoDB as a typical database (analogous to, say, MySQL), you need to use at least "SAFE" mode and replica sets; otherwise, you need "JOURNAL" mode on a single box. If you use JOURNAL mode, you will notice that performance starts to look a lot like regular SQL.
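A sketch with the 2.x-era Java driver API this answer describes (newer drivers rename SAFE to ACKNOWLEDGED):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;
    import com.mongodb.WriteConcern;

    public class WriteConcernSketch {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost");
            DB db = mongo.getDB("test");
            DBCollection col = db.getCollection("items");
            col.setWriteConcern(WriteConcern.SAFE); // each write now waits on getLastError()
            col.insert(new BasicDBObject("_id", 1));
            col.insert(new BasicDBObject("_id", 1)); // duplicate key is now reported as an exception
            mongo.close();
        }
    }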
While profiling a Java application running on WebSphere 7 and DB2, we can see that we spend most of our time in the com.ibm.ws.rsadapter.jdbc package, handling connections to and from the database.
How can we tune our jdbc performance?
What other strategies exist when database performance is a bottleneck?
Thanks
You should check your WebSphere manual for how to configure a connection pool.
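Once a pool is defined in the WebSphere console, the application obtains connections from it via JNDI - roughly like this (the JNDI name is a placeholder):

    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class PooledLookupSketch {
        static Connection getPooledConnection() throws Exception {
            InitialContext ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup("jdbc/MyAppDS"); // placeholder JNDI name
            return ds.getConnection(); // served from the container-managed pool
        }
    }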
Update 2021
Here is an introduction including code samples.
One cause of slow connect times is a deactivated database, which does not open its files and allocate its memory buffers and heaps until the first application attempts to connect to it. Ask your DBA to confirm that the database is active before running your tests. The LIST ACTIVE DATABASES command (run from the local DB2 server or over a remote attachment) should show your database in its output. If the database is not activated, have your DBA activate it explicitly with ACTIVATE DATABASE yourDBname. That will ensure that the database files and memory structures remain available even when the last user disconnects from the database.
Use GET MONITOR SWITCHES to ensure all your monitor switches are enabled for your database, otherwise you'll miss out on some potentially revealing performance details. The additional overhead of tracking the data associated with those monitor switches is minimal, while the value of the performance data is significant.
If the database is always active and things still seem slow, there are detailed DB2 traces called event monitors that log everything they encounter to a file, pipe, or DB2 table. The statement event monitor is one I turn to fairly often to analyze SQL statement efficiency and UOW hygiene. I also prefer taking the extra hit to log the event monitor records to a table rather than a file, so I can use SQL to search the data for all sorts of patterns. The db2evtbl utility makes it fairly easy to define the event monitor you want and create the tables to store its output. The SET EVENT MONITOR STATE command is how you start and stop the event monitor you've created.
In my experience, what you are seeing is pretty common. The question to ask is what exactly the DB2 connection is doing...
The first thing to do is to try to isolate the performance issue to one section of the website - i.e., is there one part of the application that sees poor performance? Once you find it, you can increase the trace logging to see whether you can spot the query causing the issues.
Additionally, if you talk to your DBAs, they may be able to run some analysis on the database to tell you which queries are taking the longest to return values; this may also help in your troubleshooting.
Good luck!
Connection pooling
Caching
DBAs