I'm writing automated scripts to load thousands of records into a web application, and the time frame in which the data has to be loaded is very short. So I thought of using Selenium Grid to run the scripts in parallel and reduce the total time. My question is: will this affect the execution time of the automated scripts or the hub machine? There will be around 20 machines, or maybe even more, connected to the hub.
Also, is using Selenium Grid the best option for this, or could I use some other approach? Note that feeding the data directly from a database or using web services is not possible.
Thanks in advance.
will this affect the execution time of the automated scripts or the hub machine
Maybe. It depends on the rest of your stack. Is the underlying server multi-threaded? Will each Selenium Grid instance be provisioned its own process on the server? What about your database: will transaction locking block the other processes? Will any of your servers hit a performance bottleneck? Is the database ACID compliant, i.e. will running multiple Selenium instances like this cause race condition errors?
There's only one way to know for sure: Do a trial run, and benchmark it.
Is using selenium grid the best option for this or I could use some other approach as well
If you have direct access to the database, then you could seed the data directly (using SQL, presumably). This would be much faster, if it's a viable option.
Also, it depends what you're trying to achieve here. Are you simply seeding the application with a tonne of data? (In which case, SQL is a better option.) Or are you actually performance-testing the website? (In which case, intense parallel execution may be part of the spec.)
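For instance, a plain JDBC batch insert is usually orders of magnitude faster than driving the UI. A minimal sketch, assuming a hypothetical records table and placeholder connection details:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class Seeder {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL, credentials and table - adjust to your schema.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/appdb", "user", "password")) {
                conn.setAutoCommit(false); // commit once, not per row
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO records (id, name) VALUES (?, ?)")) {
                    for (int i = 0; i < 10000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "record-" + i);
                        ps.addBatch();
                        if (i % 1000 == 0) {
                            ps.executeBatch(); // flush every 1000 rows
                        }
                    }
                    ps.executeBatch();
                }
                conn.commit();
            }
        }
    }

Disabling auto-commit and flushing the batch periodically keeps the per-row transaction overhead low.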
Related
In my project, I am using JAX-RS (on WebSphere), EJB and OpenJPA as the major technologies.
Even in the very simple scenario of a plain GET call, the service takes quite long. The major tasks involved, as far as I can see, are the DB call through JPA, converting the entity object to a transfer object using the Dozer mapper, and finally the underlying WebSphere implementation converting the transfer object to JSON. The data is just a few rows in a table, with no eager loading (thus no data from any table other than the target table).
I don't think there is any heavy computation involved here, yet it takes around 10-12 seconds. The server is also powerful enough.
What should my approach be to find the root cause? I plan to measure the time consumed by each major component using System.nanoTime(). But are there any better approaches?
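Something like the following is what I have in mind - MyEntity, MyDto and the injected Dozer Mapper are placeholders for my real types:

    import java.util.ArrayList;
    import java.util.List;
    import javax.persistence.EntityManager;
    import org.dozer.Mapper;

    public class TimingProbe {

        // MyEntity/MyDto stand in for the real entity and transfer object.
        public List<MyDto> timedFetch(EntityManager em, Mapper dozer) {
            long t0 = System.nanoTime();
            List<MyEntity> rows = em.createQuery(
                    "SELECT e FROM MyEntity e", MyEntity.class).getResultList(); // JPA call
            long t1 = System.nanoTime();

            List<MyDto> dtos = new ArrayList<MyDto>();
            for (MyEntity e : rows) {
                dtos.add(dozer.map(e, MyDto.class)); // Dozer mapping
            }
            long t2 = System.nanoTime();

            System.out.printf("JPA: %d ms, Dozer: %d ms%n",
                    (t1 - t0) / 1000000, (t2 - t1) / 1000000);
            return dtos;
        }
    }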
This is what we did when we had such issues earlier.
We analyzed and found the major bottlenecks; most of the time a few incorrect parameters or method choices cause them.
So try alternatives for the methods that are consuming most of your time.
You don't need to use System.nanoTime(). You can use a profiler such as VisualVM, which will give you a detailed analysis of the time taken by every single unit.
This answer may not answer your question directly, but you can use jvisualvm to find the root cause. The blog below will help you configure it in Eclipse quickly:
https://blog.idrsolutions.com/2013/05/setting-up-visualvm-in-under-5-minutes/
jvisualvm is a tool that comes free with the JDK; you can find it in the JDK's bin folder under the name "jvisualvm.exe" and just double-click it to start.
Once it is configured, run your program and let VisualVM capture the processing. You can then see which classes consume the most CPU or time by clicking "Sampler -> CPU"; that will show you which methods take the most time and CPU. The blog below will also help you analyze the data:
http://scn.sap.com/community/java/blog/2015/09/18/step-by-step-to-use-visualvm-to-do-performance-measurement
If your program spends most of its time in networking-related packages, then either your program is waiting for responses from the server, or it is calling the server too often inside a loop (in that case you have to find a way to aggregate the data requests so that the number of calls to the external server is reduced).
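As a hypothetical illustration of that aggregation (the service interface and its methods are invented for the example):

    import java.util.ArrayList;
    import java.util.List;

    public class BatchingExample {

        // RemoteService, Customer and both methods are invented for illustration.
        interface RemoteService {
            Customer getCustomer(long id);               // one round trip per call
            List<Customer> getCustomers(List<Long> ids); // one round trip total
        }

        static class Customer { }

        // Before: N network round trips, one per id.
        static List<Customer> fetchOneByOne(RemoteService svc, List<Long> ids) {
            List<Customer> result = new ArrayList<Customer>();
            for (Long id : ids) {
                result.add(svc.getCustomer(id));
            }
            return result;
        }

        // After: a single batched request.
        static List<Customer> fetchBatched(RemoteService svc, List<Long> ids) {
            return svc.getCustomers(ids);
        }
    }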
Please keep us posted about your findings!
Sorry if the question is too open-ended or otherwise unsuitable, but this is due to my lack of understanding of several pieces of technology/software, and I'm quite lost. I have a project with an existing Java Swing GUI, which runs MPI jobs on a local machine. However, it is desired to support running MPI jobs on HPC clusters (let's assume a Linux cluster with SSH access). To be more specific, the main backend executable (Linux and Windows) that I need to, erm, execute uses a very simple master-slave system where all relevant output is performed by the master node only. Currently, to run my backend executable on multiple machines, I would simply copy all necessary files to the machines (assuming no shared filespace) and call "mpiexec" or "mpirun" as is usual practice. The output produced by the master needs to be read in (or partially read in) by my GUI.
The main problem as I see things is this: Where to run the GUI? Several options:
Local machine - potential problem is needing to read data from cluster back to local machine (and also reading stdout/stderr of the cluster processes) to display current progress to user.
Login node - obvious problem of hogging precious resources, and in many cases will be banned.
Compute node - sounds pretty dodgy - especially if the cluster has a queuing system (Slurm, Sun Grid Engine, etc.)! Also possibly banned.
Of these three options, the first seems the most reasonable, and also seems least likely to upset any HPC admin people, but is also the hardest to implement! There are multiple problems associated with that setup:
Passing data from the cluster to the local machine - because we're using a cluster, by definition we will probably generate large amounts of data, and the user wants to see at least part of it! Also, how should this be done? I can see how to execute commands on the remote machine via SSH using JSch or similar, but if I'm currently logged in on the remote machine, how do I communicate information back to the local machine?
Displaying stdout/stderr of backend in local machine. Similar to above.
Dealing with the peculiar aspects of individual clusters - the only way I see around that is to allow the user to write custom Slurm scripts or suchlike.
How to detect whether the backend computations have finished or failed - this problem interacts with any custom Slurm scripts written by the user.
Hopefully it is clear from the above that I'm quite confused. I've had a look at Apache Camel, JSch, Ganymed SSH-2, Apache MINA, Netty, Slurm, Sun Grid Engine, Open MPI, MPICH and PMI, but there's so much information that I think I need to ask for some help and advice. I would greatly appreciate any comments regarding these problems!
Thanks
================================
Edit
Actually, I just came across this: link, which seems to suggest that if the cluster allows "interactive"-mode jobs, then you can run a GUI from a compute node. However, I don't know much about this, nor do I know whether it is common. I would be grateful for comments on this aspect.
You may be able to leverage the approach shown here: a ProcessBuilder is used to execute a command in the background of a SwingWorker, while the command's output is displayed in a suitable component. In the example, ls -l would become ssh username@host 'ls -l'. Use a JPasswordField as required.
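A minimal sketch of that idea, assuming key-based SSH authentication so that no password prompt blocks the background thread, and with username@host as a placeholder:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.List;
    import javax.swing.JFrame;
    import javax.swing.JScrollPane;
    import javax.swing.JTextArea;
    import javax.swing.SwingUtilities;
    import javax.swing.SwingWorker;

    public class RemoteOutputViewer {

        public static void main(String[] args) {
            SwingUtilities.invokeLater(new Runnable() {
                public void run() {
                    final JTextArea area = new JTextArea(25, 80);
                    JFrame frame = new JFrame("Remote output");
                    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                    frame.add(new JScrollPane(area));
                    frame.pack();
                    frame.setVisible(true);

                    new SwingWorker<Void, String>() {
                        @Override
                        protected Void doInBackground() throws Exception {
                            // "username@host" is a placeholder for the real login.
                            Process p = new ProcessBuilder("ssh", "username@host", "ls -l")
                                    .redirectErrorStream(true) // merge stderr into stdout
                                    .start();
                            BufferedReader in = new BufferedReader(
                                    new InputStreamReader(p.getInputStream()));
                            String line;
                            while ((line = in.readLine()) != null) {
                                publish(line); // hand each line to process() on the EDT
                            }
                            p.waitFor();
                            return null;
                        }

                        @Override
                        protected void process(List<String> chunks) {
                            for (String s : chunks) {
                                area.append(s + "\n"); // safe: runs on the EDT
                            }
                        }
                    }.execute();
                }
            });
        }
    }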
I developed my own plugin for Neo4j in order to speed up the process of inserting nodes, mainly because I needed to insert nodes and relationships only if they didn't already exist, which can be too slow using the REST API.
If I call my plugin 100 times, inserting roughly 100 nodes and 100 relationships each time, each call takes approximately 350ms. Each call inserts different nodes, in order to rule out locking as the cause.
However, if I parallelize my calls (2, 3, 4... at a time), the response time grows with the degree of parallelism: it takes 750ms to insert my 200 objects when I make 2 calls at a time, 1000ms when I make 3, etc.
I'm calling my plugin from a .NET MVC controller, using HttpWebRequest. I set maxConnection to 10000, and I can see all the TCP connections being opened.
I investigated this issue a little, and something seems very wrong. I must have done something wrong, either in my Neo4j configuration or in my plugin code. Using VisualVM I found that the threads launched by Neo4j to handle my calls are working sequentially. See the picture linked:
http://i.imgur.com/vPWofTh.png
My configuration:
Windows 8, 2 cores
8GB of RAM
Neo4j 2.0M03 installed as a service with no configuration tuning
I hope someone will be able to help me. As it is, I won't be able to use Neo4j in production, where there will be tens of concurrent calls that cannot be executed sequentially.
Neo4j is transactional. Every commit triggers an IO operation on the filesystem, which needs to run in a synchronized block - this explains the picture you've attached. It is therefore best practice to run writes single-threaded. Any pre-processing beforehand can of course benefit from parallelization.
In general, for maximum performance go with the stable version (1.9.2 as of today). Early milestone builds are not optimized yet, so you might get a misleading picture.
Another thing to consider is the transaction size used in your plugin. 10k to 50k operations in a single transaction should give you the best results. If your transactions are very small, the transactional overhead is significant; in the case of huge transactions, you need lots of memory.
Write performance is heavily driven by the performance of the underlying IO subsystem. If possible use fast SSD drives; even better, stripe them.
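For reference, a rough sketch of batching all of one call's inserts into a single transaction against the embedded API of that era (1.9-style transaction handling; the property and relationship names are invented):

    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    public class BatchInsert {

        // One transaction around the whole batch instead of one per node.
        public static void insertBatch(GraphDatabaseService db, int count) {
            Transaction tx = db.beginTx();
            try {
                Node previous = null;
                for (int i = 0; i < count; i++) {
                    Node n = db.createNode();
                    n.setProperty("name", "node-" + i); // invented property
                    if (previous != null) {
                        previous.createRelationshipTo(
                                n, DynamicRelationshipType.withName("NEXT"));
                    }
                    previous = n;
                }
                tx.success(); // mark for commit - the single IO-heavy step
            } finally {
                tx.finish(); // 1.9-style close; 2.x uses try-with-resources
            }
        }
    }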
If I have a method in a Java application for inserting data into an RDBMS, does the method move forward once it has passed the query to the database? I.e., will the Java application finish the method (connect, run query, close connection) before the RDBMS has processed the query? I want to run some tests, but I wasn't sure whether the application would finish before the RDBMS does and, as a result, give very little insight into how quickly the database has processed the data.
I guess I could finish each test with a drop table and a close connection, to ensure that the application has had to wait for the RDBMS to catch up.
Also, will using the Eclipse IDE to test the RDBMS on different operating systems (Windows 7, Solaris, Ubuntu) drastically affect the performance on the OS it is running on?
Thanks to anyone who makes an attempt to answer.
Using an IDE won't affect performance - it's just a regular JVM that is started.
As for the other question: unless you are using an asynchronous driver (I don't know if such a thing exists) or starting new threads for each operation, the calls are synchronous - i.e. the program waits for the database to return a result (or to time out, which should be configurable).
Generally, all such calls are synchronous; they block until the operation completes. Unless you are doing something very different.
You may want to look into connection pooling so that you can reuse the connections to avoid creation/destruction costs which can be substantial.
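A sketch of pooling with Apache Commons DBCP, one of several pooling libraries; the driver, URL and credentials are placeholders:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import org.apache.commons.dbcp.BasicDataSource;

    public class Pool {
        private static final BasicDataSource ds = new BasicDataSource();
        static {
            ds.setDriverClassName("org.postgresql.Driver");  // placeholder driver
            ds.setUrl("jdbc:postgresql://localhost/testdb"); // placeholder URL
            ds.setUsername("user");
            ds.setPassword("password");
            ds.setMaxActive(10); // cap the pool size
        }

        public static void insert(int id, String name) throws Exception {
            Connection conn = ds.getConnection(); // borrowed from the pool, not newly created
            try {
                PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO test (id, name) VALUES (?, ?)");
                ps.setInt(1, id);
                ps.setString(2, name);
                ps.executeUpdate(); // blocks until the RDBMS confirms the insert
                ps.close();
            } finally {
                conn.close(); // returns the connection to the pool
            }
        }
    }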
I'm trying to develop an application that, just before quitting, has to start a new daemon process to execute the main method of a class.
I require that the daemon process still be running after the main application quits.
It is a Java stored procedure running on an Oracle DB, so I can't use Runtime.exec: I can't locate the Java class from the operating system shell, because it's defined in database structures rather than in files on the file system.
In particular, the desired behavior is that during a remote database session I should be able to
call a first Java method that starts the daemon process and quits, leaving the daemon process in execution,
and then (with the daemon process up and session control back, because the last call terminated), subsequently
call a method that communicates with the daemon process (which finally quits at the end of the communication).
Is this possible?
Thanks
Update
My exact need is to create and load (with the best possible performance) a big text file into the database, supposing that the host doesn't offer file transfer services, from a Java (JDK 6) client application connecting to an Oracle 11gR1 DB using the JDBC 11g OCI driver.
I have already developed a working solution that calls a procedure which stores the LOB (large object) given as input into a file, but that method uses too many intermediate structures, which I want to avoid.
So I thought about creating a ServerSocket on the DB side with a first call, then connecting to it and establishing a direct, fast data transfer.
The problem I encountered is that the Java procedure that creates the ServerSocket can't quit while leaving a running thread/process listening on that socket, and the client, in order to be sure that the ServerSocket has been created, can't run a separate thread to handle the rest of the job.
Hope to be clear
I'd be surprised if this was possible. In effect you'd be able to saturate the DB Server machine with an indefinite number of daemon processes.
If such a thing is possible the technique is likely to be Oracle-specific.
Perhaps you could achieve your desired effect using database triggers, or other such event driven Database capabilities.
I'd recommend explaining the exact problem you are trying to solve: why do you need a daemon? My instinct is that trying to manage your daemon's lifecycle is going to get horribly complex. You may well need to deal with problems such as preventing two instances being launched, unexpected termination of the daemon, and taking the daemon down when maintenance is needed. This sort of stuff can get really messy.
If, for example, you want to run some Java code every hour, then almost certainly there are simpler ways to achieve that effect. Operating systems and databases tend to have good mechanisms for initiating work at desired times, so having a stored procedure called when you need it is probably a capability already present in your environment. Hence all you need to do is put your desired code in the stored procedure; there is no need to hand-craft the scheduling, initiation and management. One quite significant advantage of this approach is that you end up using a technique that other folks in your environment already understand.
Writing the kind of code you're considering is very interesting and great fun, but in commercial environments it's often a waste of effort.
Make another jar for your other main class, and within your main application call it using the Runtime.getRuntime().exec() method, which will run an external program (another JVM) executing your other main class.
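A minimal sketch of that idea for an ordinary application (the jar name is hypothetical; as noted elsewhere, this won't work from inside an Oracle stored procedure):

    import java.io.File;

    public class Launcher {
        public static void main(String[] args) throws Exception {
            // Locate the java binary of the current JVM.
            String javaBin = System.getProperty("java.home")
                    + File.separator + "bin" + File.separator + "java";

            // Launch the other jar in its own JVM; it keeps running
            // after this process exits.
            new ProcessBuilder(javaBin, "-jar", "daemon.jar").start(); // hypothetical jar

            System.out.println("Daemon launched, main application can quit now.");
        }
    }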
The way you start subprocesses in Java is Runtime.exec() (or its more convenient wrapper, ProcessBuilder). If that doesn't work, you're SOL unless you can use native code to implement equivalent functionality (ask another question here to learn how to start subprocesses at the C++ level) but that would be at least as error-prone as using the standard methods.
I'd be startled if an application server like Oracle allowed you access to either the functionality of starting subprocesses or of loading native code; both can cause tremendous mischief, so untrusted code is barred from them. Looking over your edit, your best approach is going to be to rethink how you tackle your real problem, e.g., by using NIO to manage the sockets in a more efficient fashion (and try not to create extra files on disk; you'll just have to put in extra elaborate code to clean them up…)