Programmatic browsing performance issues - Java

I'm using Selenium (in Java) to do programmatic browsing (Firefox). To speed up page loading, I'm going through a programmatic Java proxy running in the same application (BrowserMob) to prevent the loading of external content (ads, etc.), as I only need the website's own hosted content.
The application browses through hundreds of thousands of pages but unfortunately, after a few hours, speed drops significantly (from ~5 s/page to ~30 s/page).
Could it be related to the browser cache size? The proxy cache? How can I check?
Any pointer or hint would be more than welcome!
Many thanks,
Tom

Hi, I would change your User Agent to something like Chrome's to check whether or not Firefox is the problem, or if it's down to memory management in your app.
Are you closing down the browser after a number of views? Remember, the browser will cache and start to take up a lot of memory. It may be better for you to use a text-based browser (Lynx) in your app, which removes a lot of overhead if you are simply going for text content.
I'd kill the process every so often, as memory leaks will cause it to slow down over time.
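A minimal sketch of that recycle-the-browser pattern using the Selenium WebDriver API (the class name and the page budget of 500 are illustrative; tune the budget against your own memory profile):

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public class RecyclingBrowser {
        private static final int PAGES_PER_SESSION = 500; // illustrative; tune to your memory budget

        private WebDriver driver = new FirefoxDriver();
        private int pagesVisited = 0;

        public void visit(String url) {
            if (pagesVisited >= PAGES_PER_SESSION) {
                driver.quit();                // kill the browser process, dropping its cache
                driver = new FirefoxDriver(); // start a fresh instance
                pagesVisited = 0;
            }
            driver.get(url);
            pagesVisited++;
        }
    }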

Related

How can you force a heap dump on a java-based web application that goes slowly in order to figure out where that slowness occurs?

We are currently using a modified version of Atlassian Confluence v.3.5
and have created a space containing a large number of pages (about 5,000) and a large number of attachments (about 10,000).
When navigating to the home page of this big space, it takes about 3 minutes to load completely (the Safari web browser shows a spinning wheel indicating page resources are still being loaded).
In these 3 minutes, we are unable to determine where the processing time is being spent.
We turned on Confluence's profiling feature, but it did not help because there was not much output in the log file.
The confluence process (which is a java process) is using about 8.2% CPU during the 3 minutes. How can I figure out what the process is doing?
You have all these options:
HeapDumpOnCtrlBreak
HeapDumpOnOutOfMemoryError
Jmap
HotSpotDiagnosticMXBean
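Of these, HotSpotDiagnosticMXBean is the one you can invoke from inside the running JVM on demand; a minimal sketch, assuming a Sun/Oracle HotSpot JVM (the output path is illustrative):

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    public class HeapDumper {
        // Writes an HPROF-format heap dump; "live" restricts the dump to reachable objects.
        public static void dumpHeap(String filePath, boolean live) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(filePath, live);
        }

        public static void main(String[] args) throws Exception {
            dumpHeap("/tmp/confluence-heap.hprof", true); // illustrative path
        }
    }

The resulting file can then be opened in a heap analyzer such as Eclipse MAT or jhat.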
A Thread Dump may also be useful. You can use it to figure out what the threads are waiting on.
You can also use a profiler. The best one I've used is JProfiler, but there are others available that are free and open source. I think NetBeans comes with one, and Sun makes one called VisualVM.

How can I get crawler4j to download all links from a page more quickly?

What I do is:
- crawl the page
- fetch all links on the page and put them in a list
- start a new crawler, which visits each link in the list
- download them
There must be a quicker way, where I can download the links directly when I visit the page. Thanks!
crawler4j automatically does this process for you. You first add one or more seed pages; these are the pages that are fetched and processed first. crawler4j then extracts all the links in these pages and passes them to your shouldVisit function. If you really want to crawl all of them, this function should just return true for every URL. If you only want to crawl pages within a specific domain, you can check the URL and return true or false based on that.
The URLs for which your shouldVisit returns true are then fetched by crawler threads, and the same process is performed on them.
The example code that ships with crawler4j is a good starting point.
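A minimal sketch of the two overrides described above (method signatures vary slightly between crawler4j versions, and the domain check is illustrative):

    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class MyCrawler extends WebCrawler {
        @Override
        public boolean shouldVisit(Page referringPage, WebURL url) {
            // Restrict the crawl to one domain; return true unconditionally to follow every link.
            return url.getURL().startsWith("https://www.example.com/");
        }

        @Override
        public void visit(Page page) {
            // By the time visit() is called, the page has already been downloaded by a crawler thread.
            System.out.println("Fetched: " + page.getWebURL().getURL());
        }
    }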
The general approach would be to separate the crawling and downloading tasks into separate worker threads, with a maximum number of threads depending on your memory requirements (i.e. the maximum RAM you want to use for storing all this info).
However, crawler4j already gives you this functionality. By splitting downloading and crawling into separate threads, you try to maximize the utilization of your connection, pulling down as much data as both your connection can handle and the servers providing the information can send you. The natural limitation to this is that, even if you spawn 1,000 threads, if the servers are only giving you content at 0.3 KB per second, that's still only 300 KB per second that you'll be downloading. But you just don't have any control over that aspect of it, I'm afraid.
The other way to increase speed is to run the crawler on a system with a fatter pipe to the internet, since your maximum download speed is, I'm guessing, the current limiting factor on how fast you can get data. For example, if you were running the crawl on an AWS instance (or any of the cloud application platforms), you would benefit from its extremely high-speed connections to backbones, and shorten the time it takes to crawl a collection of websites by effectively expanding your bandwidth far beyond what you're going to get at a home or office connection (unless you work at an ISP, that is).
It's theoretically possible that, in a situation where your pipe is extremely large, the limitation becomes the maximum write speed of your disk for any data that you're saving to local (or network) disk storage.

How to determine why a Java app is slow

We have a Java ERP type of application. Communication between server and client is via RMI. In peak hours there can be up to 250 users logged in, and about 20 of them are working at the same time. This means that about 20 threads are live at any given time in peak hours.
The server can run for hours without any problems, but all of a sudden response times get higher and higher; they can reach minutes.
We are running on Windows 2008 R2 with Sun's JDK 1.6.0_16. We have been using perfmon and Process Explorer to see what is going on. The only thing we find odd is that when the server starts to work slowly, the number of handles the java.exe process has opened is around 3,500. I'm not saying that this is the actual problem.
I'm just curious if there are some guidelines I should follow to be able to pinpoint the problem. What tools should I use? ....
Do you have access to the log configuration of this application?
If you do, you should change the log level to DEBUG. Tracing the DEBUG logs of a request could give you useful information about the contention point.
If you don't, profiling tools can help you:
VisualVM (Free, and good product)
Eclipse TPTP (Free, but more complicated than VisualVM)
JProbe (not free but very powerful; it is my favorite Java profiler, but it is expensive)
If the application has been developed with JMX control points, you can plug in a JMX viewer to get information...
If you want to stress the application to trigger the problem (to verify whether it is a load problem), you can use stress tools like JMeter.
Sounds like the garbage collection cannot keep up and starts "stop-the-world" collecting for some reason.
Attach with jvisualvm from the JDK when the server starts, and have a look at the collected data when the performance drops.
The problem you're describing is quite typical, but general as well. Causes can range from memory leaks and resource contention to bad GC policies and heap/PermGen-space allocation. To pinpoint the exact problems in your application, you need to profile it (I am aware of tools like YourKit and JProfiler). If you profile your application wisely, only some application cycles will reveal the problems; otherwise profiling isn't very easy in itself.
In a similar situation, I coded a simple profiling utility myself. Basically I used a ThreadLocal that holds a "StopWatch" (based on a LinkedHashMap), and I then insert code like this at various points of the application: watch.time("OperationX");
Then, after the thread finishes a task, I call watch.logTime(); and the class writes a log line that looks like this: [DEBUG] StopWatch time:Stuff=0, AnotherEvent=102, OperationX=150
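A minimal sketch of such a class, under the assumptions above (elapsed times are measured between successive marks; all names are illustrative):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public final class StopWatch {
        // One stopwatch per thread, so concurrent requests don't mix their timings.
        private static final ThreadLocal<StopWatch> CURRENT =
                ThreadLocal.withInitial(StopWatch::new);

        private final Map<String, Long> events = new LinkedHashMap<>();
        private long lastMark = System.currentTimeMillis();

        public static StopWatch get() {
            return CURRENT.get();
        }

        // Record the time elapsed since the previous mark under the given label.
        public void time(String event) {
            long now = System.currentTimeMillis();
            events.put(event, now - lastMark);
            lastMark = now;
        }

        // Emit one log line per finished task, then reset for the next one.
        public void logTime() {
            StringBuilder line = new StringBuilder("[DEBUG] StopWatch time:");
            boolean first = true;
            for (Map.Entry<String, Long> e : events.entrySet()) {
                if (!first) line.append(", ");
                line.append(e.getKey()).append('=').append(e.getValue());
                first = false;
            }
            System.out.println(line);
            events.clear();
            lastMark = System.currentTimeMillis();
        }
    }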
After this I wrote a simple parser that generates CSV from this log (per code path). The best thing you can do is to create a histogram (easily done in Excel); averages, medians and even modes can fool you, so I highly recommend creating a histogram.
Together with the histogram, you can create line graphs using the average/median/mode (whichever represents the data best; you can determine this from the histogram).
This way, you can be 100% sure exactly which operation is taking time. If you can't determine the culprit, binary search is your friend (fine-grain the events).
It might sound really primitive, but it works. Also, if you make a library out of it, you can use it in any project. It's also nice because you can easily turn it on in production as well.
Aside from the GC that others have mentioned, try taking thread dumps every 5-10 seconds for about 30 seconds during your slowdown. There could be a case where DB calls, web services, or some other dependency become slow. If you take a look at the thread dumps, you will be able to see threads which don't appear to move, and you could narrow down your culprit that way.
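You would normally take the dumps with jstack or Ctrl+Break, but as a hedged in-process alternative, the JDK's Thread.getAllStackTraces() can capture roughly the same information on a timer (the interval and class name are illustrative):

    import java.util.Map;

    public class ThreadDumper {
        // Print the stack of every live thread, roughly what a jstack dump contains.
        public static void dump() {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                System.out.println("\"" + e.getKey().getName() + "\" state=" + e.getKey().getState());
                for (StackTraceElement frame : e.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 6; i++) { // ~30 seconds at 5-second intervals
                dump();
                Thread.sleep(5000);
            }
        }
    }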
From the GC stand point, do you monitor your CPU usage during these times? If the GC is running frequently you will see a jump in your overall CPU usage.
If only this was a Solaris box, prstat would be your friend.
For acute issues like this, a quick jstack <pid> should quickly point out the problem area. Probably no need to get all fancy on it.
If I had to guess, I'd say HotSpot jumped in and tightly optimised some badly written code. NetBeans grinds to a halt where it uses a WeakHashMap with newly created objects to cache file data: once optimised, the entries can be removed from the map straight after being added. Obviously, if the cache is being relied upon, much file activity follows. You probably won't see the drive light up, because it'll all be cached by the OS.

How to investigate excessive java garbage collection

I have a Tomcat instance which is exhibiting the following behaviour:
Accept a single http incoming request.
Issue one request to a backend server and get back about 400 KB of XML.
Parse this XML and transform it into about 400 KB of JSON.
Return the JSON response.
The problem is that in the course of handling the 400 KB request, my webapp generates about 100 MB of garbage, which fills up the Eden space and triggers a young-generation collection.
I have tried to use the built-in Java hprof functionality to do allocation-site profiling, but Tomcat didn't seem to start up properly with that in place. It is possible that I was just a bit impatient, as I imagine memory allocation profiling has a high overhead and therefore Tomcat startup might take a long time.
What are the best tools to use to do java memory profiling of very young objects/garbage? I can't use heap dumps because the objects I'm interested in are garbage.
As to the actual problem: XML parsing can be very memory-hungry when using a DOM-based parser. Consider using a SAX-based or binary XML-based parser (VTD-XML is a Java API based on the latter).
Actually, if the XML-to-JSON mapping is a pure 1:1, then you can also consider just reading the XML and writing the JSON in real time, line by line, keeping a little stack for the element nesting.
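As a hedged illustration of the streaming principle, here is a StAX loop (StAX rather than SAX, but the same idea: the document is never held in memory as a tree; the input string is illustrative):

    import java.io.StringReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class StreamingXmlExample {
        public static void main(String[] args) throws Exception {
            String xml = "<root><item>a</item><item>b</item></root>"; // illustrative input
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        // This is where you would emit the opening of a JSON field.
                        System.out.println("start: " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            System.out.println("text: " + reader.getText());
                        }
                        break;
                }
            }
        }
    }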
Back to the question: I suggest using VisualVM for this. There is a blog article describing how to get it to work with Tomcat.
You can use the profiler in jvisualvm in the JDK to do memory profiling.
Also have a look at Templates to cache the compiled XSLT transformer:
http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/transform/Templates.html
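A minimal sketch of that caching pattern (the stylesheet path is illustrative; Templates is thread-safe and reusable, while each Transformer is cheap and single-use):

    import javax.xml.transform.Templates;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamSource;

    public class XsltCache {
        // Compile the stylesheet once; recompiling per request wastes CPU and generates garbage.
        private static final Templates TEMPLATES;
        static {
            try {
                TEMPLATES = TransformerFactory.newInstance()
                        .newTemplates(new StreamSource("transform.xsl")); // illustrative path
            } catch (Exception e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        public static Transformer newTransformer() throws Exception {
            return TEMPLATES.newTransformer(); // cheap per-request instance
        }
    }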
You should be able to get heap dumps to work anyway by debugging the app, placing breakpoints at key points of the code and creating a heap dump while the app is paused at each breakpoint.
You might want to try LambdaProbe, which is a profiler for Tomcat.
It supports the following:
Overview
Lambda Probe (formerly Tomcat Probe) is a self-sufficient web application which helps to visualize various parameters of an Apache Tomcat instance in real time. Lambda Probe is designed to work specifically with Tomcat, so it is able to access far more information than is normally available to JMX agents. Here is a list of features available through Lambda Probe:
- New! Comprehensive JVM memory usage monitor.
- JBoss compatibility
- Display of deployed applications, their status, session count, session object count, context object count, datasource usage, etc.
- Start, stop, restart, deploy and undeploy of applications
- Ability to view deployed JSP files
- Ability to compile all or selected JSP files at any time.
- Ability to pre-compile JSP files on application deployment.
- New! Ability to view auto-generated JSP servlets
- Display of the list of sessions for a particular application
- Display of session attributes and their values for a particular application. Ability to remove session attributes.
- Ability to view application context attributes and their values.
- Ability to expire selected sessions
- Graphical display of datasource details, including maximum number of connections, number of busy connections and configuration details
- New! Ability to group datasource properties by URL to help visualize the impact on the databases
- Ability to reset data sources in case of applications leaking connections
- Display of system information, including System.properties, memory usage bar and OS details
- Display of JK connector status, including the list of requests pending execution
- Real-time connector usage charts and statistics.
- Real-time cluster monitoring and cluster traffic charts
- New! Real-time OS memory usage, swap usage and CPU utilisation monitoring
- Ability to show information about log files and download selected files
- Ability to tail log files in real time from a browser.
- Ability to interrupt execution of "hung" requests without server restart
- New! Ability to restart Tomcat/JVM via Java Service Wrapper.
- Availability "Quick check"
- Support for DBCP, C3P0 and Oracle datasources
- Support for Tomcat 5.0.x and 5.5.x
- Support for Java 1.4 and Java 1.5
https://github.com/mchr3k/org.inmemprofiler/wiki (http://mchr3k.github.io/org.inmemprofiler/)
InMemProfiler can be used to identify which objects are collected after a very short time.

Java Applet and Browser Freeze

Is there a best practice for avoiding a browser freeze when loading an applet?
For my precise needs, the applet needs to be loaded when the web application is initialized, and it is not a visual component (it will live in a hidden div or hidden iframe).
As a reference, here is an old bug on SUN's side.
Essentially, no, there is not. Read the evaluation section of the bug you linked to. The issue is one of startup time for the JVM. About the best you can do is to keep the applet small so that it loads quickly. However, the browser freeze happens because the browser has to wait for the VM to start. You can't ever drop that time to 0, so a short freeze is unavoidable.
1.6.0u10 will help greatly with most browsers, but with the older VMs there is no way to avoid the browser freeze.
