What is the most efficient selector to use with findElement()?

When working with Selenium web testing, there are a few ways to identify WebElements.
In my experience, I have used the following selectors:
Class Name - By.className()
CSS Selector - By.cssSelector()
ID - By.id()
Link Text - By.linkText()
Name - By.name()
Tag Name - By.tagName()
XPath - By.xpath()
Obviously, when only one option is usable to locate an element, we must use that one, but when multiple methods may be used (e.g. the div below), how should I determine which method to use? Are some selectors more efficient than others? Are some more durable?
<div class="class" id="id" name="name">Here's a div</div>

Just for s&gs...
I timed each of the identifier methods finding the div above five separate times and averaged the time taken to find the element.
WebDriver driver = new FirefoxDriver();
driver.get("file://<Path>/div.html");
long starttime = System.currentTimeMillis();
// Uncomment one locator per run:
//driver.findElement(By.className("class"));
//driver.findElement(By.cssSelector("html body div"));
//driver.findElement(By.id("id"));
//driver.findElement(By.name("name"));
//driver.findElement(By.tagName("div"));
//driver.findElement(By.xpath("/html/body/div"));
long stoptime = System.currentTimeMillis();
System.out.println(stoptime-starttime + " milliseconds");
driver.quit();
They are sorted below by average run time:
CssSelector: (796ms + 430ms + 258ms + 408ms + 694ms) / 5 = ~517.2ms
ClassName: (670ms + 453ms + 812ms + 415ms + 474ms) / 5 = ~564.8ms
Name: (342ms + 901ms + 542ms + 847ms + 393ms) / 5 = ~605ms
ID: (888ms + 700ms + 431ms + 550ms + 501ms) / 5 = ~614ms
Xpath: (835ms + 770ms + 415ms + 491ms + 852ms) / 5 = ~672.6ms
TagName: (998ms + 832ms + 1278ms + 227ms + 648ms) / 5 = ~796.6ms
After reading @JeffC's answer, I decided to compare By.cssSelector() with class name, tag name, and id as the search terms. Again, the results are below:
WebDriver driver = new FirefoxDriver();
driver.get("file://<Path>/div.html");
long starttime = System.currentTimeMillis();
//driver.findElement(By.cssSelector(".class"));
//driver.findElement(By.className("class"));
//driver.findElement(By.cssSelector("#id"));
//driver.findElement(By.id("id"));
//driver.findElement(By.cssSelector("div"));
//driver.findElement(By.tagName("div"));
long stoptime = System.currentTimeMillis();
System.out.println(stoptime-starttime + " milliseconds");
driver.quit();
By.cssSelector(".class"): (327ms + 165ms + 166ms + 282ms + 55ms) / 5 = ~199ms
By.className("class"): (338ms + 801ms + 529ms + 804ms + 281ms) / 5 = ~550ms
By.cssSelector("#id"): (58ms + 818ms + 261ms + 51ms + 72ms) / 5 = ~252ms
By.id("id") - (820ms + 543ms + 112ms + 434ms + 738ms) / 5 = ~529ms
By.cssSelector("div"): (594ms + 845ms + 455ms + 369ms + 173ms) / 5 = ~487ms
By.tagName("div"): (825ms + 843ms + 715ms + 629ms + 1008ms) / 5 = ~804ms
From this, it seems like you should use CSS selectors for just about everything you can!

In my experience, I use these locators in this order:
id
linkText/partialLinkText
CSS Selector
XPath
The others: class name, tag name, name, etc. can all be expressed as CSS selectors. I rarely need a single class name, so I prefer a CSS selector so that I can use more than one class and also specify a tag name, which makes the locator more specific and less likely to break. Tag name alone is rarely useful... unless we are talking about TABLE, TR, or TD tags, and those can all be done with CSS selectors. I generally find that tags with a name also have an id, so I prefer id.
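For illustration, a compound CSS locator of the kind described might look like this (a sketch; the button tag and class names are made up):
// Tag name plus two classes: more specific than By.className("btn"),
// which only accepts a single class name.
driver.findElement(By.cssSelector("button.btn.btn-primary"));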
I recently found, as you did in your answer, that CSS selector is the fastest. This makes sense because Selenium is using the browser to do the searches and CSS selectors are so common that the different browsers have optimized performance for their use.
linkText/partialLinkText are very specialized so I don't really count them. I use them when I can and it makes sense. I have thought about just using By.cssSelector("#someId") but I don't think it really makes that much of a difference and By.id() is just a little more obvious when using an Id.
I rarely use XPath, and only when I can't accomplish the job with the other locators, e.g. matching the text inside a tag or some weird child/descendant relationship that I can't express with CSS selectors. I have found (and read) that XPath support varies by browser and that it's slower, so I avoid it unless absolutely necessary... and I have found you can locate 99% of elements with #1-3.
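As an example of the text-based case that forces XPath (the button label is hypothetical):
// CSS selectors cannot match on visible text, so this needs XPath.
driver.findElement(By.xpath("//button[text()='Save changes']"));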
IDs should be the most durable. LinkText and partialLinkText are probably fairly durable, depending on the page. The classes applied and the HTML structure that you would use with CSS selectors are probably the most likely to change with a page update. Whether a particular portion of a page changes really depends on the size of the update. CSS selectors and XPath would both (generally) be affected by these sorts of changes.
In the end... as long as you aren't scraping the page for hundreds of elements, one page transition is likely to be WAY more significant than a few hundred ms of difference between locator methods.

I'd prioritise selectors like this:
ID - By.id()
Name - By.name()
CSS Selector - By.cssSelector()
XPath - By.xpath()
Tag Name - By.tagName()
Link Text - By.linkText()
However, unique IDs and names do not always exist. You can also use CSS selectors to locate by ID (#element_id), by name ([name=element_name]), or by class name (.element_class), so it may be a good idea to use CSS selectors instead of ID, Name, and ClassName. CSS is faster than XPath, so it's better to use it where possible. XPath is good for difficult element locators that CSS selectors can't handle. I also wouldn't use Tag Name or Link Text, since you can write the same thing with XPath (for link text, //a[text()='link_text']; for tag name, //div) or CSS selectors (for tag name, div).
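To make those equivalences concrete, a quick sketch reusing the example names from above:
// The same element located three ways with By.cssSelector():
driver.findElement(By.cssSelector("#element_id"));          // by ID
driver.findElement(By.cssSelector("[name=element_name]"));  // by name
driver.findElement(By.cssSelector(".element_class"));       // by class name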

Locators should be descriptive, unique, and unlikely to change. So my priority is as follows:
ID - You'll get the exact element you want and it can be descriptive and won't be changed by mistake.
Class - Very descriptive, probably unique in context of parent element.
CSS - Better performance than XPath - see Dave Haeffner's great benchmark.
XPath - Has capabilities that CSS doesn't, such as axes (e.g. parent::) and functions like contains().
I would avoid LinkText and TagName as they tend to cause unexpected failures due to very generic filtering.
Note on CSS and XPath: using these with something like //div/li[1]/*/span[2] or div > li:nth-child(1) should also be avoided, as such locators depend on the document structure and are prone to break when the page changes.
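A short sketch of that contrast (the data-role attribute is an assumption, not from the answer):
// Brittle: breaks as soon as the surrounding structure or position changes.
driver.findElement(By.xpath("//div/li[1]/*/span[2]"));
// Sturdier: anchored on a stable attribute and the element's text.
driver.findElement(By.xpath("//li[@data-role='menu-item' and contains(., 'Settings')]"));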

Related

GraphStream styling without hideous strings/absolute file paths?

In the GraphStream documentation, it is stated that we can style a graph by the following ways:
This attribute must be stored on a graph and takes as value either the
style sheet itself as a string of characters or the URL of a file,
local, distant or packaged in a jar.
You can for example change the background color of the graph using a
style sheet given as a string this way:
graph.addAttribute("ui.stylesheet", "graph { fill-color: red; }");
But you can also specify an URL:
graph.addAttribute("ui.stylehseet",
"url('http://www.deep.in/the/site/mystylesheet')");
Or:
graph.addAttribute("ui.stylesheet",
"url(file:///somewhere/over/the/rainbow/stylesheet')");
However, I experimented with this, and came to the conclusion that GraphStream only supports absolute file paths for this attribute. This is an issue since I will ultimately be exporting the application as a JAR file, and although I have found that you can circumvent the issue by doing something like:
ClassLoader.getSystemClassLoader().getResource(".").getPath();
there is uncertainty associated with this method (i.e. it may not work on certain machines, such as Linux machines (?)).
In which case, my current hack is to store the 'stylesheet' as a long string, something like this:
private static final String GRAPH_DISPLAY_STYLESHEET =
"graph { fill-color: white; }" +
"node {" +
"fill-color: black;" +
"shape: rounded-box;" +
"shadow-mode: plain;" +
"shadow-color: #C8C8C8;" +
"shadow-width: 4;" +
"shadow-offset: 4, -4;" +
"text-background-mode: rounded-box;" +
"text-padding: 5;" +
"text-background-color: black;" +
"text-color: white;" +
"text-size: 20;" +
"size-mode: fit;}" +
"edge { fill-color: black; }";
which is frankly very unsightly.
Does anyone have any ideas about how to improve this situation? Thanks in advance!
Using an embedded-resource and getResource() is the way to go. To avoid file system vagaries, try one of these approaches:
Use one of the approaches adduced here to convert each resource to a String suitable for addAttribute(); a sketch follows below.
Load an instance of java.util.Properties with your styles, as shown here and here, so that each style is available as a String suitable for addAttribute().
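For the first approach, a minimal sketch (assuming Java 9+ for InputStream.readAllBytes(), a stylesheet packaged at /graph.css on the classpath, and MyApp as a placeholder class name):
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Reads a classpath resource into a String suitable for addAttribute().
static String loadStylesheet(String resourceName) throws IOException {
    try (InputStream in = MyApp.class.getResourceAsStream(resourceName)) {
        if (in == null) {
            throw new IOException("Resource not found: " + resourceName);
        }
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}
Calling graph.addAttribute("ui.stylesheet", loadStylesheet("/graph.css")); then works the same whether the application runs from an IDE or from inside a jar.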

How do I get the tf and idf score from a Solr query?

Following the Solr documentation (https://cwiki.apache.org/confluence/display/solr/Function+Queries and others), I should just put idf(fieldname, 'term') in the field list, as I do with termfreq(fieldname, 'term'). However, whenever I try this I get an exception such as:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
By looking at the logs I could find:
null:java.lang.UnsupportedOperationException: requires a TFIDFSimilarity (such as ClassicSimilarity)
And I have no idea what those are. Also when I use debugQuery=on it shows me, along with a lot of other things, the idf value for the document as:
4.406719 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
What should I do to fix these errors or to get desired tf and idf value for a term?
You need to add the following line to your "managed-schema", where the similarity tag is commented out at the end of the schema:
<similarity class="solr.ClassicSimilarityFactory"/>
I had run into the same issue and found this chrome extension -
https://chrome.google.com/webstore/detail/solr-query-debugger/gmpkeiamnmccifccnbfljffkcnacmmdl?hl=en
It is really helpful in breaking down "explain" and will display the value of idf for you.

How to speed up page parsing in Selenium

What can I do if I load a page in Selenium and then have to make around 100 different parsing requests against it?
At the moment I use various driver.findElement(By...) calls, and the problem is that each one is an HTTP (GET/POST) request from Java to Selenium. Because of this, parsing one simple page costs me 30+ seconds (far too much).
I think I should get the source code (driver.getPageSource()) from the first request and then parse that string locally (the page does not change while I parse it).
Can I build some kind of HTML object from this string and keep working with WebElement requests?
Do I have to use another library to build an HTML object (for example, jsoup)? In that case I would have to rebuild my parsing requests from WebElements and XPath.
Anything else?
When you call findElement, there is no need for Selenium to parse the page to find the element. The parsing of the HTML happens when the page is loaded. Some further parsing may happen due to JavaScript modifications to the page (like when doing element.innerHTML += ...). What Selenium does is query the DOM with methods like .getElementsByClassName, .querySelector, etc. That being said, if your browser is running on a remote machine, things can slow down. Even locally, if you are doing a huge number of round-trips between your Selenium script and the browser, it can impact the script's speed quite a bit. So what can you do?
What I prefer to do when I have a lot of queries to do on a page is to use .executeScript to do the work on the browser side. This can reduce dozens of queries to a single one. For instance:
List<WebElement> elements = (List<WebElement>) ((JavascriptExecutor) driver)
.executeScript(
"var elements = document.getElementsByClassName('foo');" +
"return Array.prototype.filter.call(elements, function (el) {" +
" return el.attributes.whatever.value === 'something';" +
"});");
(I've not run the code above. Watch out for typos!)
In this example, you'd get a list of all elements of class foo that have an attribute named whatever which has a value equal to something. (The Array.prototype.filter.call rigmarole is because .getElementsByClassName returns something that behaves like an Array but which is not an Array so it does not have a .filter method.)
Parsing locally is an option if you know that the page won't change as you examine it. You should get the page's source by using something like:
String html = (String) ((JavascriptExecutor) driver).executeScript(
"return document.documentElement.outerHTML");
By doing this, you see the page exactly the way the browser interpreted it. You will have to use something other than Selenium to parse the HTML.
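If you go that route, a minimal jsoup sketch might look like this (the table.results selector is a made-up example):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Fetch the rendered DOM once, then run every further query locally.
String html = (String) ((JavascriptExecutor) driver)
        .executeScript("return document.documentElement.outerHTML");
Document doc = Jsoup.parse(html, driver.getCurrentUrl());
Elements rows = doc.select("table.results tr"); // no round-trip to the browser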
Maybe try evaluating your elements only when you try to use them?
I don't know about the Java equivalent, but in C# you could do something similar to the following, which would only look for the element when it is used:
private static readonly By UsernameSelector = By.Name("username");
private IWebElement UsernameInputElement
{
get { return Driver.FindElement(UsernameSelector); }
}
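For reference, a rough Java equivalent of the same lazy-lookup idea (a sketch, not part of the original answer):
private static final By USERNAME_SELECTOR = By.name("username");

// Re-locates the element on each access instead of caching a stale reference.
private WebElement getUsernameInput() {
    return driver.findElement(USERNAME_SELECTOR);
}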

JSoup not translating ampersand in links in html

In JSoup the following test case should pass, but it is failing.
@Test
public void shouldPrintHrefCorrectly(){
String content= "<li>Good<ul><li><a href=\"article.php?boid=1865&sid=53&mid=1\">" +
"Boss</a></li><li><a href=\"article.php?boid=186&sid=53&mid=1\">" +
"heavent</a></li><li><a href=\"article.php?boid=167&sid=53&mid=1\">" +
"hellos</a></li><li><a href=\"article.php?boid=181&sid=53&mid=1\">" +
"Mr.Jackson!</a></li>";
Document document = Jsoup.parse(content, "http://www.google.co.in/");
Elements links = document.select("a[href^=article]");
Iterator<Element> iterator = links.iterator();
List<String> urls = new ArrayList<String>();
while(iterator.hasNext()){
urls.add(iterator.next().attr("href"));
}
Assert.assertTrue(urls.contains("article.php?boid=181&sid=53&mid=1"));
}
Could any of you please give me the reason as to why it is failing?
There are three problems:
You're asserting that a bovikatanid parameter is present, while it's actually called boid.
The HTML source is using a bare & instead of &amp;. This is technically invalid.
Jsoup is parsing &mid as the entity ∣ somehow, even though the terminating ; is missing. It should have scanned until the ;.
To fix #1, you have to do it yourself. To fix #2, you have to report the issue to the server admin in question (it's their fault; however, since the average browser is forgiving about this, I'd imagine that Google is doing it to save bandwidth). To fix #3, I've reported an issue to the Jsoup guy to see what he thinks about it.
Update: see, Jonathan (the Jsoup guy) has fixed it. It'll be there in the next release.
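Until the fix is released, one possible workaround is to escape bare ampersands before parsing; a sketch (the regex is an assumption and only covers simple cases):
// Escape ampersands that are not already part of a character entity.
String sanitized = content.replaceAll("&(?![a-zA-Z]+;|#\\d+;)", "&amp;");
Document document = Jsoup.parse(sanitized, "http://www.google.co.in/");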

Best way to time something in Selenium

I'm writing some Selenium tests in Java, and I'm mostly trying to use verifications instead of assertions because the things I'm checking aren't very dependent on each other, so I don't want to abort if one little thing doesn't work. One of the things I'd like to keep an eye on is whether certain Drupal pages are taking forever to load. What's the best way to do that?
Little example of the pattern I'm using.
selenium.open("/m");
selenium.click("link=Android");
selenium.waitForPageToLoad("100000");
if (selenium.isTextPresent("Epocrates")) {
System.out.println(" Epocrates confirmed");
} else {
System.out.println("Epocrates failed");
}
Should I have two "waitForPageToLoad" statements (say, 10000 and 100000), and if the desired text doesn't show up after the first one, print a statement? That seems clumsy. What I'd like to do is just a line like:
if (timeToLoad>10000) System.out.println("Epocrates was slow");
And then keep going to check whether the text was present or not.
waitForPageToLoad will wait until the next page is loaded. So you can just do a start/end timer and do your if:
long start = System.currentTimeMillis();
selenium.waitForPageToLoad("100000");
long timeToLoad= (System.currentTimeMillis()-start);
if (timeToLoad>10000) System.out.println("Epocrates was slow");
Does your text load after the page is loaded? I mean, is the text inserted dynamically? Otherwise the text should be present as soon as the page was loaded.
selenium.isTextPresent
doesn't wait. It only checks the currently available page.
The best method to wait for something in Selenium is as follows:
Reporter.log("Waiting for element '" + locator + "' to appear.");
new Wait() {
    public boolean until() {
        return selenium.isElementPresent(locator);
    }
}.wait("The element '" + locator + "' did not appear within " + timeout + " ms.", timeout);
The Wait class is part of Selenium; you only have to import it.
Also here is a framework that you can use. It's opensource, handles mostly everything and can be easily expanded.
https://sourceforge.net/projects/webauthfw/
Use it well and give us credit hehe. :)
Cheers,
Gergely.
In a Selenium integration test, I did it like so, using System.nanoTime() and converting to a double to get seconds:
long endTime = System.nanoTime();
long duration = (endTime - startTime);
Reporter.log("Duration was: " + ((double) duration / 1000000000.0) + " seconds.", true);
assertTrue(duration >= 0 && duration <= 1000000000L,
        "Test that duration of default implicit timeout is less than 1 second, or nearly 0.");
