There are two elements with the same class name:
<div class="website text:middle"> A</div>
<div class="website text:middle"> 1</div>
How do I get A and 1? I tried using getElementById with :eq(0), and it returns null.
The getElementById method looks up an element by its id, not by its class; I'm not sure what you were trying to query with :eq(0) either.
Try:
// String html = ...
Document doc = Jsoup.parse(html);
List<String> result = doc.getElementsByClass("text:middle").eachText();
// result = ["A", "1"]
EDIT
You can query for elements that match multiple classes! See Jsoup select div having multiple classes.
However, a colon (:) is a special character in CSS and needs to be escaped when it appears as part of a class name in a selector query. I don't think that jsoup currently supports this; it simply treats everything after a colon as a pseudo-class.
To add to Janez's correct answer - while jsoup's CSS selector (currently) doesn't support escaping a : character in the class name, there are other ways to get it to work if you want to use the select() method instead of getElementsByXXX -- e.g. if you want to combine selectors in one call:
Elements divs = doc.select("div[class=website text:middle]");
That will find div elements with the literal attribute class="website text:middle".
Or:
Elements divs = doc.select("div[class~=text:middle]");
That finds elements with the class attribute that matches the regex /text:middle/.
For the presented data, though, I think the getElementsByClass() DOM method is the way to go and the most general. I just wanted to show a couple of alternatives for other cases.
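To make the difference between those three lookups concrete, here is a minimal plain-Java sketch (no jsoup involved). The class and method names are hypothetical, and each method only roughly mirrors the matching rule it is named after; jsoup's real implementations may differ in details such as case handling.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ClassAttributeMatchDemo {
    // Roughly mirrors [class=value]: the whole attribute must equal the value.
    static boolean exactAttribute(String classAttr, String value) {
        return classAttr.equals(value);
    }

    // Roughly mirrors [class~=regex]: the regex may match anywhere in the attribute.
    static boolean regexFind(String classAttr, String regex) {
        return Pattern.compile(regex).matcher(classAttr).find();
    }

    // Roughly mirrors a class-name lookup: one whitespace-separated token must equal the name.
    static boolean hasClassToken(String classAttr, String name) {
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(name);
    }

    public static void main(String[] args) {
        // Same classes in a different order: the literal attribute match fails.
        System.out.println(exactAttribute("text:middle website", "website text:middle")); // false
        // A different class that merely contains the text: the regex still matches.
        System.out.println(regexFind("website subtext:middle", "text:middle"));           // true
        // The token comparison only accepts the exact class name.
        System.out.println(hasClassToken("website subtext:middle", "text:middle"));       // false
        System.out.println(hasClassToken("text:middle website", "text:middle"));          // true
    }
}
```

This is why the class-based lookup is the most general choice here: it does not depend on class order and does not match unrelated classes that happen to contain the same text.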
document.querySelectorAll(".website")[0] // 0 is the index into the returned NodeList
You should use querySelector; it is supported by all current browsers.
See the browser support details for more information.
Related
I'm using Selenium (with Java) to search all the class names on a page and then use a regex to keep only the class names that have "insignia" in them.
I tried the code below, with a regex meant to match class names that mention "insignia", but it didn't return any results.
System.out.println(driver.findElements(By.className(".*\\binsignia\\b.*")).get(1).getAttribute("src"));
You can't use Regex inside a locator string. You can use a CSS selector and find all elements that contain "insignia" in the class name.
System.out.println(driver.findElements(By.cssSelector("[class*='insignia']")).get(1).getAttribute("src"));
CSS selector reference
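driver.findElements(By.xpath("//*[contains(@class,'insignia')]"));
returns the WebElements whose class attribute contains insignia. XPath 1.0's contains() does a plain substring match, so regex syntax is neither needed nor understood inside it.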
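The substring behaviour of contains() can be checked without a browser. The sketch below uses the JDK's built-in XPath 1.0 engine (javax.xml.xpath) on a small made-up XML snippet; the class name ClassContainsDemo, the helper countMatches, and the sample markup are assumptions for illustration, not part of Selenium.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassContainsDemo {
    // Counts the nodes that an XPath 1.0 expression selects in the given XML text.
    static int countMatches(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample markup: two of the three images mention "insignia".
        String xml = "<div>"
                + "<img class='insignia large' src='a.png'/>"
                + "<img class='logo' src='b.png'/>"
                + "<img class='big-insignia' src='c.png'/>"
                + "</div>";

        // Plain substring test on the class attribute.
        System.out.println(countMatches(xml, "//*[contains(@class,'insignia')]"));     // 2
        // Regex syntax is taken literally, so nothing contains the text ".*insignia.*".
        System.out.println(countMatches(xml, "//*[contains(@class,'.*insignia.*')]")); // 0
    }
}
```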
I'm using Robotframework to automate tests, it uses the Selenium2 Library and gives the opportunity to extend many libraries (Java, Python, AngularJS, etc.).
Here's my question.
Is there a way to get all the xpath of elements displayed on a page that match a certain criteria?
Thanks in Advance,
I'll answer the question the two ways I understand it; at least one should be what you're looking for ;)
Presuming the xpath is //div - e.g. "all div elements"
1) To count how many matching elements are there:
${count}=    Get Matching Xpath Count    //div    # note there's no xpath= prefix
As pointed out in the comments, the return value of Get Matching Xpath Count is of a String type, so if you want to use it for some numerical comparisons, you'd better cast it to int:
${count}=    Convert To Integer    ${count}
2) To get each element matching that xpath, and do something with it
${matched elements}=    Get Webelements    xpath=//div
:FOR    ${element}    IN    @{matched elements}
\    ${text}=    Get Text    ${element}    # will get the text of each matched node
\    Log    ${text}
I would like to find any WebElement based on text using XPath.
WebElement that I am interested to find,
Its HTML,
Basically my WebElement that I am trying to retrieve by Text contains an input element.
I currently use,
driver.findElement(By.xpath("//*[normalize-space(text()) = 'Own Hotel']"));
which does not find the WebElement above, but it usually works to retrieve all other web elements.
Even,
By.xpath("//*[contains(text(),'Own Hotel')]")
did not give me any results. Although I am interested in exact text match.
I am looking for a way to find a web element by its text, irrespective of the elements nested inside it. If the text matches, it should return the WebElement.
Thanks!
It seems the text is wrapped inside a label, not the input. Try this:
driver.findElement(By.xpath(".//label[text()[normalize-space() = 'Own Hotel']]"));
There is a nice explanation of this XPath pattern here.
In the HTML below:
The innerText Own Hotel within the <input> node contains a lot of white-space characters at the beginning as well as at the end. Because of these leading and trailing white-space characters, you can't use the location path text() on its own, since:
text() selects all text node children of the context node
As an alternative, you need to use the String Function string normalize-space(string?) as follows:
driver.findElement(By.xpath("//*[normalize-space()='Own Hotel']"));
However, it would be a better idea to make your search a bit more granular by adding the tagName and preferably a unique attribute, as follows:
Using tagName and normalize-space():
driver.findElement(By.xpath("//input[normalize-space()='Own Hotel']"));
Using tagName, the name attribute, and normalize-space():
driver.findElement(By.xpath("//input[@name='ownHotel' and normalize-space()='Own Hotel']"));
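Since the original HTML is not shown, the following is only a sketch under an assumed structure: a label that wraps an input followed by padded text, as the first answer suggests. It uses the JDK's XPath 1.0 engine rather than Selenium, and the class name, the count helper, and the sample markup are all hypothetical. It shows why the text()-based attempts from the question can come back empty while the normalize-space() forms match.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NormalizeSpaceDemo {
    // Assumed markup: the label's first text node is only whitespace,
    // and the visible text follows the nested input.
    static final String XML =
            "<form><label>\n  <input name='ownHotel'/>\n   Own Hotel\n</label></form>";

    // Counts the nodes selected by an XPath 1.0 expression in the sample markup.
    static int count(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Both attempts from the question only look at the FIRST text node, which is blank here.
        System.out.println(count("//*[normalize-space(text())='Own Hotel']"));     // 0
        System.out.println(count("//*[contains(text(),'Own Hotel')]"));            // 0
        // Testing each text node separately finds the padded one.
        System.out.println(count("//label[text()[normalize-space()='Own Hotel']]")); // 1
        // normalize-space() with no argument uses the element's whole string value.
        System.out.println(count("//label[normalize-space()='Own Hotel']"));       // 1
        // Without a tag name, the enclosing form matches as well.
        System.out.println(count("//*[normalize-space()='Own Hotel']"));           // 2
    }
}
```

The last line illustrates the point about granularity: `//*` also selects ancestors whose combined text normalizes to the same string, so naming the tag keeps the match unique.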
References
you can find a couple of relevant discussions using normalize-space() in:
How to click on a link with trailing white-space characters on a web page using Selenium?
How to locate and click the element when the innerText contains leading and trailing white-space characters using Selenium and Python
How to click on the button when the textContext contains leading and trailing white-space characters using Selenium and Python
I would like to use getText() on an element located by XPath; I need the text it contains.
//span(contains(@style,'display:none'))
The XPath works when tested in Firebug. I've tried getText and getAttribute, but so far no luck.
It's a little hard to say without the exact HTML, which you have not specified in your question...
To begin with, you need to change this:
"//span(contains(@style,'display:none'))"
To this:
"//span[contains(@style,'display:none')]"
UPDATE:
Alternatively, since the span element is not visible, you might be able to do it with:
String innerHTML = elem.getAttribute("innerHTML");
Where elem is the parent node of the span element.
Then, in order to get the actual text, you will need to parse the innerHTML string.
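Because the element is invisible (it has display:none), Selenium cannot natively interact with it. You need to cast your driver to JavascriptExecutor and read the text with JavaScript. In the browser console, the equivalent check is:
$x("//span[contains(@style,'display:none')]")[0].textContent
The [0] selects the first element returned by the XPath.
textContent returns the inner text of that element. Note that $x is a DevTools console helper, so a script passed to JavascriptExecutor would need document.evaluate (or a WebElement passed in as an argument) instead.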
I'm writing an app for a friend but I ran into a problem, the website has these
<span style="display:none">&0000000000000217000000</span>
And we have no idea even what they are, but I need them removed because my app is outputting their value.
Is there any way I can check whether this is in the Elements and remove it? I have a for-each loop doing the parsing, but I can't figure out how to effectively remove this element.
thanks
If you want to remove those spans completely based on the style attribute, try this code:
String html = "<span style=\"display:none\">&0000000000000217000000</span>";
html += "<span style=\"display:none\">&1111111111111111111111111</span>";
html += "<p>Test paragraph should not be removed</p>";
Document doc = Jsoup.parse(html);
doc.select("span[style*=display:none]").remove();
System.out.println(doc);
Here is the output:
<html>
<head></head>
<body>
<p>Test paragraph should not be removed</p>
</body>
</html>
Just try this:
//Assuming you have all the data in a Document called doc:
String cleanData = doc.select("query").text();
The text() method strips all HTML tags and decodes entities into human-readable content. There is also the ownText() method, which might help as well. I can't say which will best fit your purposes.
You can use jsoup to access the innerHTML of the elements, remove the escaped characters, and replace the innerHTML:
Elements elements = doc.select("span");
for (Element e : elements) {
    e.html( e.html().replaceAll("&amp;", "") );
}
In the above example, we get a collection of all of the elements that can contain the offending character, using the span selector. Afterwards, we replace each &amp; with the empty string or whatever character you wish.
Additionally, you should know that &amp; is the escape code for the & character. Without escaping & characters, you may have HTML validation issues. In your case, without additional information, I'm assuming you just really want to eliminate them. If not, this will help get you started. Good luck!
If you need to remove the trailing numbers:
// eliminate the escaped ampersand and all trailing numbers
e.html( e.html().replaceAll("&amp;[0-9]*", "") );
For more information on regular expressions, see the Javadocs on Regex Pattern.
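The difference between matching a raw ampersand and an escaped one is easy to check with plain strings. The sketch below uses only java.lang.String.replaceAll; the class and method names are hypothetical, and the input strings stand in for what an element's text and its escaped inner HTML might look like.

```java
class EntityStripDemo {
    // Removes a raw ampersand followed by any run of digits (suits decoded text).
    static String stripRaw(String s) {
        return s.replaceAll("&[0-9]*", "");
    }

    // Removes an escaped ampersand (&amp;) followed by any run of digits (suits escaped HTML).
    static String stripEscaped(String s) {
        return s.replaceAll("&amp;[0-9]*", "");
    }

    public static void main(String[] args) {
        String decodedText = "&0000000000000217000000";     // ampersand as it appears in text
        String escapedHtml = "&amp;0000000000000217000000"; // same content in escaped form

        System.out.println("[" + stripRaw(decodedText) + "]");     // []
        System.out.println("[" + stripEscaped(escapedHtml) + "]"); // []
        // Pitfall: the raw pattern on escaped input removes only the "&" itself.
        System.out.println("[" + stripRaw(escapedHtml) + "]");     // [amp;0000000000000217000000]
    }
}
```

Which pattern applies depends on whether the string being processed is decoded text or escaped markup, so it is worth printing the string once before choosing.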