HTML code:
<div title class="Example">
<span>first div</span> <!---->
<span class="second div">second span</span></div>
Java code:
Document doc = Jsoup.connect("http://example.com").get();
Elements elemenx = doc.select("div.Example span");
for (Element e: elemenx) {
System.out.println(e.text());
}
How can I get only the first span?
I found a solution: add nth-child to the selector.
Elements elemenx = doc.select("div.Example span:nth-child(1)");
Maybe this will be useful to someone.
I need to detect whether a pseudo-element such as ::after or ::before exists on an element.
My aim is just to return true or false if it is present.
However it cannot be done using:
browser.driver.findElements(by.id('id')).size != 0
or
return !driver.findElements(by).isEmpty();
because they are pseudo-elements and cannot be located through any CSS or XPath locators.
Here is my HTML with ::after on the div:
<div class="parent-class">
<span class="child-class">Archive
::after
</span>
::after
</div>
Here is my HTML without ::after on the div:
<div class="parent-class">
<span class="child-class">Archive
::after
</span>
</div>
Note: I need to verify only the ::after in DIV tag but not in SPAN tag
Yes, the pseudo-elements cannot be located by paths or CSS locators.
However, as an alternative, you can extract the inner HTML of the parent element and validate if it contains the "::after" text.
There are two ways of doing this. For the above scenario,
WebElement element = browser.driver.findElement(By.className("parent-class"));
String innerHTMLText = element.getAttribute("innerHTML");
if (innerHTMLText.contains("::after")){
// Bingo !!
}
Or else
WebElement element = browser.driver.findElement(By.className("parent-class"));
JavascriptExecutor js = (JavascriptExecutor) driver;
// executeScript returns Object, so cast the result to String
String innerHTMLText = (String) js.executeScript("return arguments[0].innerHTML;", element);
if (innerHTMLText.contains("::after")){
// Bingo !!
}
EDIT 1
If you need to verify that only the div tag has the pseudo-element, get the span tag's HTML and the parent tag's HTML, remove the span's inner HTML from the parent's HTML, and check what is left.
String divHTMLText = browser.driver.findElement(By.className("parent-class")).getAttribute("innerHTML");
String spanHTMLText = browser.driver.findElement(By.className("child-class")).getAttribute("innerHTML");
// strip all whitespace first for good measure
divHTMLText = divHTMLText.replaceAll("\\s+", "");
spanHTMLText = spanHTMLText.replaceAll("\\s+", "");
// remove the child's HTML from the parent's HTML, which leaves only the parent's own content
String divOnlyHTML = divHTMLText.replace(spanHTMLText, "");
if (divOnlyHTML.contains("::after")){
// Bingo !!
}
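To check the subtraction step from EDIT 1 in isolation, here is a minimal sketch on plain strings, with no Selenium involved. The class name PseudoMarkerCheck, the method markerOnlyInParent and the sample strings are made up for illustration. It assumes the strings you pulled from the page really contain the literal text "::after"; that is worth confirming in your browser, because the text is what DevTools displays and may not be part of innerHTML.

```java
class PseudoMarkerCheck {

    /**
     * Returns true if the marker occurs in the parent's HTML outside of the
     * child's HTML. Whitespace is stripped from both inputs first, as in the
     * answer above.
     */
    static boolean markerOnlyInParent(String parentHtml, String childHtml, String marker) {
        String parent = parentHtml.replaceAll("\\s+", "");
        String child = childHtml.replaceAll("\\s+", "");
        // Cut the child's content out of the parent's content.
        String parentOnly = parent.replace(child, "");
        return parentOnly.contains(marker);
    }

    public static void main(String[] args) {
        // Hypothetical strings modelled on the two snippets in the question.
        String span = "Archive ::after";
        String divWithAfter = "<span class=\"child-class\">Archive ::after</span> ::after";
        String divWithoutAfter = "<span class=\"child-class\">Archive ::after</span>";

        System.out.println(markerOnlyInParent(divWithAfter, span, "::after"));    // true
        System.out.println(markerOnlyInParent(divWithoutAfter, span, "::after")); // false
    }
}
```

Note that the marker is searched for after whitespace has been removed, so a marker that itself contains spaces would never match.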
I have the following HTML:
<html>
<body>
...
<h2> Blah Blah 1</h2>
<p>blah blah</p>
<div>
<div>
<table>
<tbody>
<tr><th>Col 1 Header</th><th>Col 2 Header</th></tr>
<tr><td>Line 1.1 Value</td><td>Line 2.1 Header</td></tr>
<tr><td>Line 2.1 Value</td><td>Line 2.2 Value</td></tr>
</tbody>
</table>
</div>
</div>
<div>
<div>
<table>
<tbody>
<tr><th>Col 1 Header T2</th><th>Col 2 Header T2</th></tr>
<tr><td>Line 1.1 Value T2</td><td>Line 2.1 Header T2</td></tr>
<tr><td>Line 2.1 Value T2</td><td>Line 2.2 Value T2</td></tr>
</tbody>
</table>
</div>
</div>
<h2> Blah Blah 2</h2>
<div>
<div>
<table>
<tbody>
<tr><th>XCol 1 Header</th><th>XCol 2 Header</th></tr>
<tr><td>XLine 1.1 Value</td><td>XLine 2.1 Header</td></tr>
<tr><td>XLine 2.1 Value</td><td>XLine 2.2 Value</td></tr>
</tbody>
</table>
</div>
</div>
<p>blah blah</p>
<div>
<div>
<table>
<tbody>
<tr><th>XCol 1 Header T2</th><th>XCol 2 Header T2</th></tr>
<tr><td>XLine 1.1 Value T2</td><td>XLine 2.1 Header T2</td></tr>
<tr><td>XLine 2.1 Value T2</td><td>XLine 2.2 Value T2</td></tr>
</tbody>
</table>
</div>
</div>
</body>
</html>
I would like to extract the 2nd DIV following an h2 tag that contains a given text.
As you may notice, the p tags are not in the same position after the first h2 and after the second h2.
To extract the DIV following the first h2, the below formula would work:
h2:contains(Blah 1) + p + div +div
But to extract the second one, replacing "Blah 1" with "Blah 2" would not work, because the "p" tag is located elsewhere. A static selector for that case would be:
h2:contains(Blah 2) + div + p +div
What I need is a single selector formula that keeps working when only the text changes, wherever the p blocks may be.
I tried several ways. The nth-of-type selector would not work either, because I only know the position of the DIV relative to the h2, which is not the parent of the DIV but a preceding sibling.
Help please.
I have two ideas how to achieve this.
The first one is to remove every <p> and then you will only have to select "h2:contains(" + text + ")+div+div". Be careful and use it only when you're sure your <div> doesn't contain any <p>. Otherwise it will lack some content.
public void execute1(String html) {
Document doc = Jsoup.parse(html);
// first approach: remove every <p> to simplify document
Elements paragraphs = doc.select("p");
for (Element paragraph : paragraphs) {
paragraph.remove();
}
// then one selector will return what you want in both cases
System.out.println(selectSecondDivAfterH2WithText(doc, "Blah 1"));
System.out.println(selectSecondDivAfterH2WithText(doc, "Blah 2"));
}
private Element selectSecondDivAfterH2WithText(Document doc, String text) {
return doc.select("h2:contains(" + text + ")+div+div").first();
}
The second approach would be to iterate over siblings of "h2:contains(" + text+ ")" and "manually" find second <div> ignoring anything else. It's better because it doesn't destroy the original document and it will skip any number of <p> elements.
public void execute2(String html) {
Document doc = Jsoup.parse(html);
System.out.println(selectSecondDivAfterH2WithText2(doc, "Blah 1"));
System.out.println(selectSecondDivAfterH2WithText2(doc, "Blah 2"));
}
private Element selectSecondDivAfterH2WithText2(Document doc, String text) {
int counter = 2;
// find h2 with given text
Element h2 = doc.select("h2:contains(" + text + ")").first();
// select every sibling after this h2 element
Elements siblings = h2.nextElementSiblings();
// loop over them
for (Element sibling : siblings) {
// skip everything that's not a div
if (sibling.tagName().equals("div")) {
// count how many divs left to skip
counter--;
if (counter == 0) {
// return when found nth div
return sibling;
}
}
}
return null;
}
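I also had a third idea: use "h2:contains(" + text + ")~div:nth-of-type(2)". It works for the first case but fails for the second one, because nth-of-type counts divs among all children of the parent rather than relative to the h2. In the HTML as shown, the divs after the second h2 are the third and fourth of their type, so none of them matches.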
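To see why nth-of-type(2) behaves differently for the two headings, here is a small simulation in plain Java with no jsoup involved. It numbers each div by its position among all divs under the same parent, which is how nth-of-type counts. The class name NthOfTypeDemo and the method divTypeIndexesAfter are hypothetical, and the tag list assumes the elided "..." part of the question's body contains no div.

```java
import java.util.ArrayList;
import java.util.List;

class NthOfTypeDemo {

    /**
     * For every div that comes after position startIndex in the sibling list,
     * returns its 1-based index among ALL divs of the parent.
     */
    static List<Integer> divTypeIndexesAfter(List<String> siblings, int startIndex) {
        List<Integer> result = new ArrayList<>();
        int divCount = 0;
        for (int i = 0; i < siblings.size(); i++) {
            if (siblings.get(i).equals("div")) {
                divCount++; // counted from the start of the parent, not from the h2
                if (i > startIndex) {
                    result.add(divCount);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Children of <body> in the question, in document order (assumption: nothing relevant in "...").
        List<String> body = List.of("h2", "p", "div", "div", "h2", "div", "p", "div");

        System.out.println(divTypeIndexesAfter(body, 0)); // [1, 2, 3, 4] -> index 2 exists, so there is a match
        System.out.println(divTypeIndexesAfter(body, 4)); // [3, 4]       -> no div with index 2, so no match
    }
}
```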
A simple way to do this is by using the comma (,) query operator which does an OR between the selectors. So you can combine the two variations of where the P tag falls.
h2:contains(Blah 2) + div ~ div, h2:contains(Blah 2) ~ div + div
Here's an example on the try.jsoup playground.
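Since the question asks for one formula where only the text changes, the combined selector can be built from a parameter. This is a small sketch in plain Java. The class name SelectorBuilder and the method secondDivSelector are hypothetical, and the result is only a string that you would then pass to doc.select(...).

```java
class SelectorBuilder {

    /**
     * Builds the comma-combined selector from the answer above for a given
     * heading text. No escaping is done, so the text is assumed not to
     * contain parentheses or other selector syntax.
     */
    static String secondDivSelector(String text) {
        String h2 = "h2:contains(" + text + ")";
        // variant 1: h2, div, (anything), div   variant 2: h2, (anything), div, div
        return h2 + " + div ~ div, " + h2 + " ~ div + div";
    }

    public static void main(String[] args) {
        System.out.println(secondDivSelector("Blah 1"));
        // h2:contains(Blah 1) + div ~ div, h2:contains(Blah 1) ~ div + div
        System.out.println(secondDivSelector("Blah 2"));
        // h2:contains(Blah 2) + div ~ div, h2:contains(Blah 2) ~ div + div
    }
}
```

Because "~" matches any later sibling, other layouts may produce more than one match, so taking the first result is probably the safest way to use it.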
HTML code is as follows
<div class="a-row">
<a class="a-link-normal" title="1.0 out of 5 stars" href="/gp/customer-reviews/RBDVABUKMPJY8/ref=cm_cr_arp_d_rvw_ttl?ie=UTF8&ASIN=B071NZZHF9">
<i class="a-icon a-icon-star a-star-1 review-rating" data-hook="review-star-rating">
<span class="a-icon-alt">1.0 out of 5 stars</span>
</i>
</a>
<span class="a-letter-space"/>
<a class="a-size-base a-link-normal review-title a-color-base a-text-bold" data-hook="review-title" href="/gp/customer-reviews/RBDVABUKMPJY8/ref=cm_cr_arp_d_rvw_ttl?ie=UTF8&ASIN=B071NZZHF9">One Star</a>
</div>
I want to get the text "1.0 out of 5 stars" from the span, starting from the parent with class='a-row'.
Can someone help with how to do this? It has to be done with the partialLinkText method, using only partialLinkText = "out of 5 stars".
This is how I got the output:
List<WebElement> rstar = dr.findElements(By.xpath("//*[@id='cm_cr-review_list']//div[@class='a-row']//a[@class='a-link-normal']//i"));
String c;
for (WebElement erstar : rstar) {
c = erstar.getAttribute("innerText");
System.out.println(c);
}
Thanks for the help @AliAzam :)
First look up the span element:
WebElement spanElem = driver.findElement(By.className("a-row")).findElement(By.tagName("span"));
Then use the following:
spanElem.findElement(By.partialLinkText("out of 5 stars"));
Another Solution:
List<WebElement> allParents = driver.findElements(By.className("a-row"));
for (WebElement elem : allParents) {
WebElement spanElem = elem.findElement(By.tagName("span"));
//System.out.println(spanElem.getText());
spanElem.findElement(By.partialLinkText("out of 5 stars"));
//System.out.println(spanElem.findElement(By.partialLinkText("out of 5 stars")).getText());
}
OR
List<WebElement> rstar = dr.findElements(By.className("a-row"));
for(WebElement erstar : rstar)
{
erstar.findElement(By.partialLinkText("out of 5 stars"));
String c = erstar.getText();
System.out.println(c);
}
Hopefully it resolves your issues.
<div class="Class-feedbacks">
<div class="grading class2">
<div itemtype="http://xx.edu/grading" itemscope="" itemprop="studentgrading">
<div class="rating">
<img class="passportphoto" width="1500" height="20" src="http://greg.png" >
<meta content="4.0" itemprop="gradingvalue">
</div>
</div>
<meta content="2012-09-08" itemprop="gradePublished">
<span class="date smaller">9/8/2012</span>
</div>
<p class="review_comment feedback" itemprop="description">Greg is one the smart person in his batch</p>
</div>
I want to print:
date: 2012-09-08
Feedback : Greg is one the smart person in his batch
I was able to use doc.select(div div divn li ui ...) to get the class feedback, as suggested in "Jsoup getting a hyperlink from li".
How should I use the select command to get the values shown above?
To get the value of an attribute, use the attr method. E.g.
Elements elements = doc.select("meta");
for(Element e: elements)
System.out.println(e.attr("content"));
To do it in one single select, have you tried the comma combinator ","?
http://jsoup.org/apidocs/org/jsoup/select/Selector.html
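// restrict the meta match to gradePublished; a bare "meta" would also match the gradingvalue tag
Elements elmts = doc.select("div.Class-feedbacks meta[itemprop=gradePublished], div.Class-feedbacks p");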
Element elmtDate = elmts.get(0);
System.out.println("date: " + elmtDate.attr("content"));
Element elmtParag = elmts.get(1);
System.out.println("Feedback: " + elmtParag.text());
After the select you should get back two elements in your list: the date and the feedback.
This is an old question and I might be late, but if anyone else wants to know how to do this easily, the below code will be helpful.
Document doc = Jsoup.parse(html);
// We select the meta tag whose itemprop property has value 'gradePublished'
String date = doc.select("meta[itemprop=gradePublished]").attr("content");
System.out.println("date: "+date);
// Now we select the text inside the p tag with itemprop value 'description'
String feedback = doc.select("p[itemprop=description]").text();
System.out.println("Feedback: "+feedback);
I'm having trouble selecting links in my HTML. Here's the HTML I have:
<div class=first>
<a href=www.test1.com>test1</a>
<div class=nope>
<a href=www.test2.com>test2</a>
<a href=www.test3.com>test3</a>
<a href=www.test4.com>test4</a>
</div>
</div>
What I want to do is pull the URLs:
www.test2.com
www.test3.com
www.test4.com
I have tried a lot of different .select and .not combinations but I just can't figure it out. Can anyone point out what I'm doing wrong?
String url = "<div class=first><a href=www.test1.com>test1</a>One<div class=nope><a href=www.test2.com>test2</a>Two</div></div><div class=second><a href=www.test3.com>test3</a></div>";
Document doc = Jsoup.parse(url);
Elements divs = doc.select("div a[href]").not(".first.nope a[href]");
System.out.println(divs);
Document doc = Jsoup.parse("your html code/url ");
// select() already returns every matching link; first() would return a single Element
Elements links = doc.select("div.nope a");
for (Element link : links) {
System.out.println(link.attr("href"));
}
I would do it a little differently:
Elements elements = doc.select("div.nope").select("a[href]");
for (Element element : elements) {
System.out.println(element.attr("href"));
}
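Elements data = doc.getElementsByClass("nope");
for (Element d : data) {
// read the href of every link inside the div; tagName("href") would rename the tag instead
for (Element a : d.getElementsByTag("a")) {
String yourData = a.attr("href");
}
}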