I have an element like this:
<td> TextA <br/> TextB </td>
How can I extract TextA and TextB separately?
There are several ways; which one fits depends on the document itself and on whether its HTML markup is consistent. In this particular example you could get the td's child nodes with Element#childNodes() and then check whether each node is a TextNode.
E.g.
Element td = getItSomehow();
for (Node child : td.childNodes()) {
if (child instanceof TextNode) {
System.out.println(((TextNode) child).text());
}
}
which results in
TextA
TextB
I think it would be nice if Jsoup offered an Element#textNodes() method or similar to get the child text nodes, just as Element#children() gets the child elements (which would have returned the <br /> element in your example). (Later jsoup releases did add Element#textNodes(), which returns the direct child text nodes.)
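The same "walk the child nodes and keep only the text nodes" idea can be tried without jsoup. The following is only a sketch that swaps jsoup for the JDK's built-in DOM parser, which works here only because the fragment happens to be well-formed XML; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper: not jsoup, just the JDK DOM API used for illustration.
class TextNodeSketch {

    // Returns the trimmed, non-empty text of each direct text-node child of the root element.
    static List<String> directTexts(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> texts = new ArrayList<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Element children such as <br/> are skipped; only text nodes are kept.
                if (child instanceof Text) {
                    String value = child.getNodeValue().trim();
                    if (!value.isEmpty()) {
                        texts.add(value);
                    }
                }
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        // Prints TextA and TextB on separate lines.
        directTexts("<td> TextA <br/> TextB </td>").forEach(System.out::println);
    }
}
```

For real-world HTML that is not well-formed XML, the jsoup loop above remains the more robust choice.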
<td class="rich-tabpanel-content" style="; ">
<div id="page:form1:block1:j_id738" style="width:75%;">
<span xmlns="http://www.w3.org/1999/xhtml" id="page:form1:block1:out">
<b>
<span style="text-align:right;font-size:15px;color:black">Risk Value</span>
</b>
<b>
<span style="text-align:right;font-size:15px;color:black"></span>
<span style="text-align:right;font-size:14px;color:blue">Select a value for every risk.</span>
</b>
</span>
</div>
</td>
I'm using
String str = driver.findElement(By.cssSelector("div[id^='page:form1:block1:j_id738'] span ")).getText();
but it fetches only "Risk Value" and not the other text.
You're looking for driver.findElements rather than driver.findElement: findElement returns only the first match, while findElements returns every match.
Try something like this:
List<WebElement> elements = driver.findElements(By.cssSelector("div[id^='page:form1:block1:j_id738'] span"));
Then loop through this list and call getText() on each element to retrieve its text.
If you're using Java:
for (WebElement element : elements) {
String data = element.getText();
// do something with data
}
If you're using C# (where the binding uses FindElements, IWebElement and the Text property instead):
foreach (IWebElement element in elements) {
    string data = element.Text;
    // do something with data
}
There are multiple span tags inside your desired section, so you have to use the findElements method instead of findElement. findElements returns a list of all elements matching the given locator (CSS selector, class name, etc.). So your code should look like this:
List<WebElement> elements = driver.findElements(By.cssSelector("div[id^='page:form1:block1:j_id738'] span"));
for (WebElement element : elements) {
System.out.println(element.getText());
}
I have a web page that I have saved in an HtmlPage object. I applied an XPath to it, and the result is stored in a list.
List<?> items = null;
items = page.getByXPath("//div[contains(@class,'search-result-cards')]/div[contains(@class,'listContainer')]");
Now what I observed, is that when I iterate through these items, using HtmlElement, I get just the first line of the div tag which contains the class listContainer but not its child nodes. However, on using he.asXml() method, I get the complete information about the subnodes as well.
for(HtmlElement he : (List<HtmlElement>) items)
{
br.write("Printing just the element ::: "+he);
br.write(he.asXml());
}
Here, br is a BufferedWriter object which is being used to write the output to the file.
The issue is that I want everything he.asXml() returns to be available through the HtmlElement object itself. Is that possible? I tried casting a String directly to an HtmlElement, which didn't work. Can anyone please help?
Output
Printing just the element ::: HtmlDivision[<div class="listContainer" data-ptitle="3139847000" data-reactid="402">]
he.asXml() Output
<div class="listContainer" data-ptitle="3139847000" data-reactid="402">
<div class="imageContainer" data-reactid="403">
<div class="prodInfoContainer" data-reactid="406">
.
.
.
The dots indicate that these nodes keep going; the full output is very large.
Let me know if any other information is needed that I may have not mentioned.
toString() prints only the current DomElement, not its children.
You need to get the children yourself, for example with XPath:
List<HtmlElement> items = page.getByXPath("//div[contains(@class,'listContainer')]");
for (HtmlElement item : items) {
List<HtmlElement> children = item.getByXPath(".//div");
for (HtmlElement child : children) {
System.out.println(child);
}
}
Or, inside the same outer loop, iterate over all descendants of each item:
for (HtmlElement child : item.getHtmlElementDescendants()) {
System.out.println(child);
}
I am trying to read all the heading tags on a page and need to click only one of them, the one named "Dropdown". The sample structure of the HTML is as follows:
<div> <ul> <li>
<a href="submit_button_clicked.php">
<h2>Submit Button Clicked</h2>
<figure>
</a>
</li>
<li>
<a href="dropdown.php">
<h2>Dropdown</h2>
<figure>
What I did was create a custom XPath and store the result in a List, then iterate through the list with a for loop, but I am unable to read the tag's value or write it to the console.
List l = ff.findElements(By.xpath("//div/ul/li/a/h2"));
To retrieve the text value of an element use:
element.getText();
In your case with your list it would look something like this:
for(WebElement element : l) {
System.out.println(element.getText());
}
Since you want to click one specific element, it is better to select it directly with an XPath such as:
ff.findElement(By.xpath("//h2[text()='Dropdown']")).click();
This XPath looks for an h2 element whose text is exactly 'Dropdown'; findElement returns that single element, which is then clicked.
Reading all <h2> tags can look something like:
List<WebElement> elements = ff.findElements(By.xpath("//h2"));
for(WebElement element : elements) {
System.out.println(element.getText()); // just to show that it prints text
}
Note that I defined list as List<WebElement> which is to avoid usage of raw types, and changed xpath to match any <h2>.
But when you need to click, you usually have to click the parent <a> element rather than the <h2> itself, i.e. the following should click the correct link:
ff.findElement(By.xpath("//a[@href='dropdown.php']")).click();
But if you want to find the link from its header, you can do so in the loop above:
List<WebElement> elements = ff.findElements(By.xpath("//h2"));
for (WebElement element : elements) {
    if ("Dropdown".equals(element.getText())) {
        // get the parent <a> element and click on it
        element.findElement(By.xpath("..")).click();
        break; // the page may navigate after the click, so stop iterating
    }
}
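The XPath used here, an exact-text match on the h2 followed by a step up to its parent, can be sanity-checked without a browser. The sketch below is only an illustration under assumptions: it uses the JDK's javax.xml.xpath API rather than Selenium, a cleaned-up well-formed copy of the sample markup (the unclosed figure tags are dropped), and hypothetical class and method names. It also assumes the heading text contains no single quote, since the text is pasted straight into the expression.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: checks the XPath logic offline, without Selenium.
class HeadingXPathSketch {

    // Well-formed stand-in for the sample markup from the question.
    static final String SAMPLE = "<div><ul>"
            + "<li><a href=\"submit_button_clicked.php\"><h2>Submit Button Clicked</h2></a></li>"
            + "<li><a href=\"dropdown.php\"><h2>Dropdown</h2></a></li>"
            + "</ul></div>";

    // Returns the href of the element that is the parent of the matching h2, or null if none matches.
    static String hrefForHeading(String xml, String heading) {
        // "//h2[text()='...']" selects the heading; "/.." steps up to its parent.
        String expression = "//h2[text()='" + heading + "']/..";
        try {
            Element link = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)), XPathConstants.NODE);
            return link == null ? null : link.getAttribute("href");
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hrefForHeading(SAMPLE, "Dropdown")); // prints dropdown.php
    }
}
```

In Selenium the same expression would be passed to By.xpath, as in the snippets above.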
Hi, please do it like this:
WebDriver driver = new FirefoxDriver();
driver.get("http://www.seleniumhq.org");
driver.manage().timeouts().implicitlyWait(15, TimeUnit.SECONDS);
// collect all h2 elements on the page in a list
List<WebElement> myH2Tags = driver.findElements(By.tagName("h2")); // you can put any tag name as per your requirement
for(int i=0;i<myH2Tags.size();i++){
System.out.println("Value of My H2 Tags are : " + myH2Tags.get(i).getText());
if(myH2Tags.get(i).getText().equals("Selenium News")){ // you can replace this with drop down value
myH2Tags.get(i).click();
}
// to avoid stale element exception you have to re identify the elements
myH2Tags = driver.findElements(By.tagName("h2"));
}
I am trying to store the text inside of the p elements which are inside a div in an ArrayList. The HTML is given below:
<div class="copy">
<p>First text</p>
<p>Second text</p>
<p>Third text</p>
</div>
I tried the following code but it concatenates all of the above and stores them as one instead of storing them separately:
Elements tips= doc.select("div.copy");
for(Element tip: tips) {
tipsArray.add(tip.text());
}
What am I doing wrong here? Thanks.
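Your selector matches the div itself, so text() returns the combined text of everything inside it. Select the p elements instead:
Elements tips = doc.select("div.copy > p");
for (Element tip : tips) {
    tipsArray.add(tip.text()); // one entry per <p>
}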
In jsoup Element.children() returns all children (descendants) of Element. But, I want the Element's first-level children (direct children).
Which method can I use?
Element.children() returns direct children only. Each returned element is still attached to the document tree, though, so it carries its own children with it.
If you need the direct child elements without the underlying tree structure, you have to create detached copies, as follows:
public static void main(String... args) {
Document document = Jsoup
.parse("<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>");
Element div = document.select("div").first();
Elements divChildren = div.children();
Elements detachedDivChildren = new Elements();
for (Element elem : divChildren) {
Element detachedChild = new Element(Tag.valueOf(elem.tagName()),
elem.baseUri(), elem.attributes().clone());
detachedDivChildren.add(detachedChild);
}
System.out.println(divChildren.size());
for (Element elem : divChildren) {
System.out.println(elem.tagName());
}
System.out.println("\ndivChildren content: \n" + divChildren);
System.out.println("\ndetachedDivChildren content: \n"
+ detachedDivChildren);
}
Output
2
ul
p
divChildren content:
<ul>
<li>11</li>
<li>22</li>
</ul>
<p>ppp<span>sp</span></p>
detachedDivChildren content:
<ul></ul>
<p></p>
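To make the "direct children versus descendants" distinction concrete outside jsoup, here is a small sketch that uses the JDK's DOM API on the same markup. It is not jsoup's implementation, just an analogous illustration; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: JDK DOM stand-in for comparing children with descendants.
class ChildVsDescendantSketch {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }

    // Tag names of the first-level element children only.
    static List<String> directChildTags(Element parent) {
        List<String> tags = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                tags.add(((Element) child).getTagName());
            }
        }
        return tags;
    }

    // Number of element descendants at any depth (the parent itself is not counted).
    static int descendantCount(Element parent) {
        return parent.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) {
        Element div = parseRoot("<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>");
        System.out.println(directChildTags(div)); // prints [ul, p]
        System.out.println(descendantCount(div)); // prints 5
    }
}
```

The two direct children match what Element.children() reports in the jsoup output above; the count of five includes the nested li and span elements as well.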
This should give you the desired list of direct descendants of the parent node:
Elements firstLevelChildElements = doc.select("parent-tag > *");
Alternatively, you can retrieve the parent element, get its first child via child(int index), and then retrieve that child's siblings via siblingElements().
This gives you the list of first-level children excluding the child you started from, so you have to add that child back yourself.
Elements firstLevelChildElements = doc.child(0).siblingElements();
You could also use Element.child(index), where the index selects which direct child you want.
Here is how to get the text of the first-level children:
Element addDetails = doc.select("div.container > div.main-content > div.clearfix > div.col_7.post-info > ul.no-bullet").first();
Elements divChildren = addDetails.children();
for (Element elem : divChildren) {
System.out.println(elem.text());
}