Element id in the loop (JSOUP) - java

Here is my code:
Element current = doc.select("tr[class=row]").get(5);
for (Element td : current.children()) {
System.out.println(td.text());
}
How can I get an Element id in the loop?
Thanks!

In HTML id is a normal attribute, so you can simply call td.attr("id"):
Element current = doc.select("tr.row").get(5);
for (Element td : current.children()) {
System.out.println(td.attr("id"));
}
Note that there is also a selector for classes: tr.row.
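jsoup supports many CSS selectors, so this could be rewritten with a single selector. Note that :nth-of-type(6) counts every tr sibling, not only those with class row, so it matches the same row as get(5) only when all rows in the table carry that class: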
Elements elements = doc.select("tr.row:nth-of-type(6) > td");
for (Element element : elements) {
System.out.println(element.id());
}
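A minimal, self-contained sketch of the loop, assuming jsoup is on the classpath. The HTML snippet, the class name `CellIds` and the method `idsOf` are made up for illustration. It parses a string instead of fetching a page, and shows that a cell without an id yields an empty string rather than null.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class CellIds {
    // Collects the id of every direct child of the given row.
    // A child without an id attribute contributes an empty string.
    static List<String> idsOf(Element row) {
        List<String> ids = new ArrayList<>();
        for (Element td : row.children()) {
            ids.add(td.id()); // same value as td.attr("id")
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical markup, parsed from a string so no network access is needed.
        Document doc = Jsoup.parse(
            "<table><tr class='row'><td id='a'>1</td><td>2</td><td id='c'>3</td></tr></table>");
        Element row = doc.select("tr.row").first();
        System.out.println(idsOf(row));
    }
}
```

If you only care about cells that actually have an id, a selector such as `tr.row > td[id]` filters them before the loop.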

Related

How to read all the heading tags on a page and select one heading tag with specific text?

I am trying to read all the heading tags on a page and need to click only the one named "Dropdown". The sample HTML structure is as follows:
<div> <ul> <li>
<a href="submit_button_clicked.php">
<h2>Submit Button Clicked</h2>
<figure>
</a>
</li>
<li>
<a href="dropdown.php">
<h2>Dropdown</h2>
<figure>
What I did was create a custom XPath and store the results in a List, then iterate through the list with a for loop, but I am unable to read the tag's text and write it to the console.
List l = ff.findElements(By.xpath("//div/ul/li/a/h2"));
To retrieve the text value of an element use:
element.getText();
In your case with your list it would look something like this:
for(WebElement element : l) {
System.out.println(element.getText());
}
Since you want to click on one specific element, it would be better to use an XPath such as the following:
ff.findElement(By.xpath("//h2[text()='Dropdown']")).click();
This XPath looks for an h2 element anywhere in the page with the exact text 'Dropdown'; findElement returns that single element, and click() clicks it.
Reading all <h2> tags can look something like:
List<WebElement> elements = ff.findElements(By.xpath("//h2"));
for(WebElement element : elements) {
System.out.println(element.getText()); // just to show that it prints text
}
Note that I defined list as List<WebElement> which is to avoid usage of raw types, and changed xpath to match any <h2>.
But when you need to click, you usually have to click on the parent <a> element, not on the <h2> itself, i.e. the following should click on the correct link:
ff.findElement(By.xpath("//a[@href='dropdown.php']")).click();
If you want to find the link starting from the header instead, use the loop above:
List<WebElement> elements = ff.findElements(By.xpath("//h2"));
for(WebElement element : elements) {
if ("Dropdown".equals(element.getText())) {
// get the parent <a> element and click on it
element.findElement(By.xpath("..")).click();
break; // the click navigates away, so the remaining elements would be stale
}
}
You can also do it like this:
WebDriver driver = new FirefoxDriver();
driver.get("http://www.seleniumhq.org");
driver.manage().timeouts().implicitlyWait(15, TimeUnit.SECONDS);
// collect all h2 elements into a list
List<WebElement> myH2Tags = driver.findElements(By.tagName("h2")); // you can put any tag name as per your requirement
for(int i=0;i<myH2Tags.size();i++){
System.out.println("Value of My H2 Tags are : " + myH2Tags.get(i).getText());
if(myH2Tags.get(i).getText().equals("Selenium News")){ // you can replace this with drop down value
myH2Tags.get(i).click();
}
// to avoid stale element exception you have to re identify the elements
myH2Tags = driver.findElements(By.tagName("h2"));
}

Get attribute values from all elements

Code:
Document doc = Jsoup.connect("things.com").get();
Elements jpgs = doc.select("img[src$=.jpg]");
String links = jpgs.attr("src");
System.out.print("all: " + jpgs);
System.out.print("src: " + links);
Output:
all:
<img alt="Apple" src="apple.jpg">
<img alt="Cat" src="cat.jpg">
<img alt="Boat" src="boat.jpg">
src: apple.jpg
Jsoup gave the attribute value for first element. How can I get the others (cat.jpg and boat.jpg)?
Thank you.
You loop through the matched elements and get it from each one via Element#attr, since Elements#attr (note the s) says:
Get an attribute value from the first matched element that has the attribute.
(My emphasis.)
So for instance:
for (Element e : jpgs) {
// use e.attr("src") here
}
Using Java 8 streams, you can also get a List<String> of them if you like (Elements is a List<Element>, so stream() is available directly):
List<String> links = jpgs.stream()
.map(element -> element.attr("src"))
.collect(Collectors.toList());
The boring old-fashioned way is:
List<String> srcs = new ArrayList<String>(jpgs.size());
for (Element e : jpgs) {
srcs.add(e.attr("src"));
}
Elements#attr will only return the first match.
Elements#attr Source Code
public String attr(String attributeKey) {
for (Element element : this) {
if (element.hasAttr(attributeKey))
return element.attr(attributeKey);
}
return "";
}
Solution
To obtain the result you want, you should loop over your Elements
for (Element e : jpgs) {
System.out.println(e.attr("src"));
}
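As a shorter alternative to writing the loop by hand, reasonably recent jsoup versions also offer `Elements#eachAttr`, which returns the attribute value of every matched element that has it. The sketch below assumes jsoup is on the classpath; the class name `ImageSources`, the method `jpgSources` and the inline HTML (modelled on the output in the question) are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.List;

class ImageSources {
    // Returns the src value of every <img> whose src ends in ".jpg".
    static List<String> jpgSources(Document doc) {
        Elements jpgs = doc.select("img[src$=.jpg]");
        return jpgs.eachAttr("src"); // one entry per matched element that has src
    }

    public static void main(String[] args) {
        // Inline markup standing in for the fetched page.
        Document doc = Jsoup.parse(
            "<img alt='Apple' src='apple.jpg'>"
          + "<img alt='Cat' src='cat.jpg'>"
          + "<img alt='Boat' src='boat.jpg'>");
        System.out.println(jpgSources(doc));
    }
}
```

If you need absolute URLs, the same call works with `abs:src`, provided the document was parsed with a base URI.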

Java Code Optimization(jsoup)

Is there a good way to simplify this code? Most of it looks identical. I just started learning jsoup and don't really know how to do that.
Document doc = Jsoup.connect("http://www.blocket.se/hela_sverige/bilar?ca=11&cg=1020&w=3&md=th").get();
Elements partOne = doc.select("a[title=Flera bilder]");
for (Element element : partOne) {
String myElementOne = element.attr("abs:href");
System.out.println(myElementOne);
}
Elements partTwo = doc.select("a[title=\"\"]");
for (Element element : partTwo) {
String myElementTwo = element.attr("abs:href");
System.out.println(myElementTwo);
}
Elements partThree = doc.select("a[title=Bild]");
for (Element element : partThree) {
String myElementThree = element.attr("abs:href");
System.out.println(myElementThree);
}
The partOne, partTwo and partThree blocks are basically identical; just replace all of the parameter differences with variables and extract to a method:
void someMethodName(Document doc, String selector) {
Elements partOne = doc.select(selector);
for (Element element : partOne) {
String myElementOne = element.attr("abs:href");
System.out.println(myElementOne);
}
}
Example invocation:
someMethodName(doc, "a[title=Flera bilder]");
Alternatively, if you have access to Guava:
Iterable<Element> it = Iterables.concat(
doc.select("a[title=Flera bilder]"),
doc.select("a[title=\"\"]"),
doc.select("a[title=Bild]"));
for (Element element : it) {
String myElement = element.attr("abs:href");
System.out.println(myElement);
}
Andy's solution of course does the job. However, since you asked specifically for ways of optimizing the jsoup calls, I would suggest learning more about CSS selectors and regular expressions. For example, this will do fine in your case:
Elements allParts = doc.select("a[title~=^Flera bilder$|^$|^Bild$]");
for (Element element : allParts) {
String elStr = element.attr("abs:href");
System.out.println(elStr);
}
Here, I use the ~= operator for attribute texts. It allows me to use a common regular expression to combine all three of your select statements into one.
An alternative way of doing this would be to use the , operator for adding all selectors into one:
Elements allParts2 = doc.select("a[title=Flera bilder],a[title=\"\"],a[title=Bild]");
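If the list of titles grows or comes from configuration, the comma-separated selector can be assembled instead of typed by hand. This is only a sketch: `SelectorGroups` and `titleSelector` are hypothetical names, the helper uses nothing but the JDK, and it assumes the titles contain no double quotes or backslashes because it does no escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

class SelectorGroups {
    // Builds a[title="x"],a[title="y"],... from the given title values.
    // Assumption: titles contain no double quotes or backslashes (no escaping is done).
    static String titleSelector(List<String> titles) {
        return titles.stream()
                .map(t -> "a[title=\"" + t + "\"]")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String selector = titleSelector(List.of("Flera bilder", "", "Bild"));
        // prints a[title="Flera bilder"],a[title=""],a[title="Bild"]
        System.out.println(selector);
        // The string can then be passed to doc.select(selector).
    }
}
```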

How to get first-level children of an element in jsoup

In jsoup Element.children() returns all children (descendants) of Element. But, I want the Element's first-level children (direct children).
Which method can I use?
Element.children() returns direct children only. Since you get them bound to a tree, they have children too.
If you need the direct children elements without the underlying tree structure then you need to create them as follows
public static void main(String... args) {
Document document = Jsoup
.parse("<div><ul><li>11</li><li>22</li></ul><p>ppp<span>sp</span></p></div>");
Element div = document.select("div").first();
Elements divChildren = div.children();
Elements detachedDivChildren = new Elements();
for (Element elem : divChildren) {
Element detachedChild = new Element(Tag.valueOf(elem.tagName()),
elem.baseUri(), elem.attributes().clone());
detachedDivChildren.add(detachedChild);
}
System.out.println(divChildren.size());
for (Element elem : divChildren) {
System.out.println(elem.tagName());
}
System.out.println("\ndivChildren content: \n" + divChildren);
System.out.println("\ndetachedDivChildren content: \n"
+ detachedDivChildren);
}
Output
2
ul
p
divChildren content:
<ul>
<li>11</li>
<li>22</li>
</ul>
<p>ppp<span>sp</span></p>
detachedDivChildren content:
<ul></ul>
<p></p>
This should give you the desired list of direct descendants of the parent node:
Elements firstLevelChildElements = doc.select("parent-tag > *");
Alternatively, retrieve the parent element, get its first child via child(int index), and then retrieve the siblings of that child via siblingElements().
This gives you the first-level children excluding the child you started from, so you have to add that child yourself.
Elements firstLevelChildElements = parentElement.child(0).siblingElements();
You can also use Element.child(index) to pick a single direct child by its index.
Here you can get the value of first-level children
Element addDetails = doc.select("div.container > div.main-content > div.clearfix > div.col_7.post-info > ul.no-bullet").first();
Elements divChildren = addDetails.children();
for (Element elem : divChildren) {
System.out.println(elem.text());
}

Jsoup: Optimal way of checking whether a <div> has an ID

I am able to iterate through all div elements in a document, using getElementsByTag("div").
Now I want to build a list of only div elements that have the attribute "id" (i.e. div elements with attribute "class" shouldn't be in the list).
Intuitively, I was thinking of checking something like this:
if (divElement.attr("id") != "")
add_to_list(divElement);
Is my approach correct at all?
Is there a more optimal way of testing for having the "id" attribute? (the above uses string comparison for every element in the DOM document)
You can do it like this:
Elements divsWithId = doc.select("div[id]");
for(Element element : divsWithId){
// do something
}
Reference:
JSoup > Selector Syntax
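On the "is my approach correct" part of the question: in Java, `!=` on strings compares object references, not contents, so `attr("id") != ""` is not a reliable emptiness test. A content-based check such as `isEmpty()` is, and with jsoup `divElement.hasAttr("id")` asks the question directly. The sketch below uses only the JDK to show the difference; `IdCheck` and `hasId` are illustrative names, and the `new String("")` merely stands in for a value returned by some API.

```java
class IdCheck {
    // Content-based check: true when the id string is non-empty.
    static boolean hasId(String idAttr) {
        return !idAttr.isEmpty();
    }

    public static void main(String[] args) {
        // A distinct String object with empty content. Whether jsoup returns
        // such an object for a missing attribute is not assumed here.
        String empty = new String("");
        System.out.println(empty != "");   // true: the references differ
        System.out.println(hasId(empty));  // false: the content is empty
        System.out.println(hasId("main")); // true
    }
}
```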
In browser JavaScript (the DOM API, not jsoup), you can try this:
var all_divs = document.getElementsByTagName("div");
var divs_with_id = [];
for (var i = 0; i < all_divs.length; i++)
if (all_divs[i].hasAttribute("id"))
divs_with_id.push(all_divs[i]);
