Jsoup - how to find out elements size - java

I am confused by the jsoup API. My code parses a table with 4 cells per row, but I found an occurrence where three cells are merged into a single one, and my code fails there because the child at position 3 does not exist.
String sMminutesLeft = row.child(3).text();
element.child(x) indexes into a filtered list of child elements, i.e. only tags, not text nodes. But element.childNodeSize() returns the count of all child nodes, including text nodes. I expected 4 but I get 9 (lots of newlines are included).
I found element.getElementsByTag("TD"), which returns an Elements object. This object acts like a container, but I could not find a size() method on it.
How can I safely find out the number of TDs under the current TR element? Implementing a NodeVisitor seems like overkill to me.

I found a workaround, but since I feel the API is incomplete, I have created a pull request that adds a new method returning the number of filtered children, complementary to child(int). Here it is: https://github.com/jhy/jsoup/pull/1291
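One way to count the cells without a NodeVisitor (not necessarily the workaround mentioned above): Elements extends ArrayList<Element>, so size() is inherited, and children() returns only element children. A minimal sketch, assuming a made-up two-row table; the class name CellCount, the helper cellText and the HTML are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CellCount {
    // Returns the text of the cell at the given index, or the fallback
    // when the row has fewer element children than that.
    static String cellText(Element row, int index, String fallback) {
        // children() contains element children only (no text nodes);
        // size() is inherited from ArrayList.
        boolean inRange = index >= 0 && index < row.children().size();
        return inRange ? row.child(index).text() : fallback;
    }

    public static void main(String[] args) {
        // Hypothetical input: a regular row and a row with three merged cells.
        String html = "<table>\n"
                + "<tr>\n<td>a</td>\n<td>b</td>\n<td>c</td>\n<td>12</td>\n</tr>\n"
                + "<tr>\n<td>a</td>\n<td colspan=\"3\">merged</td>\n</tr>\n"
                + "</table>";
        Document doc = Jsoup.parse(html);
        for (Element row : doc.select("tr")) {
            System.out.println(row.children().size() + " cells, "
                    + row.childNodeSize() + " child nodes, minutes left: "
                    + cellText(row, 3, "n/a"));
        }
    }
}
```

If only TD children should count (and not, say, TH), row.select("> td").size() is an alternative under the same assumptions.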

Related

Jsoup eq selector returns no value

I am trying to fetch data using Jsoup 1.10.3, and it seems like the eq selector is not working correctly.
I tried nth-child, but it seems like it's not getting the second table (table:nth-child(2)).
Is my selector correct?
html > body > table:nth-child(2) > tbody > tr:nth-child(2) > td:nth-child(2)
In the example below, I am trying to extract the value 232323.
Here is the try it sample
There are several issues that you may be struggling with. First, I don't think that you want to use the :nth-child(an+b) selector. Here is the explanation of that selector from the jsoup docs:
:nth-child(an+b) elements that have an+b-1 siblings before it in the document tree, for any positive integer or zero value of n, and has a parent element. For values of a and b greater than zero, this effectively divides the element's children into groups of a elements (the last group taking the remainder), and selecting the bth element of each group. For example, this allows the selectors to address every other row in a table, and could be used to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1.
I guess you want to use the table:nth-of-type(n) selector.
Second, you only select elements with your selector, but you want to get the visible content 232323, which is only one inner node of the element you select. So what is missing is the part where you get to the content. There are several ways of doing this. I again recommend that you read the docs. Especially the cookbook is very helpful for beginners. I guess you could use something like this:
String content = element.text();
Third, with CSS selectors you really do not need to go through every hierarchy level of the DOM. Since tables always contain tbody, tr and td elements, you may do something like this:
String content = document.select("table:nth-of-type(2) tr:nth-of-type(2) td:last-of-type").text();
Note: I do not have a Java compiler at hand, so please use my code with care.
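To make the difference between nth-child and nth-of-type concrete, here is a small sketch. The asker's HTML is not reproduced here, so the markup below is invented for illustration, and the class name SecondTable and the method extract are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SecondTable {
    // Selects the last cell of the second row of the second table in the document.
    static String extract(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("table:nth-of-type(2) tr:nth-of-type(2) td:last-of-type").text();
    }

    public static void main(String[] args) {
        // Invented sample: the second table is the 4th child of body,
        // so table:nth-child(2) would match the first table instead.
        String html = "<h1>Report</h1>"
                + "<table><tr><td>a</td><td>1</td></tr><tr><td>b</td><td>2</td></tr></table>"
                + "<p>details</p>"
                + "<table><tr><td>key</td><td>value</td></tr><tr><td>id</td><td>232323</td></tr></table>";
        System.out.println(extract(html));
    }
}
```

nth-child counts all sibling elements regardless of tag, while nth-of-type counts only siblings with the same tag name, which is why headings or paragraphs between tables shift the former but not the latter.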

How do I determine list type (ul/li vs span) using WebDriver

I am working with a user-created table and list, where my program has to read in a list of entries for processing. I have the processors functioning, and I can navigate to the location in the table without any problems. The issue is that I am trying to allow for some flexibility in creating the list (inside the table) by allowing the creator to input the list using either unordered lists (/ul/li) or carriage returns (/p).
Right now, I determine whether an unordered list is used by checking whether driver.findElements(By.xpath("foo/ul/li")).size() is greater than 0. The issue is that this can take forever to "fail over." Is there a way that I am missing to make verifying the element type (/ul/li vs /p vs /ol/li) faster?
I am using Java and Webdriver.
I guess you want to check whether the list in question is ordered or unordered by getting the size() and checking whether it is greater than 0.
My suggestion is to get the parent tag of the first "li" element, which will be either "ul" or "ol".
You can try the code below for that (assuming the list is present under some 'div' tag):
String tag = driver.findElement(By.xpath("//div//li[1]/..")).getTagName(); // returns the tag name of the parent of the first list item
if (tag.equals("ul"))
    System.out.println("List is unordered");
else if (tag.equals("ol"))
    System.out.println("List is ordered.");
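The tag check itself can be kept separate from the WebDriver call, which makes it easy to handle the /p case as well. A minimal sketch in plain Java; the class ListKind and the method fromParentTag are hypothetical names, and in WebDriver code the argument would be the getTagName() result of the first li's parent element:

```java
import java.util.Locale;

public class ListKind {
    enum Kind { UNORDERED, ORDERED, OTHER }

    // Maps the tag name of a list item's parent to a list kind.
    // Anything other than ul/ol (for example p or div) is reported as OTHER.
    static Kind fromParentTag(String tagName) {
        if (tagName == null) {
            return Kind.OTHER;
        }
        switch (tagName.toLowerCase(Locale.ROOT)) {
            case "ul": return Kind.UNORDERED;
            case "ol": return Kind.ORDERED;
            default:   return Kind.OTHER;
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"ul", "OL", "p"}) {
            System.out.println(tag + " -> " + fromParentTag(tag));
        }
    }
}
```

Comparing whole tag names rather than using contains() avoids accidental matches on longer tag names.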

How to select all children (with same tag. ex.table) except first and last with jsoup

I want to get all tags (with same tag. ex. table) in one div with id = content, except first and last. The number of tags (in this case tables) is dynamic.
You can get all of them (I assume you know how to do that, otherwise the question would be stated differently?), put them in a list, let's call it tables, and then call tables.subList(1, tables.size() - 1).
Here is the full solution using selectors:
Document doc = Jsoup.parse(...); // parse from some source
Elements tables = doc.select("div#content table");
// subList returns a plain List<Element>; this assumes at least two tables were matched
List<Element> inner = tables.subList(1, tables.size() - 1);
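Since the number of tables is dynamic, subList(1, size - 1) throws when the list has fewer than two entries. A small sketch of a guarded version using plain java.util lists (jsoup's Elements is a List<Element>, so the same helper would apply to it); the class InnerItems and the method withoutFirstAndLast are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InnerItems {
    // Returns a copy of the list without its first and last items.
    // With fewer than three items nothing is left, so the result is empty.
    static <T> List<T> withoutFirstAndLast(List<T> items) {
        if (items.size() < 3) {
            return Collections.emptyList();
        }
        return new ArrayList<>(items.subList(1, items.size() - 1));
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t1", "t2", "t3", "t4");
        System.out.println(withoutFirstAndLast(tables)); // prints [t2, t3]
    }
}
```

Copying into a new ArrayList detaches the result from the original list, which matters because subList only returns a view.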
Excerpt from doc about selectors:
el, el, el: group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
:not(selector): find elements that do not match the selector
:last-child: elements that are the last child of some other element
:gt(n): find elements whose sibling index is greater than n; e.g. div p:gt(2)
I guess it's a good starting point.
More here

Jdoms annoying textnodes and addContent(index, Element) - schema solutions?

I have some already generated XML files, and the application causing problems now needs to add elements to them, which have to be at a specific position to be valid against the applied schemas.
There are two problems. The first is that I have to hardcode the positions, which is not that nice but "ok".
But the much bigger one is JDOM. I printed the content list and it looks like:
element1
text
element2
element4
text
element5
The text nodes are just whitespace, and every element I add makes it even more unpredictable how many text nodes there are (sometimes some are added, sometimes not). They are counted as if they were elements, but I want to ignore them: when I add element3 at index 2, it does not end up between element2 and element4 but after this annoying text node.
Any suggestions? The best solution IMHO would be something that automatically puts it where it has to be according to the schema, but I think that's not possible?
Thanks for any advice :)
The JDOM Model of the XML is very literal... it has to be. On the other hand, JDOM offers ways to filter and process the XML in a way that should make your task easier.
In your case, you want to add Element content to the document, and all the text content is whitespace..... so, just ignore all the text content, and worry about the Element content only.
For example, if you want to insert a new element nemt before the 3rd Element (index 2, since the list returned by getChildren() is zero-based and contains only Elements), you can:
rootemt.getChildren().add(2, new Element("nemt"));
The elements are now sorted out.... what about the text...
A really simple solution is to just pretty-print the output:
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(System.out, mydoc);
That way all the whitespace will be reformatted to make the XML 'pretty'.
EDIT - and no, there is no way with JDOM to automatically insert the element in the right place according to the schema....
Rolf

In Jsoup, is it possible to get Elements from a list of Elements without iterating through it?

I'm new to Jsoup, but this appears to be a great tool. I'm trying to extract the robots metatag.
I have the following code:
Document doc = Jsoup.parse(htmlContent);
Elements metatags = doc.select("meta");
Element robots = metatags.attr("name", "robots"); // is getting the first element of the list
The last line is wrong.
I want to know whether it is necessary to iterate over the list of elements to find the element that matches the attribute, or whether there is a way to extract the matching element directly from the Elements list.
Edit 1: I solved this by changing to doc.select("meta[name=robots]").
Edit 2: In other words: I want to know how to get all elements in an Elements list that match some attribute requirement.
Edit 3: I was too hasty in asking this question because I had not read the main documentation yet. Sorry.
It's possible to specify the attribute and value you want to match in the select() method to do better filtering.
Change the select to doc.select("meta[name=robots]"); and it will return all meta elements whose name attribute equals robots.
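A short sketch of that attribute selector in use, reading the content attribute and guarding against a missing tag. The class name RobotsMeta, the method robotsContent and the sample HTML are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RobotsMeta {
    // Returns the content of the first <meta name="robots"> tag, or null if there is none.
    static String robotsContent(String html) {
        Document doc = Jsoup.parse(html);
        // first() returns null when the selector matched nothing.
        Element robots = doc.select("meta[name=robots]").first();
        return robots == null ? null : robots.attr("content");
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta charset=\"utf-8\">"
                + "<meta name=\"robots\" content=\"noindex, nofollow\">"
                + "</head><body></body></html>";
        System.out.println(robotsContent(html));
    }
}
```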
Have you read the JSoup documentation? Here it is from the method you are using:
attr
public Elements attr(String attributeKey, String attributeValue)
Set an attribute on all matched elements.
Parameters:
attributeKey - attribute key
attributeValue - attribute value
Returns:
this
It returns this, which means it returns an Elements object. That cannot be assigned to an Element variable.
I also think you want to use Document.getElementsByTag(String), instead of select.
