java find table using jsoup and equivalent xpath

Here is the HTML code:
<table class="textfont" cellspacing="0" cellpadding="0" width="100%" align="center" border="0">
<tbody>
<tr>
<td class="chl" width="20%">Batch ID</td><td class="ctext">d32654464bdb424396f6a91f2af29ecf</td>
</tr>
<tr>
<td class="chl" width="20%">ALM Server</td>
<td class="ctext"></td>
</tr>
<tr>
<td class="chl" width="20%">ALM Domain/Project</td>
<td class="ctext">EBUSINESS/STERLING</td>
</tr>
<tr>
<td class="chl" width="20%">TestSet URL</td>
<td class="ctext">almtestset://localhost</td>
</tr>
<tr>
<td class="chl" width="20%">Tests Executed</td>
<td class="ctext"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Start Time</td>
<td class="ctext">08/31/2017 12:20:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Finish Time</td>
<td class="ctext">08/31/2017 02:31:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Total Duration</td>
<td class="ctext"><b>2h 11m </b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Parameters</td>
<td class="ctext"><b>{"browser":"chrome","browser-version":"56","language":"english","country":"US"}</b></td>
</tr>
<tr>
<td class="chl" width="20%">Passed</td>
<td class="ctext" style="color:#269900"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Failed</td>
<td class="ctext" style="color:#990000"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Not Completed</td>
<td class="ctext" style="color: ##ff8000;"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Pass %</td>
<td class="ctext" style="color:#990000;font-size:14px"><b>0.0%</b></td>
</tr>
</tbody>
</table>
And here is the xpath to get the table:
//td[text() = 'TestSet URL']/ancestor::table[1]
How can I get this table using jSoup? I've tried:
tableElements = doc.select("td:contains('TestSet URL')");
to get the child element, but that doesn't work and returns null. I need to find the table and put all the children into a map. Any help would be greatly appreciated!

The following code parses your table into a map. It relies on a few assumptions:
- The XPath //td[text() = 'TestSet URL']/ancestor::table[1] selects the nearest table that encloses a td whose text is 'TestSet URL'. The jsoup code in getTable() is slightly looser: it returns the first table whose text contains "TestSet URL" anywhere in its body. That is a little brittle, but for the HTML in your question both approaches find the same table.
- Every row contains exactly two cells, the first being the key and the second being the value. Since you want to parse the table content into a map, this assumption seems valid.
- The code throws an exception if the above assumptions are not met, i.e. if the given HTML does not contain a table with "TestSet URL" in its body, or if any row within that table does not contain exactly two cells.
If those assumptions are invalid, the internals of getTable and parseTable will change, but the general approach remains valid.
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// 'html' is assumed to be a String holding the page source
public void parseTable() {
    Document doc = Jsoup.parse(html);
    // holder for the 'mapped rows'; a map, on the assumption that every row is a discrete key:value pair
    Map<String, String> asMap = new HashMap<>();
    Element table = getTable(doc);
    // now walk through the rows, adding one map entry per row
    Elements rows = table.select("tr");
    for (int i = 0; i < rows.size(); i++) {
        Element row = rows.get(i);
        Elements cols = row.select("td");
        // expecting key:value pairs where the first cell is the key and the second cell is the value
        if (cols.size() == 2) {
            asMap.put(cols.get(0).text(), cols.get(1).text());
        } else {
            throw new RuntimeException(String.format("Cannot parse the table row: %s to a key:value pair because it contains %s cells!", row.text(), cols.size()));
        }
    }
    System.out.println(asMap);
}

private Element getTable(Document doc) {
    Elements tables = doc.select("table");
    for (int i = 0; i < tables.size(); i++) {
        // the xpath //td[text() = 'TestSet URL']/ancestor::table[1] finds the table enclosing the 'TestSet URL' cell;
        // this crude check (first table whose text contains "TestSet URL") is a rough jsoup stand-in for it
        if (tables.get(i).text().contains("TestSet URL")) {
            return tables.get(i);
        }
    }
    throw new RuntimeException("Cannot find a table element which contains 'TestSet URL'!");
}
For the HTML posted in your question, the above code will output:
{Finish Time=08/31/2017 02:31:46 PM, Passed=0, Test Parameters={"browser":"chrome","browser-version":"56","language":"english","country":"US"}, TestSet URL=almtestset://localhost, Failed=6, Test Pass %=0.0%, Not Completed=0, Start Time=08/31/2017 12:20:46 PM, Total Duration=2h 11m, Tests Executed=6, ALM Domain/Project=EBUSINESS/STERLING, Batch ID=d32654464bdb424396f6a91f2af29ecf, ALM Server=}
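To see exactly what the XPath from the question selects, here is a minimal sketch that evaluates it with the JDK's own javax.xml.xpath instead of jsoup. It is only an illustration under assumptions: the class name TableXPathSketch and the trimmed three-row XHTML constant are made up for this example, and the JDK parser needs well-formed markup, which real-world HTML often is not.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class TableXPathSketch {

    // Trimmed, well-formed copy of the table from the question (three rows only).
    static final String XHTML =
        "<html><body><table class='textfont'><tbody>"
        + "<tr><td class='chl'>Batch ID</td><td class='ctext'>d32654464bdb424396f6a91f2af29ecf</td></tr>"
        + "<tr><td class='chl'>TestSet URL</td><td class='ctext'>almtestset://localhost</td></tr>"
        + "<tr><td class='chl'>Tests Executed</td><td class='ctext'><b>6</b></td></tr>"
        + "</tbody></table></body></html>";

    static Map<String, String> tableToMap(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // ancestor::table[1] is the nearest enclosing table of the matching cell.
            Node table = (Node) xpath.evaluate(
                    "//td[text() = 'TestSet URL']/ancestor::table[1]", doc, XPathConstants.NODE);
            if (table == null) {
                throw new IllegalStateException("No table encloses a 'TestSet URL' cell");
            }

            // LinkedHashMap keeps the rows in document order.
            Map<String, String> result = new LinkedHashMap<>();
            NodeList rows = (NodeList) xpath.evaluate(".//tr", table, XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                NodeList cells = (NodeList) xpath.evaluate("td", rows.item(i), XPathConstants.NODESET);
                if (cells.getLength() != 2) {
                    throw new IllegalStateException("Row " + (i + 1) + " is not a key:value pair");
                }
                result.put(cells.item(0).getTextContent().trim(),
                           cells.item(1).getTextContent().trim());
            }
            return result;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tableToMap(XHTML));
    }
}
```

Unlike the HashMap used above, the LinkedHashMap here preserves row order, which may be easier to read when printing the result.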

You have to remove the quotation marks inside :contains() to match the cell with that text:
tableElements = doc.select("td:contains(TestSet URL)");
Note that this only selects the td elements that contain the text "TestSet URL". To select the whole table use
Element table = doc.select("table.textfont").first();
which selects the tables with class=textfont. Because several tables can share the same class value, you have to say which one you want, hence first().
To get all the tr elements:
Elements tableRows = doc.select("table.textfont tr");
for (Element e : tableRows)
    System.out.println(e);

Related

Parsing Table with Jsoup for Android App

I am attempting to parse through a table on a website to find a given table row in which the first column matches a certain string of characters. Below is the HTML for part of the table (it's very large):
<table class="table display datatable" id="datatable1">
<thead>
<tr>
<th class="va-m">Miner</th>
<th class="va-m">Shares</th>
<th class="va-m">%</th>
<th class="va-m">Best DL</th>
</tr>
</thead>
<tfoot>
<tr>
<th class="va-m">Miner</th>
<th class="va-m">Shares</th>
<th class="va-m">%</th>
<th class="va-m">Best DL</th>
</tr>
</tfoot>
<tbody>
<tr>
<td>3R8RDBxiux3g1pFCCsQnm2vwD34axsVRTrEWzyX8tngJaRnNWkbnuFEewzuBAKhQrb3LxEQHtuBg1zW4tybt83SS</td>
<td>44279</td>
<td>27.37 %</td>
<td>1154</td>
</tr>
<tr>
<td>5gwVxC9cXguHHjD9wtTpHfsJPaZx4fPcvWD5jGWF1dcuHnAMyXxteaHrEtXviZkvWN3FAnevbVLErABSsP6mS7PR</td>
<td>36369</td>
<td>22.48 %</td>
<td>2725</td>
</tr>
<tr>
<td>2qZXPmop82UiA7LQEQqdoUzjFbcwCSpqf8U1f3656XXSsHnGvGXYTNoP11s2asiVSyVS8LPFqxmpdCeSNxcpFMnF</td>
<td>28596</td>
<td>17.68 %</td>
<td>967</td>
</tr>
<tr>
<td>21mbNSDo7g9BAyjsZGxnNfJUrEtBUVVNQZhR4tkVwdEHPaMNsa2u2JHQPAAe5riGfPA9Khb1Pq3bQGhqmrLEGNqN</td>
<td>6104</td>
<td>3.77 %</td>
<td>4787</td>
</tr>
<tr>
<td>4HAakKK7dSq18Djg7m6cLSyHb5aUU6ngvBQimo8pYyF5F64qX3gE4T8q8kfWHTZ79FvXybSG3JhUfSZDDv2sRwqY</td>
<td>5895</td>
<td>3.64 %</td>
<td>6020</td>
</tr>
<tr>
<td>2r2izPEC5o7ZDnUsdDA97q8wKCeZRRg9n243Rd9vkMQqRCtc6ZRUTruQUyZGduoHy8pTYPuEq9ACXPKfXt8fqKxS</td>
<td>5605</td>
<td>3.46 %</td>
<td>10958</td>
</tr>
</tbody>
</table>
I am trying to step through the table and search for a specific row but I am receiving an IndexOutOfBoundsException.
Would there be a better way to code the statement below?
for (Element table : doc.select("table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        if (tds.get(0).text().equals("4HjSN79KUMz7AQC3GBvGkgPa5Qrio9HWTh7hg9JY48fkrYeVZJVmzB9YCB6GZSpuXB7V7DjJVuke3ZaCm5k7sRLE")) {
            myHistoricShares = tds.get(0).text();
        }
    }
}

As I said in comments, your table.select("tr") selects rows not only inside <tbody>, but inside the header and the footer too. For those rows row.select("td") returns an empty list, and hence tds.get(0) throws the IndexOutOfBoundsException.
You could simplify your loop by selecting only the rows in <tbody>:
for (Element row : doc.select("table#datatable1>tbody>tr")) {
    if (row.children().size() > 0 && "some_long_string".equals(row.child(0).text())) {
        doSomething();
    }
}
The selector "table#datatable1>tbody>tr" selects the table with id="datatable1", then its direct tbody child, and then all of that tbody's direct tr children. So a single loop over the result is enough.

How to get number of columns <td> or <th> in a row <tr>?

I want to get the number of columns in a given row. When I use the snippet below
int size = element.findElement(By.xpath("/tr[" + Row + "]")).findElements(By.tagName("td")).size();
I get a NoSuchElementException.
Below is the HTML block:
<table class="DynamicOrderTable" id="customerOrderHeader">
<tbody>
<tr style="background-color: rgb(246, 246, 246);">
<th>Order Number</th>
<th>Version</th>
<th>Customer Name</th>
<th>Order Status</th>
<th>Order Sub-status</th>
</tr>
<tr class="Temp">
<td class="close" id="orderNumber">
<div class="clickableText">1234</div>
</td>
<td class="close">1</td>
<td class="close" id="customerName">ABC101</td>
<td title="OrderResponse Sequence Number:14" class="close" id="orderStatus"><div class="clickableText">Complete</div></td>
<td class="close">Closed</td>
</tr>
</tbody>
</table>
How to get the number of columns in a row when we have thead and tbody tags?

Your code to get the column count is correct; just change /tr to .//tr in your XPath so that it searches below the table element:
WebElement element = driver.findElement(By.id("customerOrderHeader"));
int size = element.findElement(By.xpath(".//tr[" + Row + "]")).findElements(By.tagName("td")).size();
Or index into the list of rows (note that List.get() counts from zero, whereas the XPath index above counts from one):
int size = element.findElements(By.tagName("tr")).get(Row).findElements(By.tagName("td")).size();
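One thing worth checking with this table is that the first row holds th cells rather than td cells, so counting td there gives zero. The sketch below illustrates that with the JDK's javax.xml.xpath on a simplified, well-formed copy of the table; it is not Selenium code, and the class name ColumnCountSketch and the count helper are hypothetical names for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class ColumnCountSketch {

    // Simplified, well-formed copy of the table from the question.
    static final String XHTML =
        "<table id='customerOrderHeader'><tbody>"
        + "<tr><th>Order Number</th><th>Version</th><th>Customer Name</th>"
        + "<th>Order Status</th><th>Order Sub-status</th></tr>"
        + "<tr class='Temp'><td>1234</td><td>1</td><td>ABC101</td>"
        + "<td>Complete</td><td>Closed</td></tr>"
        + "</tbody></table>";

    // Returns how many nodes the given XPath expression selects in XHTML.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XHTML)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + expr + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String table = "//table[@id='customerOrderHeader']";
        // XPath positions start at 1, so tr[1] is the header row.
        System.out.println("td in row 1: " + count(table + "//tr[1]/td"));
        System.out.println("td or th in row 1: " + count(table + "//tr[1]/*[self::td or self::th]"));
        System.out.println("td in row 2: " + count(table + "//tr[2]/td"));
    }
}
```

The `*[self::td or self::th]` step counts cells of either kind, which covers both header and data rows with one expression.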

If I understood your question correctly, the code below should help you get the number of columns.
Using XPath:
int size = driver.findElements(By.xpath("//*[@id='customerOrderHeader']/tbody/tr[1]/th")).size();
Using a CSS selector:
int size = driver.findElements(By.cssSelector("table#customerOrderHeader>tbody>tr:first-child>th")).size();

Try the code below:
int count = driver.findElements(By.xpath(".//tr[@class='Temp']//td")).size();
System.out.println("count is " + count); // output is: count is 5

In JavaScript, use the cells property with a row index:
document.getElementById("customerOrderHeader").rows[0].cells.length;

How to use xpath to get href value

<div id="AdvancedSearchResultsContainter">
<table id="SearchResults" class="tablesorter">
<thead>
<tr>
<th scope="col" class="header">School name</th>
<th scope="col" class="header">School type</th>
<th scope="col" class="header">Sector</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>ABC Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="even">
<td>XYZ High School</td>
<td class="nowrap">Secondary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="odd">
<td>PQR Park Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="even">
<td>JKL Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
</tbody>
</table>
</div>
I am using Selenium and XPath.
I want to get the numeric value in each href. Out of this href
I want to get 82648. I would like to put this in a loop and get the number from every href.
Can someone please help?

You can use the following CSS selector to get the <a> elements:
By.cssSelector("#SearchResults tr a");
Then get all the link elements with driver.findElements(By.cssSelector("#SearchResults tr a")) and use getAttribute("href") to get the URLs.
Something like:
List<WebElement> elements = driver.findElements(By.cssSelector("#SearchResults tr a"));
Get the URLs and then do whatever you want with them. The java.lang.String class provides many methods for working with strings, such as trimming, concatenating, converting, comparing and replacing. As an example, this prints the last five characters of each URL, which assumes the number is always five digits long and sits at the very end:
for (WebElement e : elements) {
    String url = e.getAttribute("href");
    System.out.println(url.substring(url.length() - 5));
}
There are other ways to get the substring as well.
You can also wrap this in a method that returns a String, which you can then assert on if you need to.
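If the number is not always the same length, a regular expression is one of those other ways. The sketch below assumes the number is the last run of digits at the end of the href; the href values in it are made up, since the actual links are not shown in the question, and HrefIdSketch and trailingId are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class HrefIdSketch {

    // One or more digits at the very end of the string, optionally followed by a slash.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("(\\d+)/?$");

    // Returns the trailing number of an href, or an empty Optional if there is none.
    static Optional<String> trailingId(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher m = TRAILING_DIGITS.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up URLs; in Selenium these would come from e.getAttribute("href").
        String[] hrefs = {
            "https://example.com/schools/82648",
            "https://example.com/schools?id=1234567",
            "https://example.com/about"
        };
        for (String href : hrefs) {
            System.out.println(href + " -> " + trailingId(href).orElse("(no number)"));
        }
    }
}
```

Returning an Optional makes the "no number found" case explicit instead of throwing a StringIndexOutOfBoundsException on short URLs.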

Detect innermost web element in (nested) in selenium

I am looking for getting the inner most web element in a page, when there are similar nested Webelements in a page.
Consider the example below:
<body>
<table id="level1">
<tr>
<td>
<table id="level2">
<tr>
<td>
<table id="level3">
<tr>
<td>
<p>Test</p>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table id="level1_table2">
<tr>
<td>
<table id="level2_table2">
<tr>
<td></td>
</tr>
</table>
</td>
</tr>
</table>
</body>
So when I search the page with Driver.findElements by tag "table" for elements which have some text - "Test",
I will get 5 WebElements in total, namely "level1", "level2", "level3", "level1_table2" and "level2_table2".
What I want to achieve is a list of the innermost (nested) elements which satisfy my search criteria.
So the list I get should only have 2 WebElements, namely "level3" and "level2_table2".
I am looking for something probably along the lines of recursion. Can somebody help me out?

You don't need recursion; all you need is the right XPath expression, one that matches any table which has no table nested inside it:
driver.findElements(By.xpath("//table[not(.//table)]"))
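The predicate not(.//table) can be tried out without a browser. The sketch below runs it, with a leading // so that the whole document is searched, against the nested tables from the question using the JDK's javax.xml.xpath. It is an offline illustration rather than Selenium code, and InnermostTableSketch is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class InnermostTableSketch {

    // The nested tables from the question, written as well-formed XML.
    static final String XHTML =
        "<body>"
        + "<table id='level1'><tr><td>"
        +   "<table id='level2'><tr><td>"
        +     "<table id='level3'><tr><td><p>Test</p></td></tr></table>"
        +   "</td></tr></table>"
        + "</td></tr></table>"
        + "<table id='level1_table2'><tr><td>"
        +   "<table id='level2_table2'><tr><td></td></tr></table>"
        + "</td></tr></table>"
        + "</body>";

    // Returns the ids of all tables that contain no other table.
    static List<String> innermostTableIds(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList tables = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//table[not(.//table)]", doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < tables.getLength(); i++) {
                ids.add(((Element) tables.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(innermostTableIds(XHTML));
    }
}
```

To also require the text "Test", the predicate could be extended, for example to //table[not(.//table)][contains(., 'Test')], which would leave only level3.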

I would use this strategy:
1. Search for the WebElements containing the text Test.
2. For each WebElement, walk up to the first parent whose tag name is table.
Here it is in Java:
List<WebElement> elementsWithTest = driver.findElements(By.xpath("//*[contains(text(),'Test')]"));
List<WebElement> result = new ArrayList<>();
for (WebElement element : elementsWithTest) {
    // ".." selects the parent of the current element
    WebElement parent = element.findElement(By.xpath(".."));
    while (!"table".equals(parent.getTagName())) {
        parent = parent.findElement(By.xpath(".."));
    }
    if ("table".equals(parent.getTagName())) {
        result.add(parent);
    }
}
System.out.println(result);
Hope that helps.

get particular element via jsoup

I want to select td.team in such a way that the first TextView gets the first td.team of a row and the second TextView gets the second td.team. I am using a list adapter.
Elements info = dpc.select("td.team "); gives me both teams, the first and the second. How can I do this? Please tell me which Elements call I should use to get the info.
<tr class="odd">
<td class="date">10</td>
<td class="team">one</td>
<td class="team">two</td>
<td class="type">8M</td>
</tr>
<tr class="even">
<td class="date">01</td>
<td class="team">Nice</td>
<td class="team">Monaco</td>
<td class="type">€ 4.1M</td>
</tr>
<tr class="odd">
<td class="date">07</td>
<td class="team">thre</td>
<td class="team">fou</td>
<td class="type"> 600K</td>
</tr>
<tr class="even">
<td class="date">99</td>
<td class="team"><a href="sad" title="Marsala">M</a></td>
<td class="team">a</td>
<td class="type">50K</td>
</tr>

I don't really understand your question. Do you want to get the first td.team and the second td.team from each row? If that is the case, you need to collect them into a list.
Elements info = dpc.select("tr.odd,tr.even").select("td.team");
String linkText = "";
ArrayList<String> s = new ArrayList<String>();
for (Element el : info) {
    // text() returns the team name; href and title live on the nested <a>, not on the td,
    // so use el.select("a").attr("href") or el.select("a").attr("title") if you need those
    linkText = el.text();
    s.add(linkText);
    System.out.println(linkText); // or print it
}
There might be some errors in the code; I didn't test it.

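The select method returns an Elements object. This class has a get(int index) method which returns the element at the given index in the selection, counting from zero. So for a single row, row.select("td.team").get(0) is the first team and get(1) is the second.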
