Parsing values from complex table using JSoup

I have a table with the following html:
<TABLE class=data-table cellSpacing=0 cellPadding=0>
<TBODY>
<TR>
<TD colSpan=4><A id=accounting name=accounting></A>
<H3>Accounting</H3></TD></TR>
<TR>
<TH class=data-tablehd align=left>FORM NO.</TH>
<TH class=data-tablehd align=left>TITLE</TH>
<TH class=data-tablehd align=right>Microsoft</TH>
<TH class=data-tablehd align=right>Acrobat</TH></TR>
<TR>
<TD><A id=1008ft name=1008ft>SF 1008-FT</A></TD>
<TD>Work for Others Funding Transfer Between Projects for an Agreement</TD>
<TD align=right><A
href="https://someurl1"
target=top>MS Word</A></TD>
<TD align=right><A
href="https://someurl2"
target=top>PDF </A></TD></TR>
...
I need to parse the <TR> data getting something like
SF 1008-FT, Work for Others ... an Agreement, https://someurl1, https://someurl2
I have tried using the following code:
URL formURL = new URL("http://urlToParse");
Document doc = Jsoup.parse(formURL, 3000);
Element table = doc.select("TABLE[class = data-table]").first();
Iterator<Element> ite = table.select("td[colSpan=4]").iterator();
while(ite.next() != null) {
System.out.println(ite.next().text());
}
However this only returns the "back to Top" and some different headings located throughout the table.
Can someone help me write the correct JSoup code to parse the information I need?

I don't have time to test this, but you can use something like:
Element table = doc.select("TABLE[class = data-table]").first();
Elements rows = table.select("tr");
for (Element td: rows.get(2).children()) {
System.out.println(td.text());
}
This prints the text of each child cell of the table's third row.
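Separately, the loop in the question calls ite.next() twice per pass, so it prints only every other element and ends with a NoSuchElementException; next() never returns null to signal the end. Below is a minimal sketch of the usual pattern. It uses a plain List<String> as a stand-in for the Jsoup Elements, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorLoopDemo {
    // Visits every item: test hasNext(), then call next() exactly once per pass.
    static List<String> collectAll(List<String> items) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical cell texts standing in for the selected elements.
        List<String> cells = Arrays.asList("SF 1008-FT", "MS Word", "PDF");
        System.out.println(collectAll(cells));
        // The enhanced for loop does the same thing with less ceremony.
        for (String cell : cells) {
            System.out.println(cell);
        }
    }
}
```

Jsoup's Elements is iterable, so the enhanced for loop used in this answer avoids the problem entirely.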

I found the solution by making a small modification to the code from a similar thread:
for (Element table : doc.select("table")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
// skip heading rows, which have no td cells or a single cell spanning all columns
if (tds.size() < 4) continue;
formNumber = tds.get(0).text();
title = tds.get(1).text();
// attr("href") returns the href of the first link in the cell
link1 = tds.get(2).select("a[href]").attr("href");
link2 = tds.get(3).select("a[href]").attr("href");
}
}

Related

Select href from HTML table using Jsoup

I have HTML table:
<table class="table_class" id="table_id">
<tbody>
<tr>...</tr>
<tr>
<td>...</td>
<td>
...
</td>
<td>...</td>
</tr>
<tr>...</tr>
</tbody>
I need to get all the hrefs from one column of the table.
I tried to use
Elements links = table.select("a[href]");
System.out.println(links);
but it returns the hrefs of every a tag on the whole page.

Maybe this will work:
String url = "...";
Document doc = Jsoup.connect(url).get();
Elements elements = doc.select("#table_id a[href]");

unable to retrieve the Table th tag value using webdriver with java

From the HTML below, I want to check the header (th) value in each row of the table and, if it matches, retrieve the td value.
Below is my HTML:
<table class="span-5" id="summaryTable" title="Table showing Summary data">
<tbody>
<tr>
<th class="width-40" id="num">
(12) App no:
</th>
<td headers="num">
(11)
<strong>2796179</strong>
</td>
</tr>
<tr>
<th class="noLines alignLeft width35" id="EnglishTitle">
(54) English Title:
</th>
<td class="noLines alignLeft width65" headers="EnglishTitle">
FRAME BIT-SIZE ALLOCATION
</td>
</tr>
<tr>
</tbody>
</table>
I want to collect each th tag value (i.e. "(12) App no" and "(54) English Title").
My Java code:
WebElement summary = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody"));
List<WebElement> rows = summary.findElements(By.tagName("tr"));
for (int i = 1; i <= rows.size(); i++) {
String dc = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/td/th/a")).getText();
if (dc.equalsIgnoreCase("(12) App no")) {
appNo = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/td/strong")).getText();
}
}
But I'm getting: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id='summaryTable']/tbody/tr[1]/td/th/a"}

Please use the code below for this:
WebElement elem = driver.findElement(By.id("summaryTable"));
List<WebElement> lists = elem.findElements(By.tagName("th"));
for(WebElement el : lists){
WebElement element = el.findElement(By.tagName("a"));
String str = element.getAttribute("innerHTML");
System.out.println(str);
}

I think you are making it a bit complicated. Can you try a simpler version?
public String getRequiredDataFromTableFromRow(String header){
WebElement table = driver.findElement(By.id("summaryTable"));
List<WebElement> rows = table.findElements(By.tagName("tr"));
for (WebElement row:rows) {
if(row.getText().contains(header)){
return row.findElement(By.tagName("td")).getText();
}
}
return null;
}

A row holds its cells as an ordered list, so you need to give the position of the cell whose text you want (XPath positions start at 1). Also, the th is a sibling of the td, not nested inside it.
Try the following code:
WebElement summary = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody"));
List<WebElement> rows = summary.findElements(By.tagName("tr"));
for (int i = 1; i <= rows.size(); i++) {
String dc = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/th[1]")).getText();
if (dc.equalsIgnoreCase("(12) App no")) {
appNo = driver.findElement(By.xpath("//*[@id='summaryTable']/tbody/tr[" + i + "]/td[1]")).getText();
}
}
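XPath positions start at 1, so a predicate such as th[0] matches nothing, while th[1] selects the first header cell. The sketch below checks this with the JDK's built-in javax.xml.xpath engine instead of Selenium, against a cut-down, well-formed copy of the table from the question. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathIndexDemo {
    // Cut-down, well-formed copy of the summary table from the question.
    static final String HTML =
        "<table id='summaryTable'><tbody>"
      + "<tr><th>(12) App no:</th><td>(11) <strong>2796179</strong></td></tr>"
      + "<tr><th>(54) English Title:</th><td>FRAME BIT-SIZE ALLOCATION</td></tr>"
      + "</tbody></table>";

    // Returns the trimmed string value of the first match, or "" if nothing matches.
    static String eval(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(HTML))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String row = "//table[@id='summaryTable']/tbody/tr[1]";
        System.out.println("[" + eval(row + "/th[0]") + "]");        // position 0 never matches
        System.out.println("[" + eval(row + "/th[1]") + "]");        // first header cell
        System.out.println("[" + eval(row + "/td[1]/strong") + "]"); // value inside the first data cell
    }
}
```

Note that the header text in the sample HTML ends with a colon, so an exact comparison against "(12) App no" would not match it; startsWith or contains may be closer to what is wanted.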

The code below gets the text of each th element:
WebElement summary = driver.findElement(By.id("summaryTable"));
List<WebElement> rows = summary.findElements(By.tagName("th"));
for (WebElement row : rows) {
// print the header text; the original call discarded the return value
System.out.println(row.getText());
}
The code looks up the table by its id and then uses that same element reference to get the list of th elements.
Any further operation on the text that was found can be done through the row reference inside the loop.

java find table using jsoup and equivalent xpath

Here is the HTML code:
<table class="textfont" cellspacing="0" cellpadding="0" width="100%" align="center" border="0">
<tbody>
<tr>
<td class="chl" width="20%">Batch ID</td><td class="ctext">d32654464bdb424396f6a91f2af29ecf</td>
</tr>
<tr>
<td class="chl" width="20%">ALM Server</td>
<td class="ctext"></td>
</tr>
<tr>
<td class="chl" width="20%">ALM Domain/Project</td>
<td class="ctext">EBUSINESS/STERLING</td>
</tr>
<tr>
<td class="chl" width="20%">TestSet URL</td>
<td class="ctext">almtestset://localhost</td>
</tr>
<tr>
<td class="chl" width="20%">Tests Executed</td>
<td class="ctext"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Start Time</td>
<td class="ctext">08/31/2017 12:20:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Finish Time</td>
<td class="ctext">08/31/2017 02:31:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Total Duration</td>
<td class="ctext"><b>2h 11m </b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Parameters</td>
<td class="ctext"><b>{"browser":"chrome","browser-version":"56","language":"english","country":"US"}</b></td>
</tr>
<tr>
<td class="chl" width="20%">Passed</td>
<td class="ctext" style="color:#269900"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Failed</td>
<td class="ctext" style="color:#990000"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Not Completed</td>
<td class="ctext" style="color: ##ff8000;"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Pass %</td>
<td class="ctext" style="color:#990000;font-size:14px"><b>0.0%</b></td>
</tr>
</tbody>
And here is the xpath to get the table:
//td[text() = 'TestSet URL']/ancestor::table[1]
How can I get this table using jSoup? I've tried:
tableElements = doc.select("td:contains('TestSet URL')");
to get the child element, but that doesn't work and returns null. I need to find the table and put all the children into a map. Any help would be greatly appreciated!

The following code will parse your table into a map. It is subject to a few assumptions:
The XPath //td[text() = 'TestSet URL']/ancestor::table[1] finds the nearest table enclosing a td whose text is exactly "TestSet URL". The JSoup code in getTable() uses a looser check (the text "TestSet URL" appears anywhere in the table body). This is a little brittle but, assuming it is sufficient for you, it selects the same table as that XPath for the posted HTML.
The code below assumes that every row contains two cells, with the first one being the key and the second one being the value. Since you want to parse the table content into a map, this assumption seems valid.
The code below throws exceptions if the above assumptions are not met, i.e. if the given HTML does not contain a table with "TestSet URL" embedded in its body, or if any row within that table does not contain exactly two cells.
If those assumptions are invalid then the internals of getTable and parseTable will change, but the general approach will remain valid.
public void parseTable() {
Document doc = Jsoup.parse(html);
// declare a holder for the 'mapped rows'; this is a map based on the assumption that every row represents a discrete key:value pair
Map<String, String> asMap = new HashMap<>();
Element table = getTable(doc);
// now walk through the rows, adding one map entry for each
Elements rows = table.select("tr");
for (int i = 0; i < rows.size(); i++) {
Element row = rows.get(i);
Elements cols = row.select("td");
// expecting this table to consist of key:value pairs where the first cell is the key and the second cell is the value
if (cols.size() == 2) {
asMap.put(cols.get(0).text(), cols.get(1).text());
} else {
throw new RuntimeException(String.format("Cannot parse the table row: %s to a key:value pair because it contains %s cells!", row.text(), cols.size()));
}
}
System.out.println(asMap);
}
private Element getTable(Document doc) {
Elements tables = doc.select("table");
for (int i = 0; i < tables.size(); i++) {
// the xpath //td[text() = 'TestSet URL']/ancestor::table[1] finds the table enclosing a cell whose text is "TestSet URL";
// this crude check (the text appears anywhere in the table body) is a looser JSoup stand-in for that xpath
if (tables.get(i).text().contains("TestSet URL")) {
return tables.get(i);
}
}
throw new RuntimeException("Cannot find a table element which contains 'TestSet URL'!");
}
For the HTML posted in your question, the above code will output:
{Finish Time=08/31/2017 02:31:46 PM, Passed=0, Test Parameters={"browser":"chrome","browser-version":"56","language":"english","country":"US"}, TestSet URL=almtestset://localhost, Failed=6, Test Pass %=0.0%, Not Completed=0, Start Time=08/31/2017 12:20:46 PM, Total Duration=2h 11m, Tests Executed=6, ALM Domain/Project=EBUSINESS/STERLING, Batch ID=d32654464bdb424396f6a91f2af29ecf, ALM Server=}
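For comparison, the XPath from the question can also be evaluated directly with the JDK's javax.xml.xpath engine when the input is well-formed. This is a minimal sketch of that alternative, not Jsoup code: the class name, the method name, and the extra "Unrelated" table are assumptions added for illustration, and only three rows of the original table are reproduced.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableToMapDemo {
    // Well-formed excerpt of the posted table, preceded by a made-up unrelated table.
    static final String HTML =
        "<html><body>"
      + "<table class='other'><tbody><tr><td>Unrelated</td><td>x</td></tr></tbody></table>"
      + "<table class='textfont'><tbody>"
      + "<tr><td class='chl'>Batch ID</td><td class='ctext'>d32654464bdb424396f6a91f2af29ecf</td></tr>"
      + "<tr><td class='chl'>TestSet URL</td><td class='ctext'>almtestset://localhost</td></tr>"
      + "<tr><td class='chl'>Tests Executed</td><td class='ctext'><b>6</b></td></tr>"
      + "</tbody></table></body></html>";

    // Locates the table via the question's XPath and maps first cell -> second cell per row.
    static Map<String, String> parse(String html) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(html)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node table = (Node) xpath.evaluate(
                    "//td[text() = 'TestSet URL']/ancestor::table[1]", doc, XPathConstants.NODE);
            if (table == null) {
                throw new IllegalStateException("Cannot find a table with a 'TestSet URL' cell");
            }
            NodeList rows = (NodeList) xpath.evaluate(".//tr", table, XPathConstants.NODESET);
            Map<String, String> result = new LinkedHashMap<>(); // keeps the row order
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                result.put(xpath.evaluate("td[1]", row).trim(), xpath.evaluate("td[2]", row).trim());
            }
            return result;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(HTML));
    }
}
```

A LinkedHashMap keeps the entries in table order, whereas the HashMap used above prints them in no particular order. Real-world HTML is often not well-formed XML, which is where a lenient parser such as Jsoup remains the practical choice.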

You have to remove the quotation marks to select the cell with that text:
tableElements = doc.select("td:contains(TestSet URL)");
Note, however, that this selects only the td elements which contain the text "TestSet URL". To select the whole table, use
Element table = doc.select("table.textfont").first();
which selects the table with class=textfont. Several tables can share the same class value, so you have to specify which one to take, hence first().
To get all the tr elements:
Elements tableRows = doc.select("table.textfont tr");
for(Element e: tableRows)
System.out.println(e);

Unable to grab an element in a dynamic table where we have only text of a td tag

I have a table whose data changes as items are added and deleted. The table has several columns: Name, Qty, Type, Status. All I have is the Name text, and I need to find the Status field of the row that has that name.
The problem is that the HTML tags share the same class names. I tried going through the parent and the siblings, and everything failed. The HTML structure of the table is below:
<table>
<thead> </thead>
<tbody>
<tr>
<td class = "c1">
<a class = txtclass> text1 </a>
</td>
<td class = "c1"> Qty </td>
<td class = "c2"> type </td>
<td class = "c3">
<div id = "1" class = "status1"> </div>
</td>
</tr>
<tr>
<td class = "c1">
<a> text2 </a>
</td>
<td class = "c1"> Qty </td>
<td class = "c2"> type </td>
<td class = "c3">
<div id = "2" class = "status2"> </div>
</td>
</tr>
</tbody>
</table>
So all I have is text2, and I need to get the div with the status of that row.
How do I proceed? I tried:
List<WebElement> ele = driver.findElements(By.xpath("//*[@class = 'txtClass'][contains(text(),'text')]"));
for(WebElement el1:ele)
{
WebElement parent = el1.findElement(By.xpath(".."));
WebElement child1= parent.findElement(By.xpath("//td[4]/div"));
System.out.println(child1.getAttribute("class"));
}
This always gives me the class name of the status in the first row of the table.
I tried the same thing with
WebElement child = el1.findElement(By.xpath("//following-sibling::td[4]/div[1]"));
and again got the class name from the first row of the table. I figured that since the class names are the same for all the child elements, it will always grab the first row's elements and not the ones from the row I want.
Please help, I have been stuck on this for a long time. Let me know if you need any other details.

You are trying to use
el1.findElements(By.xpath("//following-sibling::td[4]/div[1]"));
Because the expression starts with //, it matches every element of the form td[4]/div[1] on the page, and the first match is returned.
Use the following XPath to get the status div based on your text:
driver.findElement(By.xpath(".//tr/td[contains(.,'text1')]/following-sibling::td[3]/div")).getAttribute("class");
If you need to extract every matching status, try the following code:
List<WebElement> allElements = driver.findElements(By.xpath(".//tr/td[contains(.,'text2')]/following-sibling::td[3]/div"));
for(WebElement element:allElements)
{
String status = element.getAttribute("class");
System.out.println(status);
}

I think this approach is suitable for you.
Get all div elements that carry the status attribute:
List<WebElement> listChildStatus = driver.findElements(By.xpath(".//tr[.//a[contains(.,'text')]]//div"));
Get the specific div element that carries the status attribute:
WebElement childStatus = driver.findElement(By.xpath(".//tr[.//a[contains(.,'{TEXT}')]]//div"));
{TEXT} = the text value that you have

You need to locate the element with the text, go one level up, and then get the sibling with the status:
WebElement statusElement = driver.findElement(By.xpath(".//td[a[contains(text(),'text2')]]/following-sibling::td[3]/div"));
String status = statusElement.getAttribute("class"); // will be status2
If you don't want to rely on the index, you can use last() to find the last sibling:
WebElement statusElement = driver.findElement(By.xpath(".//td[a[contains(text(),'text2')]]/following-sibling::td[last()]/div"));
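The difference between a path that starts with // and one that is relative to the current element can be checked outside the browser. The sketch below uses the JDK's javax.xml.xpath engine rather than Selenium, on a well-formed copy of the two rows from the question; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Well-formed copy of the two rows from the question.
    static final String HTML = "<table><tbody>"
      + "<tr><td><a>text1</a></td><td>Qty</td><td>type</td><td><div class='status1'/></td></tr>"
      + "<tr><td><a>text2</a></td><td>Qty</td><td>type</td><td><div class='status2'/></td></tr>"
      + "</tbody></table>";

    // Finds the td holding the link with the given text, then evaluates
    // the expression with that td as the context node.
    static String statusClass(String linkText, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node cell = (Node) xpath.evaluate(
                    "//td[a[normalize-space() = '" + linkText + "']]", doc, XPathConstants.NODE);
            if (cell == null) {
                throw new IllegalArgumentException("No cell with link text: " + linkText);
            }
            return xpath.evaluate(expression, cell);
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Starts at the document root, so the context cell is ignored: first row wins.
        System.out.println(statusClass("text2", "//td[4]/div/@class"));
        // Relative to the context cell: stays inside the matching row.
        System.out.println(statusClass("text2", "following-sibling::td[3]/div/@class"));
    }
}
```

The first expression yields status1 even though the context is the text2 cell, which matches the behaviour described in the question; the second yields status2.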

How to set the checkbox in the first column based on the value in the second column of a table?

I'm automating a task using Java and Selenium.
I want to set a checkbox (which is in the first column of a table) based on whether the value in the second column matches my input value. For example, in the following code snippet, the value "Magnus" matches my input value so I want to set the checkbox associated with it.
<table class="cuesTableBg" width="100%" cellspacing="0" border="0" summary="Find List Table Result">
<tbody>
<tr class="cuesTableBg">
<tr class="cuesTableRowEven">
<tr class="cuesTableRowOdd">
<td align="center">
<input class="content-nogroove" type="checkbox" name="result[1].chked" value="true">
<input type="hidden" value="1c62dd7a-097a-d318-df13-75de31f54cb9" name="result[1].col[0].stringVal">
<input type="hidden" value="Magnus" name="result[1].col[1].stringVal">
</td>
<td align="left">
<a class="cuesTextLink" href="userEdit.do?key=1c62dd7a-097a-d318-df13-75de31f54cb9">Magnus</a>
</td>
<td align="left"></td>
<td align="left">Carlsen</td>
<td align="left"></td>
</tr>
<tr class="cuesTableRowEven">
</tbody>
</table>
But I'm unable to do it. In the above case, the following two lines serve the purpose (as my input value matches with that in the second row):
WebElement checkbox = driver.findElement(By.xpath("//input[@type = 'checkbox' and @name = 'result[1].chked']"));
checkbox.click();
But it can't be used as the required value might not always be in the second row.
I tried following code block but to no avail:
List<WebElement> rows = driver.findElements(By.xpath("//table[@summary = 'Find List Table Result']//tr"));
for (WebElement row : rows) {
WebElement userID = row.findElement(By.xpath(".//td[1]"));
if(userID.getText() == "Magnus") {
WebElement checkbox = row.findElement(By.xpath(".//input[@type = 'checkbox']"));
checkbox.click();
break;
}
}
For what it's worth, XPath of the text in the second column:
/html/body[@id='mainbody']/table/tbody/tr/td/div[@id='contentautoscroll']/form/table[2]/tbody/tr[3]/td[2]/a
I don't know about CSS Selectors. Will it help here?

If you already know the input value, you can use the XPath below to select the corresponding checkbox:
"//a[text(),'Magnus']/parent::td/preceding-sibling::td/input[@type='checkbox']"
Update (the expression above is not valid XPath; text() must be compared with =):
"//a[text()='Magnus']/parent::td/preceding-sibling::td/input[@type='checkbox']"
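The corrected expression can be tried without a browser. This is a minimal sketch using the JDK's javax.xml.xpath engine in place of Selenium, on a two-row, well-formed reduction of the table; the "Other" row and the class and method names are assumptions added for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CheckboxXPathDemo {
    // Two-row reduction of the table; the first row is made up for contrast.
    static final String HTML = "<table summary='Find List Table Result'><tbody>"
      + "<tr><td><input type='checkbox' name='result[0].chked'/></td><td><a>Other</a></td></tr>"
      + "<tr><td><input type='checkbox' name='result[1].chked'/></td><td><a>Magnus</a></td></tr>"
      + "</tbody></table>";

    // Returns the name of the checkbox in the row whose link text equals linkText, or "" if none.
    static String checkboxNameFor(String linkText) {
        String expression = "//a[text()='" + linkText + "']"
                + "/parent::td/preceding-sibling::td/input[@type='checkbox']/@name";
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(HTML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkboxNameFor("Magnus"));
        System.out.println(checkboxNameFor("Other"));
    }
}
```

Because the path walks from the link up to its cell and then back to the preceding cell, it picks the checkbox of the matching row regardless of where that row sits in the table.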

When comparing two strings, equals should be used instead of ==.
Replace
if(userID.getText() == "Magnus")
with
String check1 = userID.getText();
if(check1.equals("Magnus"))
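The reason is that == on two String references checks whether they are the same object, while equals compares the characters. A minimal sketch follows; new String(...) stands in for a value produced at runtime, such as the result of getText(), and the class and method names are made up for illustration.

```java
class StringCompareDemo {
    // True only when both references point at the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // True when both strings hold the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // A distinct String object with the same characters as the literal.
        String fromPage = new String("Magnus");
        System.out.println(sameReference(fromPage, "Magnus")); // false
        System.out.println(sameContent(fromPage, "Magnus"));   // true
    }
}
```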

Seems like it was a silly mistake on my part. The following code snippet, a fairly straightforward one, worked.
List<WebElement> rows = driver.findElements(By.xpath("//table[@class='cuesTableBg']//tr"));
for (WebElement row : rows) {
WebElement secondColumn = row.findElement(By.xpath(".//td[2]"));
if(secondColumn.getText().equals("Magnus")) {
WebElement checkbox = row.findElement(By.xpath(".//td[1]/input"));
checkbox.click();
break;
}
}
