Parsing Table with Jsoup for Android App

I am attempting to parse a table on a website to find the table row in which the first column matches a certain string of characters. Below is the HTML for part of the table (it is very large):
<table class="table display datatable" id="datatable1">
<thead>
<tr>
<th class="va-m">Miner</th>
<th class="va-m">Shares</th>
<th class="va-m">%</th>
<th class="va-m">Best DL</th>
</tr>
</thead>
<tfoot>
<tr>
<th class="va-m">Miner</th>
<th class="va-m">Shares</th>
<th class="va-m">%</th>
<th class="va-m">Best DL</th>
</tr>
</tfoot>
<tbody>
<tr>
<td>3R8RDBxiux3g1pFCCsQnm2vwD34axsVRTrEWzyX8tngJaRnNWkbnuFEewzuBAKhQrb3LxEQHtuBg1zW4tybt83SS</td>
<td>44279</td>
<td>27.37 %</td>
<td>1154</td>
</tr>
<tr>
<td>5gwVxC9cXguHHjD9wtTpHfsJPaZx4fPcvWD5jGWF1dcuHnAMyXxteaHrEtXviZkvWN3FAnevbVLErABSsP6mS7PR</td>
<td>36369</td>
<td>22.48 %</td>
<td>2725</td>
</tr>
<tr>
<td>2qZXPmop82UiA7LQEQqdoUzjFbcwCSpqf8U1f3656XXSsHnGvGXYTNoP11s2asiVSyVS8LPFqxmpdCeSNxcpFMnF</td>
<td>28596</td>
<td>17.68 %</td>
<td>967</td>
</tr>
<tr>
<td>21mbNSDo7g9BAyjsZGxnNfJUrEtBUVVNQZhR4tkVwdEHPaMNsa2u2JHQPAAe5riGfPA9Khb1Pq3bQGhqmrLEGNqN</td>
<td>6104</td>
<td>3.77 %</td>
<td>4787</td>
</tr>
<tr>
<td>4HAakKK7dSq18Djg7m6cLSyHb5aUU6ngvBQimo8pYyF5F64qX3gE4T8q8kfWHTZ79FvXybSG3JhUfSZDDv2sRwqY</td>
<td>5895</td>
<td>3.64 %</td>
<td>6020</td>
</tr>
<tr>
<td>2r2izPEC5o7ZDnUsdDA97q8wKCeZRRg9n243Rd9vkMQqRCtc6ZRUTruQUyZGduoHy8pTYPuEq9ACXPKfXt8fqKxS</td>
<td>5605</td>
<td>3.46 %</td>
<td>10958</td>
</tr>
</tbody>
</table>
I am trying to step through the table and search for a specific row but I am receiving an IndexOutOfBoundsException.
Would there be a better way to code the statement below?
for (Element table : doc.select("table")){
for(Element row : table.select("tr")){
Elements tds = row.select("td");
if(tds.get(0).text().equals("4HjSN79KUMz7AQC3GBvGkgPa5Qrio9HWTh7hg9JY48fkrYeVZJVmzB9YCB6GZSpuXB7V7DjJVuke3ZaCm5k7sRLE")){
myHistoricShares =tds.get(0).text();
}
}
}

As I said in comments, your table.select("tr") selects rows not only inside <tbody>, but inside the header and the footer too. For those rows row.select("td") returns an empty list, and hence tds.get(0) throws the IndexOutOfBoundsException.
You could simplify your loop by selecting only the rows in <tbody>:
for (Element row : doc.select("table#datatable1>tbody>tr")) {
    if (row.children().size() > 0 && "some_long_string".equals(row.child(0).text())) {
        doSomething();
    }
}
The selector "table#datatable1>tbody>tr" selects the table with id="datatable1", then its direct tbody child, and then all direct tr children of that tbody. Header and footer rows are never visited, so a single loop over the result is enough.
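To make the idea concrete, here is a minimal sketch that wraps the same selector in a lookup method. The class name MinerLookup and the method sharesFor are hypothetical, and returning the second column (Shares) is an assumption about what the asker wants to store in myHistoricShares.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper; the names are illustrative, not part of Jsoup.
final class MinerLookup {

    /** Returns the Shares cell text for the given miner, or null if no body row matches. */
    static String sharesFor(Document doc, String miner) {
        // Only rows that are direct children of <tbody> are visited,
        // so the <th>-only rows in <thead> and <tfoot> never reach this loop.
        for (Element row : doc.select("table#datatable1 > tbody > tr")) {
            // Guard against short rows before indexing into the cells.
            if (row.children().size() >= 2 && miner.equals(row.child(0).text())) {
                return row.child(1).text();
            }
        }
        return null;
    }
}
```

A caller would then write something like myHistoricShares = MinerLookup.sharesFor(doc, minerId) and treat null as "miner not listed".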

Related

how to extract specified values from a table with selenium based on a string condition

I have a table that contains:
a column with a list of invoices
a column with the charge types for every invoice displayed
I want to write a function that receives a String parameter, for example the invoice number, and returns all the charge types for that invoice.
Every time a new invoice starts in the table, the first row for that invoice also contains a value.
That value is the number of charge types displayed for that invoice.
For example, the charge types are: Management fee, Payments, Funds Transmission Cost, Acquiring Authorisation Fee, Service etc.
Here is the code for the table:
<form method="post" action="/accounting/billing/showInvoiceTransactionsCountTotal.html?
jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak" target="detail">
<table>
<tbody>
<tr>
<tr class="odd">
<td rowspan="8">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123399&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123399
<input type="hidden" value="BA7123399" name="invoiceChecked"/>
</a>
</td>
<td>Management fee (captured transactions)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Payments</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Funds Transmission Cost (FTC)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<td>Acquiring Authorisation Fees</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Service</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="odd">
<td>Refunds</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<td>Chargebacks</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<td>Minimum Billing</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="even">
<td rowspan="4">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123421&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123421
<input type="hidden" value="BA7123421" name="invoiceChecked"/>
</a>
</td>
<td>Payments</td>
<td>ALEXAUTOMATION01</td>
<td>ALEXADCODE</td>
<td>1</td>
</tr>
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="odd">
<td rowspan="8">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123398&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123398
<input type="hidden" value="BA7123398" name="invoiceChecked"/>
</a>
</td>
<td>Management fee (captured transactions)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>1</td>
</tr>
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="odd">
<tr class="even">
<td rowspan="10">
<a href="/accounting/billing/showInvoice.html?invoiceNumber=BA7123397&jlbz=lfISHfhqWHPj5fSzCwFKoP8c5ukwXecQt0fr4iL6ak">
BA7123397
<input type="hidden" value="BA7123397" name="invoiceChecked"/>
</a>
</td>
<td>Management fee (captured transactions)</td>
<td>PAYPALC001M2</td>
<td>PAYPALC001A1</td>
<td>2</td>
</tr>
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
<tr class="even">
</tbody>

You need some extra logic for this scenario, because the invoice number cell appears only in the first row of each invoice and is missing from the following rows of the same invoice. Please try the approach below.
Algorithm:
1. First, find all the row elements in the table.
2. Iterate over the row elements and compare each invoice number with the expected one.
3. Once the invoice number matches, collect the charge type of that row and of every subsequent row until a row with a different invoice number appears.
Code:
// Needs java.util.ArrayList, java.util.List, org.openqa.selenium.By and org.openqa.selenium.WebElement.
// Invoice number to look for (fill in the value you need).
String invoiceNumber = "";
List<String> chargeTypes = new ArrayList<>();
boolean isInvoiceSpecificCharge = false;
// Find all the tr elements.
List<WebElement> rows = driver.findElements(By.xpath("//table/tbody/tr"));
for (WebElement row : rows) {
    // findElements returns an empty list (instead of throwing) when the row has no invoice link.
    List<WebElement> links = row.findElements(By.xpath(".//a"));
    if (links.isEmpty()) {
        // Continuation row: the charge type is in td[1].
        if (isInvoiceSpecificCharge) {
            chargeTypes.add(row.findElement(By.xpath("./td[1]")).getText());
        }
    } else if (links.get(0).getText().trim().equalsIgnoreCase(invoiceNumber)) {
        // First row of the wanted invoice: the charge type is in td[2].
        isInvoiceSpecificCharge = true;
        chargeTypes.add(row.findElement(By.xpath("./td[2]")).getText());
    } else {
        // First row of a different invoice: stop collecting.
        isInvoiceSpecificCharge = false;
    }
}
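The carry-forward logic of the algorithm can be tried without a browser. The following is a minimal sketch, assuming each table row has already been reduced to a two-element array {invoiceNumber, chargeType}, where invoiceNumber is null on rows that have no invoice link. The class ChargeTypeCollector and this row representation are hypothetical and only serve to illustrate the state handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the three algorithm steps on plain data.
final class ChargeTypeCollector {

    /**
     * Each row is {invoiceNumber, chargeType}; invoiceNumber is null on
     * continuation rows, mirroring table rows without an invoice link.
     */
    static List<String> chargeTypesFor(String wanted, List<String[]> rows) {
        List<String> result = new ArrayList<>();
        boolean inWantedInvoice = false;
        for (String[] row : rows) {
            String invoice = row[0];
            if (invoice != null) {
                // A new invoice starts here; decide whether it is the wanted one.
                inWantedInvoice = invoice.equalsIgnoreCase(wanted);
            }
            if (inWantedInvoice) {
                result.add(row[1]);
            }
        }
        return result;
    }
}
```

The flag is only re-evaluated on rows that carry an invoice number, which is what lets continuation rows inherit the decision made for the first row of their invoice.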

JSoup Returning IndexOutOfBoundsException when fetching data from Document

I'm having a really difficult time resolving the error I'm getting! To cut a long story short, I am trying to get a specific element from a table in HTML. Easy, right? Well, that's what I thought. Essentially, if I copy the exact HTML page source from the browser and read it in from a file, I can find the element that I need.
However, when reading the document through Jsoup.connect("URL"), I'm getting the error! I've been sitting here for about 4 hours now, reading around and trying to understand what's going on. I'm fairly confident with JSoup, but this has stumped me! The code is below:
private String parseKcal( Element kCalElement ) throws IOException {
//Getting error on below line
String calories = kCalElement.select(".tableWrapper").select("tr").select(".tableRow0").select("td").get(0).text();
if (calories == null) {
throw new IOException();
}
return calories.toString();
}
The parameter kCalElement is the document I'm trying to get the element from!
**** The error ****
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at models.Product.parseKcal(Product.java:67)
**** The HTML which i'm trying to parse ****
<div class="tableWrapper">
<table class="nutritionTable">
<thead>
<tr class="tableTitleRow">
<th scope="col">Typical Values</th><th scope="col">Per 100g </th><th scope="col">% based on RI for Average Adult</th>
</tr>
</thead>
<tr class="tableRow1">
<th scope="row" class="rowHeader" rowspan="2">Energy</th><td class="tableRow1">140kJ</td><td class="tableRow1">-</td>
</tr>
<tr class="tableRow0">
<td class="nutritionLevel1">33kcal</td><td class="nutritionLevel1">2%</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Fat</th><td class="nutritionLevel1"><0.5g</td><td class="nutritionLevel1">-</td>
</tr>
<tr class="tableRow0">
<th scope="row" class="rowHeader">Saturates</th><td class="nutritionLevel1"><0.1g</td><td class="nutritionLevel1">-</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Carbohydrate</th><td class="tableRow1">6.1g</td><td class="tableRow1">2%</td>
</tr>
<tr class="tableRow0">
<th scope="row" class="rowHeader">Total Sugars</th><td class="nutritionLevel2">6.1g</td><td class="nutritionLevel2">7%</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Fibre</th><td class="tableRow1">1.0g</td><td class="tableRow1">-</td>
</tr>
<tr class="tableRow0">
<th scope="row" class="rowHeader">Protein</th><td class="tableRow0">0.6g</td><td class="tableRow0">1%</td>
</tr>
<tr class="tableRow1">
<th scope="row" class="rowHeader">Salt</th><td class="nutritionLevel1"><0.01g</td><td class="nutritionLevel1">-</td>
</tr>
</table>
</div>
<p>RI= Reference Intakes of an average adult (8400kJ / 2000kcal)</p>
</div>
</div>
This does not work. However, when I parse a local copy of the same HTML, it works!
See below:
File input = new File("~/Desktop/file.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
Document document = Jsoup.parse(doc.toString());
String calories = document.select(".tableWrapper").select("tr").select(".tableRow0").select("td").get(0).text();
System.out.println(calories);
Can someone please help before I pull my hair out? I am STUMPED :(
EDIT
I am trying to get the kcal element that contains the calories!
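The exception itself means that the chained select(...) calls matched nothing, so get(0) had no element to return. The following is a minimal defensive sketch rather than a diagnosis: it uses selectFirst, which returns null when nothing matches, so the caller can detect the empty case. The class name KcalParser and the combined selector are assumptions written for the HTML shown above.

```java
import org.jsoup.nodes.Element;

// Hypothetical defensive variant of parseKcal from the question.
final class KcalParser {

    /** Returns the text of the first td in the first tr.tableRow0, or null if nothing matches. */
    static String parseKcal(Element root) {
        // selectFirst returns null instead of an empty list when the selector has no match.
        Element cell = root.selectFirst(".tableWrapper tr.tableRow0 td");
        return cell == null ? null : cell.text();
    }
}
```

If this returns null for the live page but not for the saved file, the HTML that Jsoup downloaded may differ from what the browser shows, and printing the fetched document would be one way to check.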

java find table using jsoup and equivalent xpath

Here is the HTML code:
<table class="textfont" cellspacing="0" cellpadding="0" width="100%" align="center" border="0">
<tbody>
<tr>
<td class="chl" width="20%">Batch ID</td><td class="ctext">d32654464bdb424396f6a91f2af29ecf</td>
</tr>
<tr>
<td class="chl" width="20%">ALM Server</td>
<td class="ctext"></td>
</tr>
<tr>
<td class="chl" width="20%">ALM Domain/Project</td>
<td class="ctext">EBUSINESS/STERLING</td>
</tr>
<tr>
<td class="chl" width="20%">TestSet URL</td>
<td class="ctext">almtestset://localhost</td>
</tr>
<tr>
<td class="chl" width="20%">Tests Executed</td>
<td class="ctext"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Start Time</td>
<td class="ctext">08/31/2017 12:20:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Finish Time</td>
<td class="ctext">08/31/2017 02:31:46 PM</td>
</tr>
<tr>
<td class="chl" width="20%">Total Duration</td>
<td class="ctext"><b>2h 11m </b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Parameters</td>
<td class="ctext"><b>{"browser":"chrome","browser-version":"56","language":"english","country":"US"}</b></td>
</tr>
<tr>
<td class="chl" width="20%">Passed</td>
<td class="ctext" style="color:#269900"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Failed</td>
<td class="ctext" style="color:#990000"><b>6</b></td>
</tr>
<tr>
<td class="chl" width="20%">Not Completed</td>
<td class="ctext" style="color: ##ff8000;"><b>0</b></td>
</tr>
<tr>
<td class="chl" width="20%">Test Pass %</td>
<td class="ctext" style="color:#990000;font-size:14px"><b>0.0%</b></td>
</tr>
</tbody>
And here is the xpath to get the table:
//td[text() = 'TestSet URL']/ancestor::table[1]
How can I get this table using jSoup? I've tried:
tableElements = doc.select("td:contains('TestSet URL')");
to get the child element, but that doesn't work and returns null. I need to find the table and put all the children into a map. Any help would be greatly appreciated!

The following code will parse your table into a map. It is subject to a few assumptions:
The XPath //td[text() = 'TestSet URL']/ancestor::table[1] selects the nearest table around a cell whose text is 'TestSet URL'. The JSoup code in getTable() approximates it by returning the first table that contains the text "TestSet URL" anywhere in its body. This is a little brittle, but assuming it is sufficient for you, it finds the same table for the HTML you posted.
The code assumes that every row contains two cells, the first being the key and the second being the value. Since you want to parse the table content into a map, this assumption seems valid.
The code throws an exception if the above assumptions are not met, i.e. if the given HTML does not contain a table with "TestSet URL" in its body, or if any row within that table does not contain exactly two cells.
If those assumptions are invalid then the internals of getTable and parseTable will change, but the general approach will remain valid.
// Needs java.util.HashMap, java.util.Map, org.jsoup.Jsoup, org.jsoup.nodes.Document,
// org.jsoup.nodes.Element and org.jsoup.select.Elements; html holds the page source.
public void parseTable() {
    Document doc = Jsoup.parse(html);
    // holder for the 'mapped rows'; a map, on the assumption that every row represents a discrete key:value pair
    Map<String, String> asMap = new HashMap<>();
    Element table = getTable(doc);
    // now walk through the rows, adding one entry for each
    Elements rows = table.select("tr");
    for (int i = 0; i < rows.size(); i++) {
        Element row = rows.get(i);
        Elements cols = row.select("td");
        // expecting key:value pairs where the first cell is the key and the second cell is the value
        if (cols.size() == 2) {
            asMap.put(cols.get(0).text(), cols.get(1).text());
        } else {
            throw new RuntimeException(String.format("Cannot parse the table row: %s to a key:value pair because it contains %s cells!", row.text(), cols.size()));
        }
    }
    System.out.println(asMap);
}

private Element getTable(Document doc) {
    Elements tables = doc.select("table");
    for (int i = 0; i < tables.size(); i++) {
        // crude approximation of the xpath //td[text() = 'TestSet URL']/ancestor::table[1]:
        // return the first table which contains the text "TestSet URL" anywhere in its body
        if (tables.get(i).text().contains("TestSet URL")) {
            return tables.get(i);
        }
    }
    throw new RuntimeException("Cannot find a table element which contains 'TestSet URL'!");
}
For the HTML posted in your question, the above code will output:
{Finish Time=08/31/2017 02:31:46 PM, Passed=0, Test Parameters={"browser":"chrome","browser-version":"56","language":"english","country":"US"}, TestSet URL=almtestset://localhost, Failed=6, Test Pass %=0.0%, Not Completed=0, Start Time=08/31/2017 12:20:46 PM, Total Duration=2h 11m, Tests Executed=6, ALM Domain/Project=EBUSINESS/STERLING, Batch ID=d32654464bdb424396f6a91f2af29ecf, ALM Server=}

Remove the quotation marks inside :contains(...) so that the selector matches the cell text:
tableElements = doc.select("td:contains(TestSet URL)");
Note that this selects only the td elements that contain the text "TestSet URL". To select the whole table, use
Element table = doc.select("table.textfont").first();
which selects the tables with class="textfont"; because several tables can share the same class value, first() picks the first match.
To get all the tr elements:
Elements tableRows = doc.select("table.textfont tr");
for (Element e : tableRows) {
    System.out.println(e);
}
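If the table should be located through the cell text instead of its class, the two ideas can be combined. The following is a minimal sketch, assuming the label contains no parentheses (they would break the selector string). The class TableToMap and its method names are hypothetical, and it silently skips rows that do not have exactly two cells instead of throwing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper; the names are illustrative, not part of Jsoup.
final class TableToMap {

    /** Finds the nearest table around the first td whose own text contains the label, or null. */
    static Element enclosingTable(Document doc, String label) {
        Element cell = doc.select("td:containsOwn(" + label + ")").first();
        if (cell == null) {
            return null;
        }
        // parents() lists the ancestors from the closest one outwards.
        for (Element ancestor : cell.parents()) {
            if (ancestor.tagName().equals("table")) {
                return ancestor;
            }
        }
        return null;
    }

    /** Maps the first cell of every two-cell row to the second cell, keeping row order. */
    static Map<String, String> toMap(Element table) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Element row : table.select("tr")) {
            if (row.children().size() == 2) {
                result.put(row.child(0).text(), row.child(1).text());
            }
        }
        return result;
    }
}
```

A LinkedHashMap is used here so that the entries come out in the same order as the table rows, which a plain HashMap does not guarantee.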

How to use xpath to get href value

<div id="AdvancedSearchResultsContainter">
<table id="SearchResults" class="tablesorter">
<thead>
<tr>
<th scope="col" class="header">School name</th>
<th scope="col" class="header">School type</th>
<th scope="col" class="header">Sector</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>ABC Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="even">
<td>XYZ High School</td>
<td class="nowrap">Secondary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="odd">
<td>PQR Park Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
<tr class="even">
<td>JKL Public School</td>
<td class="nowrap">Primary</td>
<td class="nowrap">Government</td>
</tr>
</tbody>
</table>
</div>
I am using Selenium and XPath.
I want to get the numeric value of each href. Out of this href
I want to get 82648. I would like to put it in a loop and get the numeric part of every href.
Can someone please help?

You can use the following CSS selector to find the <a> elements:
By.cssSelector("#SearchResults tr a");
Get all the link elements with driver.findElements(...) and then use getAttribute("href") to read the URLs.
Something like:
List<WebElement> elements = driver.findElements(By.cssSelector("#SearchResults tr a"));
Once you have the URLs you can process them with the methods of java.lang.String (substring, replace, split and so on). As an example, assuming the number is always the last five characters of the URL:
for (WebElement e : elements) {
    String url = e.getAttribute("href");
    System.out.println(url.substring(url.length() - 5));
}
There are other ways to get the substring as well.
You can also wrap this in a method that returns a String and assert on the result, if you intend to do so.
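If the number is not always five digits long, a regular expression is one of those other ways. The following is a minimal sketch, assuming the numeric value is the run of digits at the very end of the href. The class HrefIds, the method trailingId and the example URL in the note below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for pulling the numeric part out of an href.
final class HrefIds {

    // One or more digits anchored at the end of the input.
    private static final Pattern TRAILING_DIGITS = Pattern.compile("(\\d+)$");

    /** Returns the run of digits at the end of the URL, or null if it does not end in a digit. */
    static String trailingId(String url) {
        Matcher matcher = TRAILING_DIGITS.matcher(url);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

Inside the loop, HrefIds.trailingId(e.getAttribute("href")) would then replace the substring call; for a made-up URL such as http://example.com/school/82648 it yields 82648.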

Skip table if not found using selenium

I have html code that is very similar to this:
<TH CLASS="ddtitle">MovieOne</TH>
<TABLE CLASS="datadisplaytable" ><CAPTION class="captiontext">Movies</CAPTION>
<TR>
<TH CLASS="ddheader" scope="col" >Genre</TH>
<TH CLASS="ddheader" scope="col" >Time</TH>
<TH CLASS="ddheader" scope="col" >Days</TH>
<TH CLASS="ddheader" scope="col" >Where</TH>
<TH CLASS="ddheader" scope="col" >Date Range</TH>
<TH CLASS="ddheader" scope="col" >Seating</TH>
<TH CLASS="ddheader" scope="col" >Actors</TH>
</TR>
<TR>
<TD CLASS="dddefault">Action</TD>
<TD CLASS="dddefault">10:00 am - 12:00 pm</TD>
<TD CLASS="dddefault">SMTWTHFSA</TD>
<TD CLASS="dddefault">AMC Showplace</TD>
<TD CLASS="dddefault">Aug 20, 2014 - Sept 12, 2014</TD>
<TD CLASS="dddefault">Reservations</TD>
<TD CLASS="dddefault">Will Ferrel (<ABBR title= "Primary">P</ABBR>) target="Will Ferrel" ></TD>
</TR>
</TABLE>
<TH CLASS="ddtitle">MovieTwo</TH>
<TABLE CLASS="datadisplaytable" ><CAPTION class="captiontext">Movies</CAPTION>
<TR>
<TH CLASS="ddheader" scope="col" >Genre</TH>
<TH CLASS="ddheader" scope="col" >Time</TH>
<TH CLASS="ddheader" scope="col" >Days</TH>
<TH CLASS="ddheader" scope="col" >Where</TH>
<TH CLASS="ddheader" scope="col" >Date Range</TH>
<TH CLASS="ddheader" scope="col" >Seating</TH>
<TH CLASS="ddheader" scope="col" >Actors</TH>
</TR>
<TR>
<TD CLASS="dddefault">Action</TD>
<TD CLASS="dddefault">11:00 am - 12:30 pm</TD>
<TD CLASS="dddefault">SMTWTHFSA</TD>
<TD CLASS="dddefault">Showplace Cinemas</TD>
<TD CLASS="dddefault">Aug 20, 2014 - Sept 12, 2014</TD>
<TD CLASS="dddefault">TBA</TD>
<TD CLASS="dddefault">Zach Galifinakis (<ABBR title= "Primary">P</ABBR>) target="Zach Galifinakis" ></TD>
</TR>
</TABLE>
<TH CLASS="ddtitle">MovieThree</TH>
<BR>
<BR>
Coming Soon
<BR>
What I want to be able to do, is take the individual table data that is relevant for the movie title, and if a Movie doesn't have a table I want to say the values are TBA. So far, I am able to get the relevant table information, but I am unable to skip a table. For example I use this code to get the genre of the movie:
int tcounter = 1;
for (Element elements : li) {
    WebElement genre = driver.findElement(By.xpath("//table[@class='datadisplaytable']/descendant::table[" + tcounter + "]//td[1]"));
    WebElement time = driver.findElement(By.xpath("//table[@class='datadisplaytable']/descendant::table[" + tcounter + "]//td[2]"));
    WebElement days = driver.findElement(By.xpath("//table[@class='datadisplaytable']/descendant::table[" + tcounter + "]//td[3]"));
    WebElement where = driver.findElement(By.xpath("//table[@class='datadisplaytable']/descendant::table[" + tcounter + "]//td[4]"));
    WebElement date_range = driver.findElement(By.xpath("//table[@class='datadisplaytable']/descendant::table[" + tcounter + "]//td[5]"));
    WebElement seating = driver.findElement(By.xpath("//table[@class='datadisplaytable']/descendant::table[" + tcounter + "]//td[6]"));
    WebElement actors = driver.findElement(By.xpath("//table[@class='datadisplaytable']/descendant::table[" + tcounter + "]//td[7]"));
    tcounter++;
}
elements refers to a list storing all links on the webpage
(result for [1] would be action, [2] would be 10:00 am - 12:00pm ...).
This is within a for loop that increments tcounter by 1 in order to read the data from the different tables. Is there a way to tell the program to check whether a table is present under the TH element and, if not, to use the value TBA and skip it?
This is my second attempt based on siking's answer:
List<WebElement> linstings = driver.findElements(By.className("ddtitle"));
String genre = "";
String time = "";
String days = "";
String where = "";
String dateRange = "";
String seating = "";
String actors = "";
for(WebElement potentialMovie : linstings) {
try {
WebElement actualMovie = potentialMovie.findElement(By.xpath("//table[@class='datadisplaytable']"));
// System.out.println("Actual: " + actualMovie.getText());
// make all your assignments, for example:
type = actualMovie.findElement(By.xpath("/descendant::table//td")).getText();
time = actualMovie.findElement(By.xpath("/descendant::table//td[2]")).getText();
days = actualMovie.findElement(By.xpath("/descendant::table//td[3]")).getText();
location = actualMovie.findElement(By.xpath("/descendant::table//td[4]")).getText();
dates = actualMovie.findElement(By.xpath("/descendant::table//td[5]")).getText();
schedType = actualMovie.findElement(By.xpath("/descendant::table//td[6]")).getText();
instructor = actualMovie.findElement(By.xpath("/descendant::table//td[7]")).getText();
System.out.println(genre+" "+time+" "+days+" "+where+" "+dateRange+" "+actors);
} catch(Exception ex) {
// there is no table, so:
genre = "TBA";
}
}
The problem with this code is that it keeps returning the values for only the first table.

I trimmed down your HTML sample to the following:
<TH CLASS="ddtitle">MovieOne</TH>
<TABLE CLASS="datadisplaytable">
<CAPTION class="captiontext">Movies</CAPTION>
<TR>
<TH CLASS="ddheader" scope="col">Genre</TH>
</TR>
<TR>
<TD CLASS="dddefault">Action</TD>
</TR>
</TABLE>
<TH CLASS="ddtitle">MovieTwo</TH>
<BR/>
<BR/>
Coming Soon
<BR/>
<TH CLASS="ddtitle">MovieThree</TH>
<TABLE CLASS="datadisplaytable">
<CAPTION class="captiontext">Movies</CAPTION>
<TR>
<TH CLASS="ddheader" scope="col">Genre</TH>
</TR>
<TR>
<TD CLASS="dddefault">Action</TD>
</TR>
</TABLE>
Hopefully it is representative of all your cases!
Don't use a counter, but use the actual WebElements to iterate over:
// find all the listings on the page...
List<WebElement> linstings = driver.findElements(By.className("ddtitle"));
// ... and iterate over them
for (WebElement listing : linstings) {
    // default all your variables to TBA for every listing, like:
    String genre = "TBA";
    // grab whatever is the _first_ element after the TH (findElements returns an empty list instead of throwing) ...
    List<WebElement> next = listing.findElements(By.xpath("following-sibling::*[1]"));
    // ... and check that it has a child element CAPTION, i.e. that it is a movie table
    if (!next.isEmpty() && !next.get(0).findElements(By.xpath("caption")).isEmpty()) {
        // make all your assignments, for example (.// because browsers usually wrap the rows in a TBODY):
        genre = next.get(0).findElement(By.xpath(".//tr[2]/td[1]")).getText();
    }
}
Please note that this code is untested, your mileage may vary!
